Thanks. So I was suggesting a repeat of the test, but this time with iodepth=1 in the fio job. If reducing the number of concurrent requests drastically reduces the high latency you're seeing from the client side, that would strengthen the hypothesis that serialization/contention among concurrent requests at the network layers is the root cause here.
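
For reference, a minimal sketch of what the modified job section might look like; this assumes the [global] section of your job file below stays exactly as it is, and only the iodepth line in [workload] changes:

    [workload]
    bs=4k
    rw=randread
    # reduced from 8 to 1 to test the serialization/contention hypothesis
    iodepth=1
    numjobs=1
    file_service_type=random
    filename=/perf5/iotest/fio_5
    filename=/perf6/iotest/fio_6
    filename=/perf7/iotest/fio_7
    filename=/perf8/iotest/fio_8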
-- Manoj

On Thu, Jun 8, 2017 at 11:46 AM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
> Hi,
>
> This is what my job file contains:
>
> [global]
> ioengine=libaio
> #unified_rw_reporting=1
> randrepeat=1
> norandommap=1
> group_reporting
> direct=1
> runtime=60
> thread
> size=16g
>
> [workload]
> bs=4k
> rw=randread
> iodepth=8
> numjobs=1
> file_service_type=random
> filename=/perf5/iotest/fio_5
> filename=/perf6/iotest/fio_6
> filename=/perf7/iotest/fio_7
> filename=/perf8/iotest/fio_8
>
> I have 3 vms reading from one mount, and each of these vms is running the
> above job in parallel.
>
> -Krutika
>
> On Tue, Jun 6, 2017 at 9:14 PM, Manoj Pillai <mpil...@redhat.com> wrote:
>
>> On Tue, Jun 6, 2017 at 5:05 PM, Krutika Dhananjay <kdhan...@redhat.com>
>> wrote:
>>
>>> Hi,
>>>
>>> As part of identifying performance bottlenecks within the gluster stack
>>> for the VM image store use-case, I loaded io-stats at multiple points on
>>> the client and brick stacks and ran the randrd test using fio from within
>>> the hosted vms in parallel.
>>>
>>> Before I get to the results, a little bit about the configuration ...
>>>
>>> 3-node cluster; 1x3 plain replicate volume with group virt settings,
>>> direct-io.
>>> 3 FUSE clients, one per node in the cluster (which implies reads are
>>> served from the replica that is local to the client).
>>>
>>> io-stats was loaded at the following places:
>>> On the client stack: above client-io-threads and above protocol/client-0
>>> (the first child of AFR).
>>> On the brick stack: below protocol/server, above and below io-threads,
>>> and just above storage/posix.
>>>
>>> Based on a 60-second run of the randrd test and subsequent analysis of
>>> the stats dumped by the individual io-stats instances, this is what I
>>> found:
>>>
>>> Translator position                  Avg latency of READ fop as seen by this translator
>>>
>>> 1. parent of client-io-threads       1666us
>>>        ∆ (1,2) = 50us
>>> 2. parent of protocol/client-0       1616us
>>>        ∆ (2,3) = 1453us
>>> ----------------- end of client stack ---------------------
>>> ----------------- beginning of brick stack -----------------
>>> 3. child of protocol/server           163us
>>>        ∆ (3,4) = 7us
>>> 4. parent of io-threads               156us
>>>        ∆ (4,5) = 20us
>>> 5. child of io-threads                136us
>>>        ∆ (5,6) = 11us
>>> 6. parent of storage/posix            125us
>>> ...
>>> ----------------- end of brick stack -----------------------
>>>
>>> So it seems like the biggest bottleneck here is a combination of the
>>> network + epoll, rpc layer?
>>> I must admit I am no expert with networks, but I'm assuming that if the
>>> client is reading from the local brick, then even the latency contribution
>>> from the actual network won't be much, in which case the bulk of the
>>> latency is coming from epoll, the rpc layer, etc. at both the client and
>>> brick ends? Please correct me if I'm wrong.
>>>
>>> I will, of course, do some more runs and confirm whether the pattern is
>>> consistent.
>>>
>>> -Krutika
>>
>> Really interesting numbers! How many concurrent requests are in flight in
>> this test? Could you post the fio job? I'm wondering if/how these latency
>> numbers change if you reduce the number of concurrent requests.
>>
>> -- Manoj
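
P.S. For anyone trying to reproduce the instrumentation described above: the extra io-stats instances are typically inserted by hand-editing the volfiles. A rough sketch of one such instance placed just above protocol/client-0 on the client stack is below; the subvolume name (testvol-client-0) is a placeholder that will differ based on your volume name, and whichever xlator previously listed testvol-client-0 as its subvolume (AFR in this setup) would need to be repointed at the new io-stats volume so it actually sits in the I/O path.

    # hypothetical volfile snippet; volume/subvolume names are placeholders
    volume iostats-above-client-0
        type debug/io-stats
        option latency-measurement on
        option count-fop-hits on
        subvolumes testvol-client-0
    end-volume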