Some update on this topic: I ran fio again, this time with Raghavendra's epoll-rearm patch @ https://review.gluster.org/17391
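(Aside, in case "epoll-rearm" is unfamiliar: the sketch below is only an
illustration of the general EPOLLONESHOT rearm pattern -- written in Python
for brevity, with made-up helper names -- and not the actual patch, which
lives in gluster's C event/rpc layer. Roughly, the idea behind rearming
earlier is to add the socket back for polling as soon as a complete message
has been read off the wire, instead of only after it has been processed, so
that another poller thread can pick up the next request on the same
connection in the meantime.)

import select
import socket


def read_full_message(conn: socket.socket) -> bytes:
    # Hypothetical helper: read one complete rpc message off the wire.
    return conn.recv(4096)


def process_message(msg: bytes) -> None:
    # Hypothetical helper: hand the decoded request off for processing.
    pass


def handle_ready(ep: select.epoll, conn: socket.socket) -> None:
    # The fd was registered with EPOLLIN | EPOLLONESHOT, so it stops being
    # polled the moment it fires and stays quiet until rearmed below.
    msg = read_full_message(conn)
    # Rearm *before* processing, so the next request on this connection can
    # be picked up by another poller thread while this one is still in flight.
    ep.modify(conn.fileno(), select.EPOLLIN | select.EPOLLONESHOT)
    process_message(msg)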
The IOPs increased to ~50K (from 38K). Avg READ latency as seen by the
io-stats translator that sits above client-io-threads came down to 963us
(from 1666us). ∆ (2,3) is down to 804us. The disk utilization didn't improve.

On Sat, Jun 10, 2017 at 12:47 AM, Manoj Pillai <mpil...@redhat.com> wrote:

> So comparing the key latency, ∆ (2,3), in the two cases:
>
> iodepth=1: 171 us
> iodepth=8: 1453 us (in the ballpark of 171*8=1368). That's not good! (I
> wonder if that relation roughly holds up for other values of iodepth.)
>
> This data doesn't conclusively establish that the problem is in gluster.
> You'd see similar results if the network were saturated, like Vijay
> suggested. But from what I remember of this test, the throughput here is
> far too low for that to be the case.
>
> -- Manoj
>
> On Thu, Jun 8, 2017 at 6:37 PM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>
>> Indeed the latency on the client side dropped with iodepth=1. :)
>> I ran the test twice and the results were consistent.
>>
>> Here are the exact numbers:
>>
>> *Translator Position*                    *Avg Latency of READ fop
>>                                           as seen by this translator*
>>
>> 1. parent of client-io-threads            437us
>>        ∆ (1,2) = 69us
>> 2. parent of protocol/client-0            368us
>>        ∆ (2,3) = 171us
>> ----------------- end of client stack ---------------------
>> ----------------- beginning of brick stack ----------------
>> 3. child of protocol/server               197us
>>        ∆ (3,4) = 4us
>> 4. parent of io-threads                   193us
>>        ∆ (4,5) = 32us
>> 5. child of io-threads                    161us
>>        ∆ (5,6) = 11us
>> 6. parent of storage/posix                150us
>> ...
>> ---------------- end of brick stack ------------------------
>>
>> Will continue reading code and get back when I find something concrete.
>>
>> -Krutika
>>
>> On Thu, Jun 8, 2017 at 12:22 PM, Manoj Pillai <mpil...@redhat.com> wrote:
>>
>>> Thanks. So I was suggesting a repeat of the test, but this time with
>>> iodepth=1 in the fio job. If reducing the no. of concurrent requests
>>> drastically reduces the high latency you're seeing from the client side,
>>> that would strengthen the hypothesis that serialization/contention among
>>> concurrent requests at the n/w layers is the root cause here.
>>>
>>> -- Manoj
>>>
>>> On Thu, Jun 8, 2017 at 11:46 AM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> This is what my job file contains:
>>>>
>>>> [global]
>>>> ioengine=libaio
>>>> #unified_rw_reporting=1
>>>> randrepeat=1
>>>> norandommap=1
>>>> group_reporting
>>>> direct=1
>>>> runtime=60
>>>> thread
>>>> size=16g
>>>>
>>>> [workload]
>>>> bs=4k
>>>> rw=randread
>>>> iodepth=8
>>>> numjobs=1
>>>> file_service_type=random
>>>> filename=/perf5/iotest/fio_5
>>>> filename=/perf6/iotest/fio_6
>>>> filename=/perf7/iotest/fio_7
>>>> filename=/perf8/iotest/fio_8
>>>>
>>>> I have 3 vms reading from one mount, and each of these vms is running
>>>> the above job in parallel.
>>>>
>>>> -Krutika
>>>>
>>>> On Tue, Jun 6, 2017 at 9:14 PM, Manoj Pillai <mpil...@redhat.com> wrote:
>>>>
>>>>> On Tue, Jun 6, 2017 at 5:05 PM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> As part of identifying performance bottlenecks within the gluster
>>>>>> stack for the VM image store use-case, I loaded io-stats at multiple
>>>>>> points on the client and brick stacks and ran a randrd test using fio
>>>>>> from within the hosted vms in parallel.
>>>>>>
>>>>>> Before I get to the results, a little bit about the configuration ...
>>>>>>
>>>>>> 3 node cluster; 1x3 plain replicate volume with group virt settings,
>>>>>> direct-io.
>>>>>> 3 FUSE clients, one per node in the cluster (which implies reads are
>>>>>> served from the replica that is local to the client).
>>>>>>
>>>>>> io-stats was loaded at the following places:
>>>>>> On the client stack: above client-io-threads and above
>>>>>> protocol/client-0 (the first child of AFR).
>>>>>> On the brick stack: below protocol/server, above and below io-threads,
>>>>>> and just above storage/posix.
>>>>>>
>>>>>> Based on a 60-second run of the randrd test and subsequent analysis
>>>>>> of the stats dumped by the individual io-stats instances, the
>>>>>> following is what I found:
>>>>>>
>>>>>> *Translator Position*                    *Avg Latency of READ fop
>>>>>>                                           as seen by this translator*
>>>>>>
>>>>>> 1. parent of client-io-threads            1666us
>>>>>>        ∆ (1,2) = 50us
>>>>>> 2. parent of protocol/client-0            1616us
>>>>>>        ∆ (2,3) = 1453us
>>>>>> ----------------- end of client stack ---------------------
>>>>>> ----------------- beginning of brick stack ----------------
>>>>>> 3. child of protocol/server               163us
>>>>>>        ∆ (3,4) = 7us
>>>>>> 4. parent of io-threads                   156us
>>>>>>        ∆ (4,5) = 20us
>>>>>> 5. child of io-threads                    136us
>>>>>>        ∆ (5,6) = 11us
>>>>>> 6. parent of storage/posix                125us
>>>>>> ...
>>>>>> ---------------- end of brick stack ------------------------
>>>>>>
>>>>>> So it seems like the biggest bottleneck here is a combination of the
>>>>>> network and the epoll/rpc layers?
>>>>>> I must admit I am no expert on networking, but I'm assuming that if
>>>>>> the client is reading from the local brick, even the latency
>>>>>> contribution from the actual network won't be much, in which case the
>>>>>> bulk of the latency is coming from epoll, the rpc layer, etc. at both
>>>>>> the client and brick ends? Please correct me if I'm wrong.
>>>>>>
>>>>>> I will, of course, do some more runs and confirm whether the pattern
>>>>>> is consistent.
>>>>>>
>>>>>> -Krutika
>>>>>
>>>>> Really interesting numbers! How many concurrent requests are in flight
>>>>> in this test? Could you post the fio job? I'm wondering if/how these
>>>>> latency numbers change if you reduce the number of concurrent requests.
>>>>>
>>>>> -- Manoj
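(A quick back-of-envelope check of the serialization hypothesis discussed
above, using the ∆ (2,3) numbers reported so far in this thread -- just
arithmetic, in Python; the 804us figure is assumed to come from the same
iodepth=8 job rerun with the patch:)

# ∆ (2,3) is the READ latency gap between the io-stats instance above
# protocol/client-0 and the one below protocol/server, i.e. roughly the
# n/w + rpc + epoll portion of the round trip.
delta_iodepth1_us = 171    # measured with iodepth=1
delta_iodepth8_us = 1453   # measured with iodepth=8, without the patch
delta_patched_us  = 804    # measured with the epoll-rearm patch applied
iodepth = 8

# If concurrent requests on the connection are fully serialized somewhere in
# those layers, each request waits behind ~iodepth others:
predicted_us = delta_iodepth1_us * iodepth   # 1368us, vs 1453us measured

print(predicted_us, delta_iodepth8_us, delta_patched_us)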
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel