Thanks. So I was suggesting a repeat of the test, but this time with iodepth=1 in the fio job. If reducing the number of concurrent requests drastically reduces the high latency you're seeing from the client side, that would strengthen the hypothesis that serialization/contention among concurrent requests at the network layers is the root cause here.
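
For reference, a minimal sketch of what the modified job section might look like; this assumes the [global] section of your job file below stays exactly as it is, and only the iodepth line in [workload] changes:

    [workload]
    bs=4k
    rw=randread
    # reduced from 8 to 1 to test the serialization/contention hypothesis
    iodepth=1
    numjobs=1
    file_service_type=random
    filename=/perf5/iotest/fio_5
    filename=/perf6/iotest/fio_6
    filename=/perf7/iotest/fio_7
    filename=/perf8/iotest/fio_8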
-- Manoj

On Thu, Jun 8, 2017 at 11:46 AM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
> Hi,
>
> This is what my job file contains:
>
> [global]
> ioengine=libaio
> #unified_rw_reporting=1
> randrepeat=1
> norandommap=1
> group_reporting
> direct=1
> runtime=60
> thread
> size=16g
>
> [workload]
> bs=4k
> rw=randread
> iodepth=8
> numjobs=1
> file_service_type=random
> filename=/perf5/iotest/fio_5
> filename=/perf6/iotest/fio_6
> filename=/perf7/iotest/fio_7
> filename=/perf8/iotest/fio_8
>
> I have 3 vms reading from one mount, and each of these vms is running the
> above job in parallel.
>
> -Krutika
>
> On Tue, Jun 6, 2017 at 9:14 PM, Manoj Pillai <mpil...@redhat.com> wrote:
>
>> On Tue, Jun 6, 2017 at 5:05 PM, Krutika Dhananjay <kdhan...@redhat.com>
>> wrote:
>>
>>> Hi,
>>>
>>> As part of identifying performance bottlenecks within the gluster stack
>>> for the VM image store use-case, I loaded io-stats at multiple points on
>>> the client and brick stacks and ran the randrd test using fio from within
>>> the hosted vms in parallel.
>>>
>>> Before I get to the results, a little bit about the configuration ...
>>>
>>> 3-node cluster; 1x3 plain replicate volume with group virt settings,
>>> direct-io.
>>> 3 FUSE clients, one per node in the cluster (which implies reads are
>>> served from the replica that is local to the client).
>>>
>>> io-stats was loaded at the following places:
>>> On the client stack: above client-io-threads and above protocol/client-0
>>> (the first child of AFR).
>>> On the brick stack: below protocol/server, above and below io-threads,
>>> and just above storage/posix.
>>>
>>> Based on a 60-second run of the randrd test and subsequent analysis of
>>> the stats dumped by the individual io-stats instances, this is what I
>>> found:
>>>
>>> Translator position                  Avg latency of READ fop as seen by this translator
>>>
>>> 1. parent of client-io-threads       1666us
>>>        ∆ (1,2) = 50us
>>> 2. parent of protocol/client-0       1616us
>>>        ∆ (2,3) = 1453us
>>> ----------------- end of client stack ---------------------
>>> ----------------- beginning of brick stack -----------------
>>> 3. child of protocol/server           163us
>>>        ∆ (3,4) = 7us
>>> 4. parent of io-threads               156us
>>>        ∆ (4,5) = 20us
>>> 5. child of io-threads                136us
>>>        ∆ (5,6) = 11us
>>> 6. parent of storage/posix            125us
>>> ...
>>> ----------------- end of brick stack -----------------------
>>>
>>> So it seems like the biggest bottleneck here is a combination of the
>>> network + epoll, rpc layer?
>>> I must admit I am no expert with networks, but I'm assuming that if the
>>> client is reading from the local brick, then even the latency contribution
>>> from the actual network won't be much, in which case the bulk of the
>>> latency is coming from epoll, the rpc layer, etc. at both the client and
>>> brick ends? Please correct me if I'm wrong.
>>>
>>> I will, of course, do some more runs and confirm whether the pattern is
>>> consistent.
>>>
>>> -Krutika
>>
>> Really interesting numbers! How many concurrent requests are in flight in
>> this test? Could you post the fio job? I'm wondering if/how these latency
>> numbers change if you reduce the number of concurrent requests.
>>
>> -- Manoj
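
P.S. For anyone trying to reproduce the instrumentation described above: the extra io-stats instances are typically inserted by hand-editing the volfiles. A rough sketch of one such instance placed just above protocol/client-0 on the client stack is below; the subvolume name (testvol-client-0) is a placeholder that will differ based on your volume name, and whichever xlator previously listed testvol-client-0 as its subvolume (AFR in this setup) would need to be repointed at the new io-stats volume so it actually sits in the I/O path.

    # hypothetical volfile snippet; volume/subvolume names are placeholders
    volume iostats-above-client-0
        type debug/io-stats
        option latency-measurement on
        option count-fop-hits on
        subvolumes testvol-client-0
    end-volume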