On Thu, Jan 31, 2019 at 11:12 PM Xavi Hernandez <xhernan...@redhat.com>
wrote:

> On Fri, Feb 1, 2019 at 7:54 AM Vijay Bellur <vbel...@redhat.com> wrote:
>> On Thu, Jan 31, 2019 at 10:01 AM Xavi Hernandez <xhernan...@redhat.com>
>> wrote:
>>> Hi,
>>> I've been doing some tests with the global thread pool [1], and I've
>>> observed one important thing:
>>> Since this new thread pool has very low contention (apparently), it
>>> exposes other problems when the number of threads grows. What I've seen is
>>> that some workloads use all available threads on bricks to do I/O, causing
>>> avgload to grow rapidly, saturating the machine (or so it seems), which
>>> really makes everything slower. Reducing the maximum number of threads
>>> actually improves performance. Other workloads, though, do little I/O
>>> (probably most is locking or smallfile operations). In this case limiting
>>> the number of threads to a small value causes a performance reduction. To
>>> increase performance we need more threads.
>>> So this is making me think that maybe we should implement some sort of
>>> I/O queue with a maximum I/O depth for each brick (or disk, if bricks
>>> share the same disk). This way we can limit the number of requests
>>> physically
>>> accessing the underlying FS concurrently, without actually limiting the
>>> number of threads that can be doing other things on each brick. I think
>>> this could improve performance.
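A minimal sketch of that per-brick I/O depth idea, using a counting semaphore (illustrative Python, not Gluster code; all names here are made up):

```python
import threading
import time

MAX_IO_DEPTH = 4   # hypothetical per-brick/disk depth limit
NUM_REQUESTS = 32  # simulated concurrent requests

io_slots = threading.BoundedSemaphore(MAX_IO_DEPTH)
lock = threading.Lock()
in_flight = 0
max_in_flight = 0

def brick_io():
    """Simulated FS request; only MAX_IO_DEPTH run concurrently."""
    global in_flight, max_in_flight
    with io_slots:                # blocks once the depth limit is hit
        with lock:
            in_flight += 1
            max_in_flight = max(max_in_flight, in_flight)
        time.sleep(0.005)         # stand-in for the actual disk access
        with lock:
            in_flight -= 1

threads = [threading.Thread(target=brick_io) for _ in range(NUM_REQUESTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrent I/O:", max_in_flight)  # never above MAX_IO_DEPTH
```

Requests beyond the depth limit simply block at the semaphore instead of piling onto the FS, while the rest of the thread pool stays free for non-I/O work.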
>> Perhaps we could throttle both aspects - number of I/O requests per disk
>> and the number of threads too?  That way we will have the ability to behave
>> well when there is bursty I/O to the same disk and when there are multiple
>> concurrent requests to different disks. Do you have a reason to not limit
>> the number of threads?
> No, in fact the global thread pool does have a limit for the number of
> threads. I'm not saying we should replace the thread limit with I/O depth
> control; I think we need both. We need to clearly identify which threads
> are doing I/O and limit them, even if there are more threads available.
> The reason is simple: suppose we have a fixed number of threads. If we
> have heavy
> load sent in parallel, it's quite possible that all threads get blocked
> doing some I/O. This has two consequences:
>    1. There are no more threads to execute other things, like sending
>    answers to the client or starting to process new incoming requests,
>    so the CPU is underutilized.
>    2. Massively parallel access to a FS actually decreases performance.
> This means we get less work done and each piece of work takes longer,
> which is bad.
> If we limit the number of threads that can actually be doing FS I/O, it's
> easy to keep FS responsive and we'll still have more threads to do other
> work.

Got it, thx.
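One way to picture that split between I/O threads and other work is a small dedicated I/O pool that general workers hand FS operations to (again an illustrative sketch with made-up names, not the Gluster implementation):

```python
import queue
import threading

IO_THREADS = 2          # only these threads ever block on the FS
io_queue = queue.Queue()

def io_worker():
    """Dedicated I/O thread: pulls FS operations off the queue."""
    while True:
        job = io_queue.get()
        if job is None:         # shutdown sentinel
            break
        op, done = job
        op()                    # the only place that blocks on I/O
        done.set()

def submit_io(op):
    """Called by a general worker; returns immediately with an Event."""
    done = threading.Event()
    io_queue.put((op, done))
    return done

workers = [threading.Thread(target=io_worker) for _ in range(IO_THREADS)]
for w in workers:
    w.start()

# A general worker queues an FS op, stays free to handle other
# requests, and only waits when it finally needs the result.
results = []
done = submit_io(lambda: results.append("wrote block"))
done.wait()

for _ in workers:
    io_queue.put(None)
for w in workers:
    w.join()

print(results)  # ['wrote block']
```

However many general workers exist, at most IO_THREADS of them can be stuck on the FS at once; the rest keep answering clients and accepting new requests.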

>>> Maybe this approach could also be useful on the client side, but I think
>>> it's not so critical there.
>> Agree, rate limiting on the server side would be more appropriate.
> The only thing to consider here is that if we limit the rate on servers but
> clients can generate more requests without limit, we may require lots of
> memory to track all ongoing requests. Anyway, I think this is not the most
> important thing now, so if we solve the server-side problem, then we can
> check if this is really needed or not (it could happen that client
> applications limit themselves automatically because they will be waiting
> for answers from the server before sending more requests, unless the
> number of applications running concurrently is really huge).

We could enable throttling in the RPC layer to handle a client performing
aggressive I/O.  RPC throttling should be able to handle the scenario
described above.
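For the RPC-side throttling, a token bucket is one common shape for this: a client gets a burst allowance and is delayed once it exceeds the sustained rate. The sketch below is illustrative only; the class and parameter names are invented, not an existing Gluster or RPC API:

```python
import time

class TokenBucket:
    """Delay callers that exceed `rate` requests/sec beyond a burst."""
    def __init__(self, rate, burst):
        self.rate = rate              # tokens added per second
        self.capacity = burst         # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def acquire(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < 1:           # over the limit: wait for a token
            time.sleep((1 - self.tokens) / self.rate)
            self.tokens = 1.0
        self.tokens -= 1

bucket = TokenBucket(rate=100, burst=5)
start = time.monotonic()
for _ in range(10):   # 5 pass as a burst, the rest are rate-limited
    bucket.acquire()
elapsed = time.monotonic() - start
print(f"10 requests took {elapsed:.3f}s")
```

Because delayed requests simply wait in `acquire()` before being dispatched, the server never has to track an unbounded backlog of in-flight requests from an aggressive client.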

Gluster-devel mailing list
