On Sun, Jan 27, 2019 at 8:03 AM Xavi Hernandez <[email protected]> wrote:
> On Fri, 25 Jan 2019, 08:53 Vijay Bellur <[email protected]> wrote:
>
>> Thank you for the detailed update, Xavi! This looks very interesting.
>>
>> On Thu, Jan 24, 2019 at 7:50 AM Xavi Hernandez <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I've just updated a patch [1] that implements a new thread pool based
>>> on a wait-free queue provided by the userspace-rcu library. The patch
>>> also includes an auto-scaling mechanism that only keeps running the
>>> number of threads needed for the current workload.
>>>
>>> This new approach has some advantages:
>>>
>>> - It's provided globally inside libglusterfs instead of inside an
>>>   xlator
>>>
>>> This makes it possible for the fuse thread and the epoll threads to
>>> transfer a received request to another thread sooner, wasting less CPU
>>> and reacting sooner to other incoming requests.
>>>
>>> - Adding jobs to the queue used by the thread pool only requires an
>>>   atomic operation
>>>
>>> This makes the producer side of the queue really fast, with almost no
>>> delay.
>>>
>>> - Contention is reduced
>>>
>>> The producer side has negligible contention thanks to the wait-free
>>> enqueue operation based on an atomic access. The consumer side requires
>>> a mutex, but it is held only very briefly, and the scaling mechanism
>>> makes sure that there are no more threads than needed contending for
>>> the mutex.
>>>
>>> This change disables io-threads, since it replaces part of its
>>> functionality. However, there are two things that could still be needed
>>> from io-threads:
>>>
>>> - Prioritization of fops
>>>
>>> Currently, io-threads assigns a priority to each fop, so that some fops
>>> are handled before others.
>>>
>>> - Fair distribution of execution slots between clients
>>>
>>> Currently, io-threads processes requests from each client in
>>> round-robin order.
>>>
>>> These features are not implemented right now. If they are needed,
>>> probably the best thing to do would be to keep them inside io-threads,
>>> but change its implementation so that it uses the global threads from
>>> the thread pool instead of its own threads.
>>
>> These features are indeed useful to have, so modifying the
>> implementation of io-threads to provide this behavior would be welcome.
>>
>>> These tests have shown that the limiting factor has been the disk in
>>> most cases, so it's hard to tell if the change has really improved
>>> things. There is only one clear exception: self-heal on a dispersed
>>> volume completes 12.7% faster. CPU utilization has also dropped
>>> drastically:
>>>
>>> Old implementation: 12.30 user, 41.78 sys, 43.16 idle, 0.73 wait
>>>
>>> New implementation: 4.91 user, 5.52 sys, 81.60 idle, 5.91 wait
>>>
>>> Now I'm running some more tests on NVMe to try to see the effects of
>>> the change when the disk is not limiting performance. I'll update once
>>> I have more data.
>>
>> Will look forward to these numbers.
>
> I have identified an issue that limits the number of active threads when
> load is high, causing some regressions. I'll fix it and rerun the tests
> on Monday.

Once the issue was solved, it caused high load averages for some workloads,
which actually resulted in a regression (too much I/O, I guess) instead of
an improvement. So I added a configurable maximum number of threads and made
the whole implementation optional, so that it can be enabled safely when
required. I did some tests and was able to get at least the same performance
we had before this patch in all cases.
In some cases it was even better, but each test needed manual configuration
of the number of threads. I need to work on a way to automatically compute
the maximum so that it can be used easily with any workload (or even with
combined workloads).

I uploaded the latest version of the patch.

Xavi

> Xavi
>
>> Regards,
>> Vijay
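
For context on the queue the quoted patch description builds on, below is a
minimal, hypothetical sketch of a producer/consumer pair on top of the
wfcqueue API from userspace-rcu. This is not the code from the patch: the
gf_pool/gf_job structures and the sleep/wake handling are invented purely to
illustrate why enqueueing needs only a single atomic operation while
dequeueing only needs a short-lived mutex.

/* Sketch only: a tiny thread-pool queue built on liburcu's wait-free
 * concurrent queue. Producers never take a lock unless they have to wake
 * an idle worker; consumers dequeue under the queue's internal mutex and
 * sleep on a condition variable when there is nothing to do. Shutdown
 * handling is omitted. */

#include <pthread.h>
#include <stdlib.h>
#include <urcu/compiler.h>   /* caa_container_of */
#include <urcu/wfcqueue.h>   /* cds_wfcq_* */

struct gf_job {
        struct cds_wfcq_node node;
        void (*fn)(void *data);
        void *data;
};

struct gf_pool {
        struct cds_wfcq_head head;   /* consumer side (has its own lock) */
        struct cds_wfcq_tail tail;   /* producer side, wait-free */
        pthread_mutex_t sleep_lock;  /* only taken when workers go idle */
        pthread_cond_t sleep_cond;
};

static void pool_init(struct gf_pool *pool)
{
        cds_wfcq_init(&pool->head, &pool->tail);
        pthread_mutex_init(&pool->sleep_lock, NULL);
        pthread_cond_init(&pool->sleep_cond, NULL);
}

/* Producer: one atomic exchange on the tail. The sleep_lock is only taken
 * when the queue was empty, to wake up a sleeping worker. */
static void pool_add(struct gf_pool *pool, struct gf_job *job)
{
        cds_wfcq_node_init(&job->node);
        if (!cds_wfcq_enqueue(&pool->head, &pool->tail, &job->node)) {
                pthread_mutex_lock(&pool->sleep_lock);
                pthread_cond_signal(&pool->sleep_cond);
                pthread_mutex_unlock(&pool->sleep_lock);
        }
}

/* Consumer: dequeue under the queue's short internal mutex; when the queue
 * is empty, re-check under sleep_lock and wait. A single wakeup may end up
 * draining several jobs, which is fine for a sketch. */
static void *pool_worker(void *arg)
{
        struct gf_pool *pool = arg;
        struct cds_wfcq_node *node;
        struct gf_job *job;

        for (;;) {
                node = cds_wfcq_dequeue_blocking(&pool->head, &pool->tail);
                if (node == NULL) {
                        pthread_mutex_lock(&pool->sleep_lock);
                        node = cds_wfcq_dequeue_blocking(&pool->head,
                                                         &pool->tail);
                        if (node == NULL)
                                pthread_cond_wait(&pool->sleep_cond,
                                                  &pool->sleep_lock);
                        pthread_mutex_unlock(&pool->sleep_lock);
                        if (node == NULL)
                                continue;
                }
                job = caa_container_of(node, struct gf_job, node);
                job->fn(job->data);
                free(job);
        }
        return NULL;
}

Such a sketch would typically be linked against liburcu-cds, which provides
the wfcqueue symbols when the inline variants are not used.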
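
The two io-threads features mentioned in the quoted message, fop
prioritization and fair distribution between clients, essentially come down
to a small scheduling structure. The following is a simplified, hypothetical
sketch of that idea, not the actual io-threads code; the names, the number
of priority levels and the list layout are invented.

/* Sketch only: per-priority buckets, each holding a circular list of
 * clients with their pending requests. The scheduler serves the highest
 * non-empty priority and, within it, takes one request per client before
 * moving on (round-robin). Locking is omitted. */

#include <stddef.h>

enum iot_prio { IOT_PRIO_HI, IOT_PRIO_NORMAL, IOT_PRIO_LO, IOT_PRIO_MAX };

struct request {
        struct request *next;         /* FIFO link inside a client queue */
};

struct client_queue {
        struct client_queue *next;    /* circular list of clients */
        struct request *head, *tail;  /* this client's pending fops */
};

struct iot_sched {
        struct client_queue *rr[IOT_PRIO_MAX];  /* round-robin cursors */
};

static struct request *iot_pick_next(struct iot_sched *s)
{
        for (int p = 0; p < IOT_PRIO_MAX; p++) {
                struct client_queue *start = s->rr[p], *cq = start;

                if (cq == NULL)
                        continue;
                do {
                        if (cq->head != NULL) {
                                struct request *req = cq->head;

                                cq->head = req->next;
                                if (cq->head == NULL)
                                        cq->tail = NULL;
                                s->rr[p] = cq->next;  /* next client's turn */
                                return req;
                        }
                        cq = cq->next;
                } while (cq != start);
        }
        return NULL;  /* nothing pending at any priority */
}

Keeping logic like this inside io-threads while dispatching the selected
request to the global thread pool would match the suggestion in the quoted
message.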
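
On the configurable maximum number of threads and the auto-scaling behaviour
described above, a rough illustration of the kind of policy such a cap
implies could look like the sketch below. This is only a guess at the shape
of the logic, not the code in the patch; the structure, field names and
thresholds are invented.

/* Sketch only: spawn a worker when work is queued, nobody is idle and the
 * configured maximum has not been reached; let a worker retire once it has
 * been idle for long enough. */

#include <stdbool.h>
#include <stdint.h>

struct pool_limits {
        uint32_t max_threads;      /* configurable upper bound */
        uint32_t running;          /* workers currently alive */
        uint32_t idle;             /* workers waiting for a job */
        uint32_t queued;           /* jobs waiting in the queue */
        uint64_t idle_timeout_us;  /* how long a worker may stay idle */
};

/* Called by a producer right after enqueueing a job. */
static bool pool_should_spawn(const struct pool_limits *l)
{
        return l->queued > 0 && l->idle == 0 && l->running < l->max_threads;
}

/* Called by a worker that has been idle for idle_us microseconds. */
static bool pool_should_retire(const struct pool_limits *l, uint64_t idle_us)
{
        return l->running > 1 && idle_us >= l->idle_timeout_us;
}

The open question in the message above, automatically computing max_threads
for a given workload, would sit on top of bookkeeping like this.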
