> That's weird, since the only purpose of the mem-pool was precisely to
> improve performance of allocation of objects that are frequently
Very true, and I've long been an advocate of this approach.
Unfortunately, for this to work our allocator has to be more efficient
than the system's, and it's not - especially wrt locking. Overhead is
high and contention is even higher, heavily outweighing any advantage.
Unless/until we put in the work to make mem-pools perform better at high
thread counts, avoiding them seems like the practical choice.
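
To make the locking point concrete, here is a minimal sketch of one way a pool could keep alloc/free off any shared lock: a per-thread free list in front of malloc(). This is not the existing mem-pool code. Every name in it (tl_pool_get, tl_pool_put, TL_OBJ_SIZE, TL_CACHE_MAX) is hypothetical, and it assumes a single fixed object size and a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the GlusterFS mem-pool API. */
#define TL_OBJ_SIZE  128   /* fixed object size served by this cache */
#define TL_CACHE_MAX 64    /* max objects parked per thread */

struct tl_node {
    struct tl_node *next;
};

/* A freed object is reused as a list node, so it must be big enough. */
static_assert(TL_OBJ_SIZE >= sizeof(struct tl_node), "object too small");

/* Each thread gets its own free list, so get/put never take a lock. */
static _Thread_local struct tl_node *tl_free_list = NULL;
static _Thread_local size_t tl_free_count = 0;

/* Pop from the calling thread's free list, or fall back to malloc(). */
static void *tl_pool_get(void)
{
    struct tl_node *n = tl_free_list;

    if (n != NULL) {
        tl_free_list = n->next;
        tl_free_count--;
        return n;
    }
    return malloc(TL_OBJ_SIZE);
}

/* Park the object on the calling thread's list, or free() it if full. */
static void tl_pool_put(void *obj)
{
    struct tl_node *n = obj;

    if (obj == NULL)
        return;
    if (tl_free_count >= TL_CACHE_MAX) {
        free(obj);
        return;
    }
    n->next = tl_free_list;
    tl_free_list = n;
    tl_free_count++;
}
```

The sketch ignores two things a real version would need: draining a thread's list when the thread exits, and rebalancing when one thread allocates and a different one frees. Whether it would actually beat the system allocator would have to be measured.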
> * Consider http://review.gluster.org/15036/. With all communications
> going through the same socket, the problem this patch tries to solve
> could become worse.
I'll look into this. Thanks!
> * We should consider the possibility of implementing a global thread
> pool, which would replace io-threads, epoll threads and maybe others.
> Synctasks should also rely on this thread pool. This has the benefit
> of better controlling the total number of threads. Otherwise when we
> have more threads than processor cores, we waste resources
> unnecessarily and we won't get a real gain. Even worse, it could start
> to degrade due to contention.
Also a good idea, though perhaps too hard/complex to tackle in the short
term. I did take a stab at making io-threads use a single global set of
queues instead of per instance, to address a similar concern. To make a
long story short, it didn't seem to make things any better for this
test. I still think it's a good idea, though.
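
For reference, the general shape of "one global queue instead of one per instance" is roughly the sketch below. It is not the patch I tried, and it is not io-threads code: the gq_* names are made up, there is a single queue with no priorities, and it assumes plain pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical sketch: one process-wide queue shared by every caller,
 * instead of one queue (and one set of threads) per xlator instance. */
typedef void (*gq_fn)(void *);

struct gq_task {
    gq_fn fn;
    void *arg;
    struct gq_task *next;
};

static pthread_mutex_t gq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gq_cond = PTHREAD_COND_INITIALIZER;
static struct gq_task *gq_head, *gq_tail;
static int gq_stopping;

/* Append a task to the shared queue and wake one worker. */
static int gq_submit(gq_fn fn, void *arg)
{
    struct gq_task *t;

    assert(fn != NULL);
    t = malloc(sizeof(*t));
    if (t == NULL)
        return -1;
    t->fn = fn;
    t->arg = arg;
    t->next = NULL;

    pthread_mutex_lock(&gq_lock);
    if (gq_tail != NULL)
        gq_tail->next = t;
    else
        gq_head = t;
    gq_tail = t;
    pthread_cond_signal(&gq_cond);
    pthread_mutex_unlock(&gq_lock);
    return 0;
}

/* Worker loop: drain the queue, exit once stopping and empty. */
static void *gq_worker(void *unused)
{
    (void)unused;
    for (;;) {
        struct gq_task *t;

        pthread_mutex_lock(&gq_lock);
        while (gq_head == NULL && !gq_stopping)
            pthread_cond_wait(&gq_cond, &gq_lock);
        t = gq_head;
        if (t == NULL) {            /* stopping and nothing left */
            pthread_mutex_unlock(&gq_lock);
            return NULL;
        }
        gq_head = t->next;
        if (gq_head == NULL)
            gq_tail = NULL;
        pthread_mutex_unlock(&gq_lock);

        t->fn(t->arg);              /* run the task outside the lock */
        free(t);
    }
}

/* Ask workers to finish the remaining tasks and then exit. */
static void gq_stop(void)
{
    pthread_mutex_lock(&gq_lock);
    gq_stopping = 1;
    pthread_cond_broadcast(&gq_cond);
    pthread_mutex_unlock(&gq_lock);
}

/* Demo task: atomically bump the counter passed in arg. */
static void gq_demo_inc(void *arg)
{
    atomic_fetch_add((atomic_int *)arg, 1);
}
```

Note that every submit and every dequeue in this sketch goes through the one gq_lock, so a naive global queue caps the thread count but concentrates contention on a single mutex.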
> * There are *too many* mutexes in the code.
> We should drastically reduce their use, sometimes by using better
> structures that do not require blocking at all, or even by introducing
> RCU and/or rwlocks. One case I've always had doubts about is dict_t.
> Why does it need locks? One xlator should not modify a dict_t once it
> has been passed to another xlator, and if we assume that a dict can
> only be modified by a single xlator at a time, it's very unlikely that
> it needs to modify it from multiple threads.
I think in general you're right about dicts, but I also think it would
be interesting to disable dict locking and see what breaks. I'll bet
there's something *somewhere* that tries to access dicts concurrently.
Callbacks for children of a cluster translator using the "fan out"
pattern seem particularly suspect. What worries me is the classic
problem with race conditions; it's easy to have something that *appears*
to work when things aren't running in parallel enough to hit tiny timing
windows, but it's a lot harder to be *sure* you're safe even when they
do. I think I'd lean toward a more conservative approach of finding the
particularly egregious high-contention cases, examining those particular
code paths carefully, and changing them to use a lock-free dict variant.
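
One cheap way to run the "disable locking and see what breaks" experiment is to swap the lock for a guard that only records overlapping access. The sketch below is illustrative only: struct dbg_dict and the dbg_dict_* functions are hypothetical stand-ins, not dict_t or its API, and it assumes C11 atomics are available.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dict with its lock compiled out. */
struct dbg_dict {
    atomic_int busy;        /* 1 while some thread is inside */
    atomic_ulong overlaps;  /* times a second entry was observed */
    int count;              /* placeholder for the real dict state */
};

static void dbg_dict_init(struct dbg_dict *d)
{
    assert(d != NULL);
    atomic_init(&d->busy, 0);
    atomic_init(&d->overlaps, 0);
    d->count = 0;
}

/* Returns true if the caller owns the guard and must call
 * dbg_dict_leave(); returns false (and counts an overlap) if someone
 * else was already inside. It never blocks. */
static bool dbg_dict_enter(struct dbg_dict *d)
{
    if (atomic_exchange(&d->busy, 1) != 0) {
        atomic_fetch_add(&d->overlaps, 1);
        return false;
    }
    return true;
}

static void dbg_dict_leave(struct dbg_dict *d)
{
    atomic_store(&d->busy, 0);
}

/* Example mutation: performed without a lock, but under the guard. */
static void dbg_dict_set(struct dbg_dict *d, int v)
{
    bool owner = dbg_dict_enter(d);

    d->count = v;           /* the unlocked access being audited */
    if (owner)
        dbg_dict_leave(d);
}
```

A non-zero overlaps count proves concurrent access, but a zero count proves nothing, which is exactly the timing-window problem above. The guard would also flag a thread that re-enters the same dict, so any hit needs to be checked by hand.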
Gluster-devel mailing list