> On 2012-03-07 21:30:28, Andrew Stitcher wrote:
> > This is great work.
> > 
> > The most important thing on my mind, though, is how do we ensure that
> > work like this maintains correctness?
> > 
> > Do we have some rules/guidelines for the use of the Queue structs that
> > we can use to inspect the code and satisfy ourselves of its correctness?
> > A simple rule of this kind would be "take the blah lock before capturing
> > or changing the foo state". It looks like we might have had rules like
> > that, but these changes are so broad that it's difficult for me to
> > inspect the code and reconstruct what rules now apply (if any).
> > 
> > On another point - I see you've added the use of an RWLock. It's always
> > good to benchmark with a plain mutex before using one of these, as they
> > generally have higher overhead; unless the usage is "read lock mostly,
> > write lock seldom", a plain mutex can win.
> 
> Kenneth Giusti wrote:
>     Re: locking rules - good point, Andrew. Most (all?) of these changes I
>     made simply by manually reviewing the existing code's locking behavior
>     and trying to identify what could safely be moved outside the locks.
>     I'll update this patch with comments that explicitly call out the data
>     members and actions that must hold the lock.
> 
>     The RWLock - hmm, that may be an unintentional change: at first I had
>     a separate lock for the queue observer container, which was rarely
>     written to. I abandoned that change and put the queue observers under
>     the queue lock, since otherwise the order of events (as seen by the
>     observers) could be re-arranged.
> 
> Kenneth Giusti wrote:
>     Doh - sorry, replied with way too low a caffeine/blood ratio. You're
>     referring to the lock in the listener - yes. Gordon points out that
>     this may be unsafe, so I need to think about it a bit more.
> 
> Gordon Sim wrote:
>     The order of events as seen by observers can be re-arranged with the
>     patch as it stands (at least those for acquires and dequeues; enqueues
>     would still be ordered correctly with respect to each other).
> 
>     Regarding the listeners access, I think you could fix it by e.g.
>     having an atomic counter that is incremented after pushing the message
>     but before populating the notification set, checking that count before
>     trying to acquire the message on the consume side, and then rechecking
>     it before adding the consumer to the listeners if there was no
>     message. If the number changes between the last two checks, you would
>     retry the acquisition.
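
To make sure I follow the suggested fix, here is a rough sketch of the
counter scheme (all names and types below are mine for illustration - this
is not code from the patch, and std::atomic just stands in for whatever
atomic type the broker would actually use):

    #include <atomic>
    #include <cstdint>
    #include <deque>
    #include <functional>
    #include <mutex>
    #include <vector>

    // Hypothetical skeleton only - not the broker's Queue/QueueListeners.
    class SketchQueue {
        std::mutex messageLock;    // guards 'messages'
        std::mutex listenersLock;  // guards 'listeners' (the decoupled lock)
        std::deque<int> messages;
        std::vector<std::function<void()> > listeners; // pending consumers
        std::atomic<std::uint64_t> pushCount;

        bool tryAcquire(int& msg) {
            std::lock_guard<std::mutex> l(messageLock);
            if (messages.empty()) return false;
            msg = messages.front();
            messages.pop_front();
            return true;
        }

    public:
        SketchQueue() : pushCount(0) {}

        void push(int msg) {
            {
                std::lock_guard<std::mutex> l(messageLock);
                messages.push_back(msg);
            }
            ++pushCount;  // incremented after pushing the message...
            std::vector<std::function<void()> > toNotify;
            {
                std::lock_guard<std::mutex> l(listenersLock);
                toNotify.swap(listeners); // ...before notifications are built
            }
            for (auto& notify : toNotify) notify();
        }

        // Returns true and fills 'msg', or registers 'onNotify' and returns
        // false; the caller waits and retries consume() when notified.
        bool consume(int& msg, std::function<void()> onNotify) {
            for (;;) {
                std::uint64_t count = pushCount; // check before acquiring
                if (tryAcquire(msg)) return true;
                std::lock_guard<std::mutex> l(listenersLock);
                if (pushCount == count) {        // recheck before adding
                    listeners.push_back(onNotify);
                    return false;
                }
                // A push raced in between the two checks - retry the
                // acquire rather than risk a lost wakeup.
            }
        }
    };

The point being: the counter is sampled before the acquire attempt and
rechecked under the listeners lock, so a push that slips in between the two
checks forces a retry instead of a missed notification.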

Re: listeners - makes sense. I had thought the consistency of the listeners
could be decoupled from the queue, but clearly that doesn't hold (and it
probably buys us very little to begin with).

Re: ordering of events as seen by observers. I want to guarantee that the
order of events is preserved with respect to the observers - otherwise
things will break as the code stands (like flow control, which relies on
correct ordering).

Gordon - can you explain the acquire/dequeue reordering case? Does this
patch introduce that reordering, or has it been possible all along?

- Kenneth


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5691
-----------------------------------------------------------


On 2012-03-07 19:23:36, Kenneth Giusti wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4232/
> -----------------------------------------------------------
> 
> (Updated 2012-03-07 19:23:36)
> 
> 
> Review request for qpid, Andrew Stitcher, Alan Conway, Gordon Sim, and
> Ted Ross.
> 
> 
> Summary
> -------
> 
> In order to improve our parallelism across threads, this patch reduces
> the scope over which the Queue's message lock is held.
> 
> I've still got more testing to do, but the simple test cases below show
> pretty good performance improvements on my multi-core box. I'd like to
> get some early feedback, as there are a lot of little changes to the
> queue locking that will need vetting.
> 
> As far as performance is concerned, I've run the following tests on an
> 8-core Xeon X5550 2.67GHz box. I used qpid-send and qpid-receive to
> generate traffic, with all traffic generated in parallel. All numbers
> are in messages/second, sorted by best overall throughput.
> 
> Test 1: Funnel Test (shared queue, 2 producers, 1 consumer)
> 
> test setup and execute info:
> qpidd --auth no -p 8888 --no-data-dir --worker-threads=3
> qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
> qpid-receive -b $SRC_HOST -a srcQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
> qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
> qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
> 
> trunk:                       patched:
> tp(m/s)                      tp(m/s)
> 
> S1    :: S2    :: R1         S1    :: S2    :: R1
> 73144 :: 72524 :: 112914     91120 :: 83278 :: 133730
> 70299 :: 68878 :: 110905     91079 :: 87418 :: 133649
> 70002 :: 69882 :: 110241     83957 :: 83600 :: 131617
> 70523 :: 68681 :: 108372     83911 :: 82845 :: 128513
> 68135 :: 68022 :: 107949     85999 :: 81863 :: 125575
> 
> 
> Test 2: Quad Client Shared Test (1 queue, 2 senders x 2 receivers)
> 
> test setup and execute info:
> qpidd --auth no -p 8888 --no-data-dir --worker-threads=4
> qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
> qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
> qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
> qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
> qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
> 
> trunk:                                patched:
> tp(m/s)                               tp(m/s)
> 
> S1    :: S2    :: R1    :: R2         S1    :: S2    :: R1    :: R2
> 52826 :: 52364 :: 52386 :: 52275      64826 :: 64780 :: 64811 :: 64638
> 51457 :: 51269 :: 51399 :: 51213      64630 :: 64157 :: 64562 :: 64157
> 50557 :: 50432 :: 50468 :: 50366      64023 :: 63832 :: 64092 :: 63748
> 
> Note: pre-patch, qpidd ran steadily at about 270% CPU during this test.
> With the patch, qpidd's CPU utilization was steady at 300%.
> 
> 
> Test 3: Federation Funnel (two federated brokers, 2 producers upstream,
> 1 consumer downstream)
> 
> test setup and execute info:
> qpidd --auth no -p 8888 --no-data-dir --worker-threads=3 -d
> qpidd --auth no -p 9999 --no-data-dir --worker-threads=2 -d
> qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=600000000 --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
> qpid-config -b $SRC_HOST add queue srcQ2 --max-queue-size=600000000 --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
> qpid-config -b $DST_HOST add queue dstQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
> qpid-config -b $DST_HOST add exchange fanout dstX1
> qpid-config -b $DST_HOST bind dstX1 dstQ1
> qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ1
> qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ2
> qpid-receive -b $DST_HOST -a dstQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
> qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
> qpid-send -b $SRC_HOST -a srcQ2 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
> 
> trunk:                       patched:
> tp(m/s)                      tp(m/s)
> 
> S1    :: S2    :: R1         S1    :: S2    :: R1
> 86150 :: 84146 :: 87665      90877 :: 88534 :: 108418
> 89188 :: 88319 :: 82469      87014 :: 86753 :: 107386
> 88221 :: 85298 :: 82455      89790 :: 88573 :: 104119
> 
> Still TBD:
>   - fix message group errors
>   - verify/fix cluster
> 
> 
> This addresses bug qpid-3890.
>     https://issues.apache.org/jira/browse/qpid-3890
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Queue.h 1297185
>   /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1297185
>   /trunk/qpid/cpp/src/qpid/broker/QueueListeners.h 1297185
>   /trunk/qpid/cpp/src/qpid/broker/QueueListeners.cpp 1297185
> 
> Diff: https://reviews.apache.org/r/4232/diff
> 
> 
> Testing
> -------
> 
> unit tests (non-cluster)
> performance tests as described above
> 
> 
> Thanks,
> 
> Kenneth
> 
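
P.S. For anyone coming to the diff cold: the recurring shape of the change
is plain lock-scope narrowing - update the protected state while holding the
message lock, then perform notifications and other side effects after
releasing it. A generic sketch of the pattern (none of this is the actual
Queue code):

    #include <deque>
    #include <iostream>
    #include <mutex>

    // Generic illustration of lock-scope narrowing - not the broker's
    // Queue class.
    struct NarrowedQueue {
        std::mutex messageLock;   // guards 'messages' only
        std::deque<int> messages;

        void deliver(int msg) {
            {
                // Critical section reduced to the state update itself.
                std::lock_guard<std::mutex> l(messageLock);
                messages.push_back(msg);
            }
            // Side effects (observer callbacks, stats, notifications) now
            // run unlocked. That's the throughput win - and also why the
            // thread above worries about ordering: two deliver() calls can
            // reach this point in a different order than their enqueues.
            std::cout << "enqueued " << msg << '\n';
        }
    };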
