> On 2012-03-07 21:30:28, Andrew Stitcher wrote:
> > This is great work.
> > 
> > The most important thing on my mind, though, is how do we ensure that
> > work like this maintains correctness?
> > 
> > Do we have some rules/guidelines for the use of the Queue structs that
> > we can use to inspect the code and satisfy ourselves of its correctness?
> > A simple rule of this kind would be "take the blah lock before capturing
> > or changing the foo state". It looks like we might have had rules like
> > that, but these changes are so broad that it's difficult for me to
> > inspect the code and reconstruct what rules now apply (if any).
> > 
> > On another point - I see you've added the use of an RWLock. It's always
> > good to benchmark with a plain mutex before using one of these, as they
> > generally have higher overhead; unless the usage is "read lock mostly,
> > write lock seldom", a plain mutex can win.
> 
> Kenneth Giusti wrote:
>     Re: locking rules - good point, Andrew. Most (all?) of these changes I
>     made simply by manually reviewing the existing code's locking behavior
>     and trying to identify what could safely be moved outside the locks.
>     I'll update this patch with comments that explicitly call out the data
>     members and actions that must hold the lock.
> 
>     The RWLock - hmm, that may be an unintentional change: at first I had
>     a separate lock for the queue observer container, which was rarely
>     written to. I abandoned that change and put the queue observers under
>     the queue lock, since otherwise the order of events (as seen by the
>     observers) could be re-arranged.
> 
> Kenneth Giusti wrote:
>     Doh - sorry, replied with way too low a caffeine/blood ratio. You're
>     referring to the lock in the listener - yes. Gordon points out that
>     this may be unsafe, so I need to think about it a bit more.
> 
> Gordon Sim wrote:
>     The order of events as seen by observers can be re-arranged with the
>     patch as it stands (at least those for acquires and dequeues; enqueues
>     would still be ordered correctly with respect to each other).
> 
>     Regarding the listeners access, I think you could fix it by e.g.
>     having an atomic counter that is incremented after pushing the message
>     but before populating the notification set, checking that count before
>     trying to acquire the message on the consume side, and then rechecking
>     it before adding the consumer to the listeners if there was no
>     message. If the number changes between the last two checks, you would
>     retry the acquisition.
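
To make sure I follow the suggested fix, here is a rough sketch of the
counter scheme (all names and types below are mine for illustration - this
is not code from the patch, and std::atomic just stands in for whatever
atomic type the broker would actually use):

    #include <atomic>
    #include <cstdint>
    #include <deque>
    #include <functional>
    #include <mutex>
    #include <vector>

    // Hypothetical skeleton only - not the broker's Queue/QueueListeners.
    class SketchQueue {
        std::mutex messageLock;    // guards 'messages'
        std::mutex listenersLock;  // guards 'listeners' (the decoupled lock)
        std::deque<int> messages;
        std::vector<std::function<void()> > listeners; // pending consumers
        std::atomic<std::uint64_t> pushCount;

        bool tryAcquire(int& msg) {
            std::lock_guard<std::mutex> l(messageLock);
            if (messages.empty()) return false;
            msg = messages.front();
            messages.pop_front();
            return true;
        }

    public:
        SketchQueue() : pushCount(0) {}

        void push(int msg) {
            {
                std::lock_guard<std::mutex> l(messageLock);
                messages.push_back(msg);
            }
            ++pushCount;  // incremented after pushing the message...
            std::vector<std::function<void()> > toNotify;
            {
                std::lock_guard<std::mutex> l(listenersLock);
                toNotify.swap(listeners); // ...before notifications are built
            }
            for (auto& notify : toNotify) notify();
        }

        // Returns true and fills 'msg', or registers 'onNotify' and returns
        // false; the caller waits and retries consume() when notified.
        bool consume(int& msg, std::function<void()> onNotify) {
            for (;;) {
                std::uint64_t count = pushCount; // check before acquiring
                if (tryAcquire(msg)) return true;
                std::lock_guard<std::mutex> l(listenersLock);
                if (pushCount == count) {        // recheck before adding
                    listeners.push_back(onNotify);
                    return false;
                }
                // A push raced in between the two checks - retry the
                // acquire rather than risk a lost wakeup.
            }
        }
    };

The point being: the counter is sampled before the acquire attempt and
rechecked under the listeners lock, so a push that slips in between the two
checks forces a retry instead of a missed notification.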

Re: listeners - makes sense. I had thought the consistency of the listeners
could be decoupled from the queue, but clearly that doesn't hold (and it
probably buys us very little to begin with).

Re: ordering of events as seen by observers. I want to guarantee that the
order of events is preserved with respect to the observers - otherwise
things will break as the code stands (like flow control, which relies on
correct ordering).

Gordon - can you explain the acquire/dequeue reordering case? Does this
patch introduce that reordering, or has it been possible all along?

- Kenneth


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5691
-----------------------------------------------------------


On 2012-03-07 19:23:36, Kenneth Giusti wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4232/
> -----------------------------------------------------------
> 
> (Updated 2012-03-07 19:23:36)
> 
> 
> Review request for qpid, Andrew Stitcher, Alan Conway, Gordon Sim, and
> Ted Ross.
> 
> 
> Summary
> -------
> 
> In order to improve our parallelism across threads, this patch reduces
> the scope over which the Queue's message lock is held.
> 
> I've still got more testing to do, but the simple test cases below show
> pretty good performance improvements on my multi-core box. I'd like to
> get some early feedback, as there are a lot of little changes to the
> queue locking that will need vetting.
> 
> As far as performance is concerned, I've run the following tests on an
> 8-core Xeon X5550 2.67GHz box. I used qpid-send and qpid-receive to
> generate traffic, with all traffic generated in parallel. All numbers
> are in messages/second, sorted by best overall throughput.
> 
> Test 1: Funnel Test (shared queue, 2 producers, 1 consumer)
> 
> test setup and execute info:
> qpidd --auth no -p 8888 --no-data-dir --worker-threads=3
> qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
> qpid-receive -b $SRC_HOST -a srcQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
> qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
> qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
> 
> trunk:                       patched:
> tp(m/s)                      tp(m/s)
> 
> S1    :: S2    :: R1         S1    :: S2    :: R1
> 73144 :: 72524 :: 112914     91120 :: 83278 :: 133730
> 70299 :: 68878 :: 110905     91079 :: 87418 :: 133649
> 70002 :: 69882 :: 110241     83957 :: 83600 :: 131617
> 70523 :: 68681 :: 108372     83911 :: 82845 :: 128513
> 68135 :: 68022 :: 107949     85999 :: 81863 :: 125575
> 
> 
> Test 2: Quad Client Shared Test (1 queue, 2 senders x 2 receivers)
> 
> test setup and execute info:
> qpidd --auth no -p 8888 --no-data-dir --worker-threads=4
> qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
> qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
> qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
> qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
> qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
> 
> trunk:                                patched:
> tp(m/s)                               tp(m/s)
> 
> S1    :: S2    :: R1    :: R2         S1    :: S2    :: R1    :: R2
> 52826 :: 52364 :: 52386 :: 52275      64826 :: 64780 :: 64811 :: 64638
> 51457 :: 51269 :: 51399 :: 51213      64630 :: 64157 :: 64562 :: 64157
> 50557 :: 50432 :: 50468 :: 50366      64023 :: 63832 :: 64092 :: 63748
> 
> Note: pre-patch, qpidd ran steadily at about 270% CPU during this test.
> With the patch, qpidd's CPU utilization was steady at 300%.
> 
> 
> Test 3: Federation Funnel (two federated brokers, 2 producers upstream,
> 1 consumer downstream)
> 
> test setup and execute info:
> qpidd --auth no -p 8888 --no-data-dir --worker-threads=3 -d
> qpidd --auth no -p 9999 --no-data-dir --worker-threads=2 -d
> qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=600000000 --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
> qpid-config -b $SRC_HOST add queue srcQ2 --max-queue-size=600000000 --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
> qpid-config -b $DST_HOST add queue dstQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
> qpid-config -b $DST_HOST add exchange fanout dstX1
> qpid-config -b $DST_HOST bind dstX1 dstQ1
> qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ1
> qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ2
> qpid-receive -b $DST_HOST -a dstQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
> qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
> qpid-send -b $SRC_HOST -a srcQ2 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
> 
> trunk:                       patched:
> tp(m/s)                      tp(m/s)
> 
> S1    :: S2    :: R1         S1    :: S2    :: R1
> 86150 :: 84146 :: 87665      90877 :: 88534 :: 108418
> 89188 :: 88319 :: 82469      87014 :: 86753 :: 107386
> 88221 :: 85298 :: 82455      89790 :: 88573 :: 104119
> 
> Still TBD:
>   - fix message group errors
>   - verify/fix cluster
> 
> 
> This addresses bug qpid-3890.
>     https://issues.apache.org/jira/browse/qpid-3890
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Queue.h 1297185
>   /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1297185
>   /trunk/qpid/cpp/src/qpid/broker/QueueListeners.h 1297185
>   /trunk/qpid/cpp/src/qpid/broker/QueueListeners.cpp 1297185
> 
> Diff: https://reviews.apache.org/r/4232/diff
> 
> 
> Testing
> -------
> 
> unit tests (non-cluster)
> performance tests as described above
> 
> 
> Thanks,
> 
> Kenneth
> 
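
P.S. For anyone coming to the diff cold: the recurring shape of the change
is plain lock-scope narrowing - update the protected state while holding the
message lock, then perform notifications and other side effects after
releasing it. A generic sketch of the pattern (none of this is the actual
Queue code):

    #include <deque>
    #include <iostream>
    #include <mutex>

    // Generic illustration of lock-scope narrowing - not the broker's
    // Queue class.
    struct NarrowedQueue {
        std::mutex messageLock;   // guards 'messages' only
        std::deque<int> messages;

        void deliver(int msg) {
            {
                // Critical section reduced to the state update itself.
                std::lock_guard<std::mutex> l(messageLock);
                messages.push_back(msg);
            }
            // Side effects (observer callbacks, stats, notifications) now
            // run unlocked. That's the throughput win - and also why the
            // thread above worries about ordering: two deliver() calls can
            // reach this point in a different order than their enqueues.
            std::cout << "enqueued " << msg << '\n';
        }
    };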
