> On Sept. 23, 2013, 8:01 p.m., Andrew Stitcher wrote:
> > I'm not at all keen on the overall approach here:
> > 
> > It seems to me that in effect you create a new thread (with all its 
> > overhead) for every blocking transaction operation; in that case there 
> > seems to be very little point in getting the IO thread system to do this. 
> > Just create a new thread to do the blocking operation and destroy it at the 
> > end of the operation - that would be precisely equivalent (in terms of 
> > thread creation and destruction) of what is going on in this patch. But 
> > would be clearer and affect fewer sub-systems.
> > 
> > I'm not saying that we might not eventually want an expandable thread pool 
> > with new thread creation triggered by load (or something), but this case 
> > does not seem to be a good reason to do it given the thread creation and 
> > destruction scheduling.
> > 
> > As to preferred approach I'd very much favor making this non-blacking and 
> > asynchronous, as this is what we do (or at least have striven to do) for 
> > every other example like this in the code base - it is the way the broker's 
> > architecture was intended to work.
> > 
> > It would definitely be preferable also to limit the number of threads that 
> > are blocked waiting for transaction operations also, perhaps you could 
> > comment as to why these operations need to be blocking in the first place.

I take your point, and will rethink the approach.

Just to explain: the transaction commit command involves waiting for backup 
brokers to indicate readiness, so we have to delay responding to the commit 
command until that has happened. Hence blocking the IO thread to delay the 
response to the command. Agreed that doing an async response is more in line 
with the architecture but seems complex to enable and specific to 0.10, but I 
will investigate that option more carefully.


- Alan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14213/#review26317
-----------------------------------------------------------


On Sept. 20, 2013, 6:21 p.m., Alan Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14213/
> -----------------------------------------------------------
> 
> (Updated Sept. 20, 2013, 6:21 p.m.)
> 
> 
> Review request for qpid, Andrew Stitcher, Chug Rolke, Gordon Sim, and Steve 
> Huston.
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> QPID-5139: HA transactions block a thread, can deadlock the broker
> 
> PrimaryTxObserver::prepare blocks pending responses from each backup. With
> concurrent transactions this can deadlock the broker: once all worker threads
> are blocked in prepare, responses from backups cannot be received.
> 
> The solution is as follows:
> - before blocking in prepare, start a new worker thread.
> - after blocking in prepare, stop a worker thread.
> 
> Given N initial broker theads, this ensures that there are always N worker
> threads that are not blocked in tx-prepare.
> 
> An alternative solution would be to make the prepare complete asynchronously. 
>  I
> believe this approach would be more complex, more risky and would be specific 
> to
> the 0-10 protocol.
> 
> TODO: implement for windows and other pollers.
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/CMakeLists.txt 1524570 
>   /trunk/qpid/cpp/src/qpid/broker/Broker.h 1524570 
>   /trunk/qpid/cpp/src/qpid/broker/Broker.cpp 1524570 
>   /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.cpp 1524570 
>   /trunk/qpid/cpp/src/qpid/sys/Poller.h 1524570 
>   /trunk/qpid/cpp/src/qpid/sys/PollerThreads.h PRE-CREATION 
>   /trunk/qpid/cpp/src/qpid/sys/PollerThreads.cpp PRE-CREATION 
>   /trunk/qpid/cpp/src/qpid/sys/epoll/EpollPoller.cpp 1524570 
>   /trunk/qpid/cpp/src/tests/ha_test.py 1524570 
>   /trunk/qpid/cpp/src/tests/ha_tests.py 1524570 
>   /trunk/qpid/cpp/src/tests/test_store.cpp 1524570 
> 
> Diff: https://reviews.apache.org/r/14213/diff/
> 
> 
> Testing
> -------
> 
> New ha_tests.py unit test, starts broker with 2 threads and runs 10 
> concurrent transactions. Fails reliably before the fix is applied. Passed > 
> 300 iteration after the fix.
> 
> Full ctest passes.
> 
> 
> Thanks,
> 
> Alan Conway
> 
>

Reply via email to