On 2/26/13 5:14 PM, Pedro Ruivo wrote:
> So, in this case, the regular message will block until the OOB message is
> delivered.

No, the regular message should get delivered as soon as the OOB message
has been *received* (not *delivered*). Unless there are previous regular
messages from the same sender which are delivered in the same thread, and
one of them is blocked in application code...

> however, the OOB message is being blocked in the application
> until the regular message is delivered. And there is no way to pick the
> regular message from the window list while the OOB is blocked, right?
> (assuming no more incoming messages)

This actually should happen, as they're delivered by different threads!

> so, if everybody agrees, if I move the OOB message to another thread,
> everything should work fine...
>
> On 02/26/2013 03:50 PM, Bela Ban wrote:
>> On 2/26/13 4:15 PM, Dan Berindei wrote:
>>>
>>> On Tue, Feb 26, 2013 at 12:57 PM, Pedro Ruivo <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>
>>> hi,
>>>
>>> I found the blocking problem with the state transfer this morning.
>>> It happens because of the reordering of a regular and an OOB message.
>>>
>>> Below is a simplification of what is happening for two nodes:
>>>
>>> A: total order broadcasts rebalance_start
>>>
>>> B: (incoming thread) delivers rebalance_start
>>> B: has no segments to request so the rebalance is done
>>> B: sends async request with rebalance_confirm (unicast #x)
>>> B: sends the rebalance_start response (unicast #x+1) (the response
>>> is a regular message)
>>>
>>> A: receives rebalance_start response (unicast #x+1)
>>> A: in UNICAST2, it detects the message is out-of-order and blocks
>>> the response in the sender window (i.e. the message #x is missing)
>>> A: receives the rebalance_confirm (unicast #x)
>>> A: delivers rebalance_confirm. Infinispan blocks this command
>>> until all the rebalance_start responses are received ==> this
>>> causes a deadlock! (because the response is blocked in the unicast
>>> layer)
>>>
>>> Question: can the request's response message be sent always as
>>> OOB?
>>> (I think the answer should be no...)
>>>
>>>
>>> We could, if Bela adds the send(Message) method to the Response
>>> interface...
>> I created a JIRA yesterday: https://issues.jboss.org/browse/JGRP-1602.
>> I'm wondering though if you *really* need it, as making all responses
>> OOB is a bad idea IMO, see below...
>>
>>
>>> and personally I think it would be better to make all responses OOB
>>> (as in JGroups 3.2.x). I don't have any data to back this up, though...
>> Intuitively, I think indiscriminately marking all responses as OOB
>> is bad, especially in the light of the async invocation API which will
>> make all messages non-blocking, at least in the OOB or reg thread pools.
>>
>> The code in 3.3 *does* actually copy the flags of the request into the
>> response, so if the request is async (OOB), so will the response be.
>> For async RPCs (regular messages), you're not getting any response
>> anyway, so no worries here...
>>
>>
>>> My suggestion: when I deliver a rebalance_confirm command (which
>>> is sent async), can I move it to a thread in
>>> async_thread_pool_executor?
>>>
>>>
>>> I have a WIP fix for https://issues.jboss.org/browse/ISPN-2825, which
>>> should stop blocking the REBALANCE_CONFIRM commands on the
>>> coordinator: https://github.com/danberindei/infinispan/tree/t_2825_m
>>>
>>> I haven't issued a PR yet because I'm still getting a failure in
>>> ClusterTopologyManagerTest, I think because of a JGroups issue (RSVP
>>> not receiving an ACK from itself). I'll let you know when I find out...
>>
>>
>> Yes, please do that. I saw in London that you could reproduce it in
>> your test, so it should be simple to find the root cause.
>>
>>
>>
>>> Weird thing: last night I tried more than 5 times in a row with
>>> UNICAST3 and it never blocked. Could this mean a problem with
>>> UNICAST3, or was I just lucky?
>>>
>>>
>>> Even though the REBALANCE_CONFIRM command is sent async, the message
>>> is still OOB.
>>> I think UNICAST/2/3 should not block any regular
>>> message waiting for the processing of an OOB message, as long as that
>>> message was received, so maybe the problem is in UNICAST2?
>> If the OOB thread added the OOB message, then it will simply pass it
>> up. However, the regular thread needs to wait for gaps in the receiver
>> table to fill, as it doesn't know what type of message will be
>> received (could be regular).
>>
>> As soon as the OOB message has been added to the table, the regular
>> message will get delivered
>>
> _______________________________________________
> infinispan-dev mailing list
> [email protected]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Bela Ban, JGroups lead (http://www.jgroups.org)
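
The delivery rule discussed in this thread (a gap in the receiver table counts as filled once the OOB message has been *received*, not once the application has finished processing it) can be sketched as a toy model. This is only an illustration under stated assumptions, not the actual UNICAST2/UNICAST3 code: the class ToyReceiveWindow, its methods and the "oob:"/"reg:" labels are hypothetical, and handing an OOB message to its own thread is modeled simply by recording the hand-off in a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy per-sender receive window. Hypothetical; NOT the JGroups implementation. */
class ToyReceiveWindow {

    /** A buffered message: OOB flag plus payload. */
    private static final class Entry {
        final boolean oob;
        final String payload;
        Entry(boolean oob, String payload) { this.oob = oob; this.payload = payload; }
    }

    private final TreeMap<Long, Entry> table = new TreeMap<>();
    private final List<String> delivered = new ArrayList<>();
    private long nextSeqno; // lowest seqno the ordered (regular) path has not passed yet

    ToyReceiveWindow(long firstSeqno) { this.nextSeqno = firstSeqno; }

    /** Called when a message from this sender arrives off the wire. */
    synchronized void receive(long seqno, boolean oob, String payload) {
        if (seqno < nextSeqno || table.containsKey(seqno)) {
            return; // duplicate or already processed
        }
        table.put(seqno, new Entry(oob, payload));

        // An OOB message is passed up right away by the thread that received it.
        // (A real stack would run the handler on a separate OOB thread.)
        if (oob) {
            delivered.add("oob:" + payload);
        }

        // Ordered path: advance while there is no gap. OOB entries only fill
        // the gap here; they were already passed up above.
        Entry next;
        while ((next = table.remove(nextSeqno)) != null) {
            if (!next.oob) {
                delivered.add("reg:" + next.payload);
            }
            nextSeqno++;
        }
    }

    /** Snapshot of the hand-off order so far. */
    synchronized List<String> delivered() { return new ArrayList<>(delivered); }
}
```

With a window starting at seqno 1, receiving the regular rebalance_start response (#2) first leaves it buffered, because the table cannot know whether the missing #1 is a regular message. Receiving the OOB rebalance_confirm (#1) then releases both, independently of how long the OOB handler would run in a real stack.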
