On Wed, Feb 27, 2013 at 11:13 AM, Bela Ban <[email protected]> wrote:
> OK, here's what happens:
>
> - A's receiver table for B is at #6, this means that the next message from B
>   must be #7
> - A receives B#8 (regular message from B)
> - A adds B#8 to B's receiver table, but doesn't deliver it (not OOB, and
>   not #7)
> - A receives OOB message B#7 from B
> - The OOB thread delivers B#7 immediately
> - Infinispan blocks on B#7
> - Unless another message from B is received, B#8 will *not* get
>   delivered: as you can see in the code below, the OOB thread would check
>   *after* delivering B#7 if there are more messages to be delivered, but
>   because it is blocked by Infinispan, it cannot deliver B#8.
>
> This is one of the rare cases where an OOB thread gets to deliver
> regular messages.
>
> The root cause is that Infinispan blocks on an OOB message; but OOB
> messages should never block! This is another reason why an Infinispan
> application thread pool makes a lot of sense!
>
> I wonder who first added sync mode and locking in JBossCache ;)
>
> // An OOB message is passed up immediately. Later, when remove() is
> // called, we discard it. This affects ordering!
> // http://jira.jboss.com/jira/browse/JGRP-377
> if(msg.isFlagSet(Message.OOB) && added) {
>     try {
>         up_prot.up(evt);
>     }
>     catch(Throwable t) {
>         log.error("couldn't deliver OOB message " + msg, t);
>     }
> }
>
> // The OOB thread never gets here as it is blocked in
> // up_prot.up() by Infinispan.
>
> final AtomicBoolean processing=win.getProcessing();
> if(!processing.compareAndSet(false, true))
>     return true;
>
>
>
> On 2/26/13 7:35 PM, Pedro Ruivo wrote:
> > On 02/26/2013 04:31 PM, Bela Ban wrote:
> >> On 2/26/13 5:14 PM, Pedro Ruivo wrote:
> >>> So, in this case, the regular message will block until the OOB
> >>> message is delivered.
> >>
> >> No, the regular message should get delivered as soon as the OOB message
> >> has been *received* (not *delivered*). Unless there are previous regular
> >> messages from the same sender which are delivered in the same thread,
> >> and one of them is blocked in application code...
> > In attachment is part of the log. I only know that the response is
> > disappearing between UNICAST2 and the ISPN unmarshaller.
> >
> > could you please take a look?
> >
> > the response is being sent and received and I don't understand why
> > ISPN is not receiving it
> >
> > Thanks
> > Pedro
> >>
> >>
> >>> however, the OOB message is being blocked in the application
> >>> until the regular message is delivered. And there is no way to pick the
> >>> regular message from the window list while the OOB is blocked, right?
> >>> (assuming no more incoming messages)
> >> This actually should happen, as they're delivered by different threads!
> >>
> >>
> >>> so, if everybody agrees, if I move the OOB message to another thread,
> >>> everything should work fine...
> >>>
> >>> On 02/26/2013 03:50 PM, Bela Ban wrote:
> >>>> On 2/26/13 4:15 PM, Dan Berindei wrote:
> >>>>> On Tue, Feb 26, 2013 at 12:57 PM, Pedro Ruivo <[email protected]
> >>>>> <mailto:[email protected]>> wrote:
> >>>>>
> >>>>>     hi,
> >>>>>
> >>>>>     I found the blocking problem with the state transfer this
> >>>>>     morning. It happens because of the reordering of a regular
> >>>>>     and an OOB message.
> >>>>>
> >>>>>     Below is a simplification of what is happening for two nodes:
> >>>>>
> >>>>>     A: total order broadcasts rebalance_start
> >>>>>
> >>>>>     B: (incoming thread) delivers rebalance_start
> >>>>>     B: has no segments to request so the rebalance is done
> >>>>>     B: sends async request with rebalance_confirm (unicast #x)
> >>>>>     B: sends the rebalance_start response (unicast #x+1) (the
> >>>>>        response is a regular message)
> >>>>>
> >>>>>     A: receives rebalance_start response (unicast #x+1)
> >>>>>     A: in UNICAST2, it detects the message is out-of-order and
> >>>>>        blocks the response in the sender window (i.e. the message
> >>>>>        #x is missing)
> >>>>>     A: receives the rebalance_confirm (unicast #x)
> >>>>>     A: delivers rebalance_confirm. Infinispan blocks this command
> >>>>>        until all the rebalance_start responses are received ==> this
> >>>>>        originates a deadlock! (because the response is blocked in
> >>>>>        the unicast layer)
> >>>>>
> >>>>>     Question: can the request's response message be sent always as
> >>>>>     OOB? (I think the answer should be no...)
> >>>>>
> >>>>>
> >>>>> We could, if Bela adds the send(Message) method to the Response
> >>>>> interface...
> >>>> I created a JIRA yesterday: https://issues.jboss.org/browse/JGRP-1602 .
> >>>> I'm wondering though if you *really* need it, as making all responses
> >>>> OOB is a bad idea IMO, see below...
> >>>>
> >>>>
> >>>>> and personally I think it would be better to make all responses OOB
> >>>>> (as in JGroups 3.2.x). I don't have any data to back this up,
> >>>>> though...
> >>>> Intuitively, I think indiscriminately marking all responses as OOB
> >>>> is bad, especially in the light of the async invocation API which will
> >>>> make all messages non-blocking, at least in the OOB or reg thread
> >>>> pools.
> >>>>
> >>>> The code in 3.3 *does* actually copy the flags of the request into the
> >>>> response, so if the request is async (OOB), so will the response be.
> >>>> For async RPCs (regular messages), you're not getting any response
> >>>> anyway, so no worries here...
> >>>>
> >>>>
> >>>>>     My suggestion: when I deliver a rebalance_confirm command
> >>>>>     (which is sent async), can I move it to a thread in
> >>>>>     async_thread_pool_executor?
> >>>>>
> >>>>>
> >>>>> I have a WIP fix for https://issues.jboss.org/browse/ISPN-2825, which
> >>>>> should stop blocking the REBALANCE_CONFIRM commands on the
> >>>>> coordinator: https://github.com/danberindei/infinispan/tree/t_2825_m
> >>>>>
> >>>>> I haven't issued a PR yet because I'm still getting a failure in
> >>>>> ClusterTopologyManagerTest, I think because of a JGroups issue (RSVP
> >>>>> not receiving an ACK from itself). I'll let you know when I find
> >>>>> out...
> >>>>
> >>>> Yes, please do that. I saw in London that you could reproduce it in
> >>>> your test, so it should be simple to find the root cause.
> >>>>
> >>>>
> >>>>
> >>>>>     Weird thing: last night I tried more than 5 times in a row with
> >>>>>     UNICAST3 and it never blocked. Can this mean a problem with
> >>>>>     UNICAST3, or was I just lucky?
> >>>>>
> >>>>>
> >>>>> Even though the REBALANCE_CONFIRM command is sent async, the message
> >>>>> is still OOB. I think UNICAST/2/3 should not block any regular
> >>>>> message waiting for the processing of an OOB message, as long as that
> >>>>> message was received, so maybe the problem is in UNICAST2?
> >>>> If the OOB thread added the OOB message, then it will simply pass it
> >>>> up. However, the regular thread needs to wait for gaps in the receiver
> >>>> table to fill, as it doesn't know what type of message will be
> >>>> received (could be regular).
> >>>>
> >>>> As soon as the OOB message has been added to the table, the regular
> >>>> message will get delivered
> >>>>
> >>> _______________________________________________
> >>> infinispan-dev mailing list
> >>> [email protected]
> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
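
To make the B#7/B#8 walkthrough at the top of this thread concrete, here is a minimal sketch in plain Java. It is not JGroups or Infinispan code: OobDeliverySketch, receive() and run() are hypothetical names, the receiver table is reduced to a map, all calls come from one thread, and a 500 ms timeout stands in for "blocked until B#8 is delivered". It only illustrates why delivering the OOB message on the receiving thread stalls B#8, and why handing the up-call to a separate application pool (as suggested above) lets the receiving thread drain the table.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongConsumer;

// Hypothetical model of B's receiver table on node A (not JGroups code).
class OobDeliverySketch {
    private final Map<Long, Boolean> table = new TreeMap<>(); // seqno -> was it OOB?
    private final LongConsumer app;        // stands in for the application up-call
    private final ExecutorService appPool; // null: deliver OOB on the receiving thread
    private long highestDelivered;

    OobDeliverySketch(long highestDelivered, LongConsumer app, ExecutorService appPool) {
        this.highestDelivered = highestDelivered;
        this.app = app;
        this.appPool = appPool;
    }

    // Called by the thread that received the message.
    void receive(long seqno, boolean oob) {
        table.put(seqno, oob);
        if (oob) {
            // OOB messages are passed up immediately, ignoring gaps.
            if (appPool != null) appPool.execute(() -> app.accept(seqno));
            else app.accept(seqno); // the receiving thread is stuck here if the app blocks
        }
        // Only after the up-call returns: deliver consecutive messages in order.
        while (table.containsKey(highestDelivered + 1)) {
            boolean wasOob = table.remove(highestDelivered + 1);
            highestDelivered++;
            if (!wasOob) app.accept(highestDelivered); // OOB ones were already delivered
        }
    }

    // Replays the scenario: table at #6, B#8 (regular) arrives before B#7 (OOB).
    // Returns true if the handler of B#7 saw B#8 delivered before its timeout.
    static boolean run(boolean handOff) {
        CountDownLatch b8Delivered = new CountDownLatch(1);
        AtomicBoolean b7SawB8 = new AtomicBoolean(false);
        LongConsumer app = seqno -> {
            if (seqno == 8) {
                b8Delivered.countDown();
            } else if (seqno == 7) {
                try {
                    // The application blocks on B#7 until B#8 shows up (or gives up).
                    b7SawB8.set(b8Delivered.await(500, TimeUnit.MILLISECONDS));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
        ExecutorService pool = handOff ? Executors.newSingleThreadExecutor() : null;
        OobDeliverySketch a = new OobDeliverySketch(6, app, pool);
        a.receive(8, false); // buffered: not OOB and not #7
        a.receive(7, true);  // delivered at once; B#8 can only follow afterwards
        if (pool != null) {
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return b7SawB8.get();
    }

    public static void main(String[] args) {
        System.out.println("inline delivery, B#7 unblocked by B#8: " + run(false));
        System.out.println("hand-off delivery, B#7 unblocked by B#8: " + run(true));
    }
}
```

With inline delivery the handler of B#7 times out, because B#8 can only be delivered after the up-call returns; with the hand-off the receiving thread delivers B#8 right away and the handler of B#7 completes. In the real stack there is no timeout, so the inline case would simply hang until another message from B arrives.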
