Hi Chrissie,

Thanks, I will send a mail to this mailing list about this patch.

On Aug 7, 2014 9:35 PM, "Christine Caulfield" <[email protected]> wrote:
> On 06/08/14 02:09, jason wrote:
>
>> Sorry, the previous patch is wrong. Here is the correction.
>>
>
> That looks good to me and, I think, the best solution. It seems to be
> decidedly non-trivial to determine if retransmits are present when going
> into hold.
>
> Thanks!
> Chrissie
>
>> On Aug 5, 2014 10:18 PM, "Christine Caulfield" <[email protected]> wrote:
>>
>>     Hi Jason,
>>
>>     Thanks for testing that - and the extra info. I'll have another
>>     think then. If I can't come up with anything more we might go with
>>     your patch.
>>
>>     Chrissie
>>
>>     On 05/08/14 13:01, jason wrote:
>>
>>         Hi Christine,
>>         I have tested your patch, but it does not solve my problem. By
>>         adding printf calls, I found that whether or not retransmission
>>         occurred in my test case, the retrans_message_queue is always
>>         empty. It seems that the retrans_message_queue is used only in
>>         the recovery state?
>>
>>         On Aug 5, 2014 3:50 PM, "Christine Caulfield" <[email protected]> wrote:
>>
>>             On 01/08/14 10:50, Christine Caulfield wrote:
>>
>>                 On 01/08/14 10:42, Jan Friesse wrote:
>>
>>                     Jason,
>>
>>                         Hi All,
>>
>>                         I have encountered a problem: when there is no
>>                         other activity on the ring, only retransmission,
>>                         and the token is in hold mode, the retransmission
>>                         becomes slow. Moreover, if the retransmission
>>                         always fails but token
>>
>>                     Yes
>>
>>                         rotation works well, then it takes quite a long
>>                         time (fail_to_recv_const * token_hold = 2500 *
>>                         180ms = 450sec) for the retransmitting node to
>>                         meet the "FAILED TO RECEIVE" condition and
>>                         re-construct a new ring. This can be reproduced
>>                         by the following steps:
>>
>>                         1) Create a two-node cluster in udpu transport
>>                            mode.
>>                         2) Wait until there is no other activity on the
>>                            ring.
>>                         3) One or both nodes delete each other from the
>>                            nodelist in corosync.conf.
>>                         4) Run corosync-cfgtool -R; this can cause a
>>                            message retransmission, but I am not sure why.
>>                         5) Since token rotation still works well but the
>>                            retransmission cannot be satisfied due to the
>>                            node deletion, only the "FAILED TO RECEIVE"
>>                            condition can form a new ring. But we need to
>>                            wait 450 seconds for it to happen. During this
>>                            wait, we saw the following logs:
>>
>>                     This is a really weird case.
>>
>>                         Jul 30 11:21:06 notice [TOTEM ] Retransmit List: e
>>                         Jul 30 11:21:06 notice [TOTEM ] Retransmit List: e
>>                         Jul 30 11:21:06 notice [TOTEM ] Retransmit List: e
>>                         Jul 30 11:21:06 notice [TOTEM ] Retransmit List: e
>>                         Jul 30 11:21:06 notice [TOTEM ] Retransmit List: e
>>                         ...
>>
>>                         This problem can be solved by adding
>>                         token_hold_cancel_send() in both the
>>                         retransmission request and response conditions
>>                         in orf_token_rtr() to speed up retransmission.
>>                         I created a patch below, any comments?
>>
>>                     Ok. Patch looks fine, but during review I had
>>                     another idea. What about prohibiting the start of
>>                     hold mode when there are messages to retransmit?
>>                     Such a solution may be cleaner, isn't it?
>>
>>                     Anyway. This is a change in a very critical part of
>>                     the code, so Chrissie, can you please take a look at
>>                     the patch and express your opinion?
>>
>>                 I've been looking it over yesterday. It's a problem I
>>                 have definitely seen myself on some VM systems so it's
>>                 certainly not an isolated case. I think Honza is right
>>                 that there might be a better way of fixing it so I'll
>>                 have a look.
>>
>>                 Chrissie
>>
>>             Annoyingly my common reproducer seems not to be working and
>>             I can't get yours to make it happen either. If you can still
>>             reproduce it could you try this patch for me please?
>>
>>             Chrissie
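For reference, the 450-second figure quoted in the thread can be checked with a small sketch. It assumes, following Jason's arithmetic only, that the failure counter advances once per token visit and that each visit is spaced by the hold interval; this is not taken from the corosync source. The function name and the 2 ms un-held rotation time are hypothetical and used purely for comparison.

```python
def failed_to_receive_delay_s(fail_to_recv_const: int, interval_ms: float) -> float:
    """Seconds until the failure counter reaches fail_to_recv_const,
    assuming it advances once per token visit, every interval_ms."""
    return fail_to_recv_const * interval_ms / 1000.0


# Values quoted in the thread: 2500 token visits, 180 ms hold each.
held = failed_to_receive_delay_s(2500, 180)

# Hypothetical un-held rotation time of 2 ms (an assumption, not a
# measurement from the thread), to show why cancelling hold helps.
not_held = failed_to_receive_delay_s(2500, 2.0)

print(f"token held:     {held:.0f} s")
print(f"token not held: {not_held:.0f} s")
```

Under this simplified model the wait scales linearly with the interval between token visits, which is why cancelling hold while retransmits are pending shortens the time to "FAILED TO RECEIVE".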
_______________________________________________ discuss mailing list [email protected] http://lists.corosync.org/mailman/listinfo/discuss
