Sorry, the previous patch was wrong. Here is the corrected version.

On Aug 5, 2014 10:18 PM, "Christine Caulfield" <[email protected]> wrote:
> Hi Jason,
>
> Thanks for testing that - and the extra info. I'll have another think
> then. If I can't come up with anything more we might go with your patch.
>
> Chrissie
>
> On 05/08/14 13:01, jason wrote:
>
>> Hi Christine,
>> I have tested your patch but it does not solve my problem. By adding
>> printf, I found that whether or not retransmission occurred in my test
>> case, the retrans_message_queue is always empty. It seems that
>> the retrans_message_queue is used only in the recovery state?
>>
>> On Aug 5, 2014 3:50 PM, "Christine Caulfield" <[email protected]> wrote:
>>
>>     On 01/08/14 10:50, Christine Caulfield wrote:
>>
>>         On 01/08/14 10:42, Jan Friesse wrote:
>>
>>             Jason,
>>
>>                 Hi All,
>>
>>                 I have encountered a problem: when there is no other
>>                 activity on the ring but only retransmission, and the
>>                 token is in hold mode, the retransmission becomes slow.
>>
>>             Yes
>>
>>                 Moreover, if the retransmission always fails but token
>>                 rotation works well, then it takes quite a long time
>>                 (fail_to_recv_const * token_hold = 2500 * 180ms = 450sec)
>>                 for the retransmitting node to meet the "FAILED TO
>>                 RECEIVE" condition and re-construct a new ring. This can
>>                 be reproduced by the following steps:
>>
>>                 1) Create a two-node cluster in udpu transport mode.
>>                 2) Wait until there is no other activity on the ring.
>>                 3) One or both nodes delete each other from the nodelist
>>                    in corosync.conf.
>>                 4) Run corosync-cfgtool -R; this can cause a message
>>                    retransmission, but I am not sure why.
>>                 5) Token rotation still works well, but the
>>                    retransmission cannot be satisfied because of the node
>>                    deletion, so only the "FAILED TO RECEIVE" condition
>>                    can form a new ring. But we need to wait 450 seconds
>>                    for it to happen. During this wait, we saw the
>>                    following logs:
>>
>>             This is a really weird case.
>>
>>                 Jul 30 11:21:06 notice [TOTEM ] Retransmit List: e
>>                 Jul 30 11:21:06 notice [TOTEM ] Retransmit List: e
>>                 Jul 30 11:21:06 notice [TOTEM ] Retransmit List: e
>>                 Jul 30 11:21:06 notice [TOTEM ] Retransmit List: e
>>                 Jul 30 11:21:06 notice [TOTEM ] Retransmit List: e
>>                 ...
>>
>>                 This problem can be solved by adding
>>                 token_hold_cancel_send() in both the retransmission
>>                 request and response conditions in orf_token_rtr() to
>>                 speed up retransmission. I created a patch below, any
>>                 comments?
>>
>>             Ok. The patch looks fine, but during review I had another
>>             idea. What about prohibiting the start of hold mode when
>>             there are messages to retransmit? Such a solution may be
>>             cleaner, mightn't it?
>>
>>             Anyway. This is a change in a very critical part of the
>>             code, so Chrissie, can you please take a look at the patch
>>             and express your opinion?
>>
>>         I've been looking it over yesterday. It's a problem I have
>>         definitely seen myself on some VM systems so it's certainly not
>>         an isolated case. I think Honza is right that there might be a
>>         better way of fixing it so I'll have a look.
>>
>>         Chrissie
>>
>>     Annoyingly my common reproducer seems not to be working and I can't
>>     get yours to make it happen either. If you can still reproduce it
>>     could you try this patch for me please?
>>
>>     Chrissie
>>
>>     _______________________________________________
>>     discuss mailing list
>>     [email protected]
>>     http://lists.corosync.org/mailman/listinfo/discuss
>>
From dc2d0c2bc75492cada193909c2cd66fba4367d67 Mon Sep 17 00:00:00 2001
From: Jason HU <[email protected]>
Date: Wed, 6 Aug 2014 00:10:56 +0800
Subject: [PATCH] [totemsrp] Cancel token holding while in retransmition

---
 exec/totemsrp.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/exec/totemsrp.c b/exec/totemsrp.c
index dcda8d1..b603ef5 100644
--- a/exec/totemsrp.c
+++ b/exec/totemsrp.c
@@ -3650,6 +3650,12 @@ static int message_handler_orf_token (
 		transmits_allowed = fcc_calculate (instance, token);
 		mcasted_retransmit = orf_token_rtr (instance, token, &transmits_allowed);
 
+		if (instance->my_token_held == 1 &&
+			(token->rtr_list_entries > 0 || mcasted_retransmit > 0)) {
+			instance->my_token_held = 0;
+			forward_token = 1;
+		}
+
 		fcc_rtr_limit (instance, token, &transmits_allowed);
 		mcasted_regular = orf_token_mcast (instance, token, transmits_allowed);
 		/*
-- 
1.9.4.msysgit.0
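
For readers who want to check the logic outside of corosync, here is a minimal standalone sketch of the condition the patch adds, together with the wait-time arithmetic quoted in the thread. Both function names are hypothetical and do not exist in totemsrp.c; the parameters only mirror the fields the patch reads (my_token_held, rtr_list_entries, mcasted_retransmit), and the real code also sets forward_token = 1 when the hold is cancelled.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not part of corosync): returns true when the
 * patch would clear my_token_held and forward the token, i.e. the token
 * is currently held and either the token carries retransmit requests or
 * this node has just multicast retransmitted messages.
 */
bool should_cancel_token_hold(int my_token_held,
                              int rtr_list_entries,
                              int mcasted_retransmit)
{
    return my_token_held == 1 &&
           (rtr_list_entries > 0 || mcasted_retransmit > 0);
}

/*
 * Hypothetical helper: wait in whole seconds before the
 * "FAILED TO RECEIVE" condition is met, computed as in the thread
 * (fail_to_recv_const * token_hold, with token_hold in milliseconds).
 */
unsigned int failed_to_recv_wait_seconds(unsigned int fail_to_recv_const,
                                         unsigned int token_hold_ms)
{
    return (fail_to_recv_const * token_hold_ms) / 1000u;
}
```

With the figures from the original report, failed_to_recv_wait_seconds(2500, 180) evaluates to 450, which matches the 450 seconds observed before the new ring forms.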
