Hi Chrissie,
Thanks, I will send a mail to this mailing list about this patch.
On Aug 7, 2014 9:35 PM, "Christine Caulfield" <[email protected]> wrote:

> On 06/08/14 02:09, jason wrote:
>
>> Sorry, the previous patch is wrong. Here is the correction.
>>
>>
> That looks good to me and, I think, the best solution. It seems to be
> decidedly non-trivial to determine if retransmits are present when going
> into hold.
>
> Thanks!
> Chrissie
>
>> On Aug 5, 2014 10:18 PM, "Christine Caulfield" <[email protected]> wrote:
>>
>>     Hi Jason,
>>
>>     Thanks for testing that - and the extra info. I'll have another
>>     think then. If I can't come up with anything more we might go with
>>     your patch.
>>
>>     Chrissie
>>
>>     On 05/08/14 13:01, jason wrote:
>>
>>         Hi Christine,
>>         I have tested your patch, but it does not solve my problem. By
>>         adding printf calls, I found that whether or not retransmission
>>         occurred in my test case, the retrans_message_queue is always
>>         empty. It seems that the retrans_message_queue is only used in
>>         the recovery state?
>>
>>         On Aug 5, 2014 3:50 PM, "Christine Caulfield" <[email protected]> wrote:
>>
>>              On 01/08/14 10:50, Christine Caulfield wrote:
>>
>>                  On 01/08/14 10:42, Jan Friesse wrote:
>>
>>                      Jason,
>>
>>
>>                          Hi All,
>>
>>                          I have encountered a problem: when there is no
>>                          other activity on the ring, only retransmission,
>>                          and the token is in hold mode, the
>>                          retransmission becomes slow. Moreover, if the
>>                          retransmission always fails but token
>>
>>
>>                      Yes
>>
>>                          rotation works well,
>>                          then it takes quite a long time
>>                          (fail_to_recv_const * token_hold = 2500 * 180ms
>>                          = 450sec) for the retransmitting node to meet
>>                          the "FAILED TO RECEIVE" condition and
>>                          re-construct a new ring. This can be reproduced
>>                          by the following steps:
>>
>>                                1) Create a two-node cluster in udpu
>>                                transport mode.
>>                                2) Wait until there is no other activity
>>                                on the ring.
>>                                3) On one or both nodes, delete the other
>>                                node from the nodelist in corosync.conf.
>>                                4) Run corosync-cfgtool -R. This can cause
>>                                a message retransmission, but I am not
>>                                sure why.
>>                                5) Token rotation still works well, but
>>                                the retransmission cannot be satisfied
>>                                because of the node deletion, so only the
>>                                "FAILED TO RECEIVE" condition can form a
>>                                new ring. But we need to wait 450 seconds
>>                                for it to happen. During this wait, we saw
>>                                the following logs:
>>
>>
>>                      This is really weird case.
>>
>>                                Jul 30 11:21:06 notice  [TOTEM ] Retransmit List: e
>>                                Jul 30 11:21:06 notice  [TOTEM ] Retransmit List: e
>>                                Jul 30 11:21:06 notice  [TOTEM ] Retransmit List: e
>>                                Jul 30 11:21:06 notice  [TOTEM ] Retransmit List: e
>>                                Jul 30 11:21:06 notice  [TOTEM ] Retransmit List: e
>>                                ...
>>
>>
>>                          This problem can be solved by adding
>>                          token_hold_cancel_send() in both the
>>                          retransmission request and response conditions
>>                          in orf_token_rtr() to speed up retransmission.
>>                          I created a patch below, any comments?
>>
>>
>>                      Ok. The patch looks fine, but during review I had
>>                      another idea. What about prohibiting the start of
>>                      hold mode when there are messages to retransmit?
>>                      Such a solution may be cleaner, don't you think?
>>
>>                      Anyway, this is a change in a very critical part of
>>                      the code, so Chrissie, can you please take a look
>>                      at the patch and give your opinion?
>>
>>
>>
>>                  I looked it over yesterday. It's a problem I have
>>                  definitely seen myself on some VM systems, so it's
>>                  certainly not an isolated case. I think Honza is right
>>                  that there might be a better way of fixing it, so I'll
>>                  have a look.
>>
>>                  Chrissie
>>
>>
>>
>>              Annoyingly, my common reproducer seems not to be working
>>              and I can't get yours to make it happen either. If you can
>>              still reproduce it, could you try this patch for me please?
>>
>>              Chrissie
>>
>>
>>              _______________________________________________
>>              discuss mailing list
>>              [email protected]
>>              http://lists.corosync.org/mailman/listinfo/discuss
>>
>>
>>
>
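The 450-second figure in the original report is just the product of the two values quoted there (fail_to_recv_const = 2500 and a token hold of 180 ms). A minimal sketch of that arithmetic follows, assuming the wait is simply one hold interval per failed rotation. The Python names are illustrative only and do not come from the corosync source.

```python
# Values quoted in the report; the names below are illustrative only and
# are not taken from the corosync source.
FAIL_TO_RECV_CONST = 2500   # token rotations without progress
TOKEN_HOLD_MS = 180         # time the token is held per rotation, in ms


def failed_to_receive_wait_ms(rotations: int, hold_ms: int) -> int:
    """Approximate wait before FAILED TO RECEIVE, in milliseconds."""
    return rotations * hold_ms


wait_ms = failed_to_receive_wait_ms(FAIL_TO_RECV_CONST, TOKEN_HOLD_MS)
print(f"{wait_ms} ms = {wait_ms // 1000} s")  # prints "450000 ms = 450 s"
```

Under the same assumption, any change that keeps the token out of hold mode while a retransmit is outstanding shortens the per-rotation term and therefore the total wait.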