Hi Valery, Zhang,

I agree that the protocol makes some assumptions regarding the synchronization between peers, which perhaps we should have better clarified in the RFC.

- The synchronization channel is reliable.
- The standby member can have a reasonably good estimate of the IKE SA Message ID.

I think that for most implementations, these assumptions do hold. Most IKE messages will be acknowledged, so we should not expect a large number of messages to be in-flight; "real" changes in the IKE state (addition/deletion of Child SAs) will be synchronized between the cluster members; periodic liveness tests (if any) have a predictable rate.

In addition to the MUST rule on setting M1 to the highest known value, there is also a "RECOMMENDED" (SHOULD) rule to increment M1 by the window size. If this rule is followed, I think both scenarios listed below will happen very rarely.
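
A minimal sketch of how the two rules combine, for illustration only. The function name and parameters are hypothetical (they are not taken from RFC 6311), and the sketch assumes the "known value" is whatever sender Message ID the standby member last learned over the synchronization channel.

```python
def choose_m1(known_value: int, window_size: int) -> int:
    """Pick M1 for the Message ID sync request (illustrative sketch).

    known_value: highest sender Message ID value known to the standby
        member (hypothetical input; it may be stale after a failover).
    window_size: the IKE sender window size.
    """
    if window_size < 1:
        raise ValueError("window size must be at least 1")
    # MUST: M1 is larger than any value known to have been used.
    # RECOMMENDED: increment the known value by at least the window size.
    return known_value + window_size


print(choose_m1(20, 5))  # 25
```

Because the window size is at least 1, the result always satisfies the MUST rule; the extra margin only helps when the standby member's estimate is not too far behind.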

Also note that Sec. 8.3 recommends dropping the IKE SA if any inconsistencies are detected in practice.

The goal of the protocol is to minimize the number of dropped SAs during a failover. The protocol does not ensure that absolutely no SAs will ever be dropped.

Thanks,
        Yaron

On 06/22/2012 03:39 PM, Valery Smyslov wrote:
Hi,
I tend to agree with Zhang that the described scenario may take place.
It may happen if, for some reason, the newly active member picks M1
lower than the highest value used for a request by the previously active member.
Sure, bullet 1 in section 5.1 of RFC6311 says that the newly active member
MUST choose M1 larger than any value known to have been used,
but the question is: how can it reliably calculate this value?
If the synchronization channel between members is reliable and
synchronization messages are frequent enough (say, one for each change
of the Message ID counters), then there is no problem here. But if
synchronization between members is not so frequent, or the channel
allows message loss, then the newly active member can only
estimate the highest sender's Message ID used by the previously active
member. And if its estimate turns out to be wrong (less than
actually used), then 2 scenarios are possible:
1. If the last requests made by the previously active member get delayed
on their way to the peer and arrive there AFTER Message ID synchronization
is finished, then we have Zhang's scenario: to the peer they will look
like they were generated by the newly active member and it will respond to them
accordingly, but for the newly active member those responses
will be either unexpected (if it hasn't generated any request yet)
or incorrectly processed (if it has already made some requests).
In the latter case the situation will become unpredictable.
2. The other problem in this situation is that, according to bullet 4
in section 5.1 of RFC6311, the peer MUST silently drop any message
containing M1 less than or equal to the highest value it has seen from
the cluster. Again, if the newly active member incorrectly estimates
the highest value used by the previously active member and sends
a smaller value in M1, then the peer will silently drop the message,
which will eventually lead to IKE SA deletion.
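
The drop rule in scenario 2 can be sketched as a single comparison. This is only an illustration of the rule as described above; the function and parameter names are hypothetical, and a real peer would of course do much more processing.

```python
def peer_accepts_sync(m1: int, highest_seen_from_cluster: int) -> bool:
    """Return True if the peer would process the sync request.

    Models only the rule cited above: a message whose M1 is less than or
    equal to the highest value the peer has seen from the cluster is
    silently dropped.
    """
    return m1 > highest_seen_from_cluster


# The standby member underestimates: it proposes 25, but requests up to 29
# from the previously active member have already reached the peer.
print(peer_accepts_sync(25, 29))  # False: silently dropped
print(peer_accepts_sync(30, 29))  # True
```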
Regards,
Smyslov Valery.

    ----- Original Message -----
    *From:* Kalyani Garigipati (kagarigi) <mailto:[email protected]>
    *To:* [email protected] <mailto:[email protected]>
    *Cc:* [email protected] <mailto:[email protected]> ;
    [email protected] <mailto:[email protected]>
    *Sent:* June 20, 2012 15:26
    *Subject:* Re: [IPsec] Updates to the IKEv2 Extension for
    IKEv2/IPsec High Availablity

    Hi Zhang,

    *For Analysis 1 *

    =========================================================

    a0[t].x a0's maximum message ID received from the peer b until time t

    a0[t].y a0's next sender message ID at time t

    I think there is some confusion above.

    I guess what you actually wanted to convey is:

    a0[t].x a0's maximum message ID for which the request was sent and
    the response was received in order from the peer b until time t.
    Say a0 has received responses for 2, 3 and 5; this value is then 3,
    so the window would now be 4-8 (inclusive).

    a0[t].y a0's next request message ID to be sent at time t. This
    value is 6 if it has already sent the requests 2, 3, 4, 5.

    b[t].x b's maximum message ID of the in-order responses sent to the
    cluster until time t. If it has sent responses to requests 2, 3
    and 5, it is still 3.

    b[t].y b's next message ID of the request to be received, which is 6.
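
The numbers in these definitions can be reproduced with a short sketch. The helper names are hypothetical, and the sketch assumes the first outstanding request ID is 2, as in the example above.

```python
def highest_in_order(acked: set, first_id: int) -> int:
    """Highest ID such that every ID from first_id up to it is acknowledged.

    Returns first_id - 1 when first_id itself has not been acknowledged.
    """
    current = first_id - 1
    while current + 1 in acked:
        current += 1
    return current


def sender_window(acked: set, first_id: int, window_size: int) -> tuple:
    """Inclusive range of request IDs the sender may have outstanding."""
    x = highest_in_order(acked, first_id)
    return x + 1, x + window_size


acked = {2, 3, 5}        # responses received so far
sent = [2, 3, 4, 5]      # requests sent so far
print(highest_in_order(acked, 2))   # 3
print(sender_window(acked, 2, 5))   # (4, 8)
print(max(sent) + 1)                # 6, the next request ID to send
```

The response for 5 does not advance the value past 3 because the response for 4 is still missing.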

    I guess the example has wrong values.

    if window size is 5

    then

    For a concrete example, let's assume:

    a1[T0].x == a0[T0].x = 9 --- how can x and y differ by 10
    when the window size is 5?

    a1[T0].y == a0[T0].y = 20

    b[T0].x = 19 --- the same is the case here.

    b[T0].y = 10

    Please adjust these values and then the issues you mention
    will not arise.

    Also, please refer to sections 8.1 and 9 of RFC 6311, which
    say that when the window moves to the synchronised value,
    all the old pending requests and to-be-retransmitted responses
    should be deleted. So the issue below will not happen:

    /When the new active member a1 sends request messages with Message
    IDs 25, 26, 27, 28, 29, since the peer b has processed the request
    messages with the same Message IDs, the peer b will return response
    messages for the request messages from the old active member a0./

    /Then a1 receives acceptable but mismatched response messages./

    *For Analysis 2*

    This is a problem of basic HA simultaneous failover and not just
    of message ID synchronisation.

    When both devices don't have the new SA and just have the old SA,
    they will continue with the old SA or bring down the SA based
    on administrative control.

    The sync times within the clusters a and b might be the same or
    different, due to which one of them would have the old SA and the other
    would have the new SA. In such cases both SAs would eventually be
    deleted and a new SA established.

    Or, even if the sync times are different, one would have the old SA with
    lower message IDs and the other the same old SA with higher message
    IDs, and then both will start from the higher message ID values.

    Regards,

    kalyani

    *From:* [email protected] [mailto:[email protected]] *On
    Behalf Of* [email protected]
    *Sent:* Wednesday, June 20, 2012 7:44 AM
    *To:* Kalyani Garigipati (kagarigi)
    *Cc:* [email protected]; [email protected]
    *Subject:* Re: [IPsec] Updates to the IKEv2 Extension for
    IKEv2/IPsec High Availablity


    Hi Kalyani,



    First I'd like to make some clarifications according to your
    comments, and leave other clarifications to further discussions.

    1. Clarification for case C in Section 2.2
    (Case C is the most troublesome in this section IMO, so I'd like to
    clarify it.)
    1.1 Notation for case C in Section 2.2

    x: the maximum message ID received from the peer party
    y: the next sender message ID

    a0: the old active cluster member
    a1: the new active cluster member
    b: the peer


    a0[t].x a0's maximum message ID received from the peer b until time t
    a0[t].y a0's next sender message ID at time t

    a1[t].x a1's maximum message ID received from the peer b until time t
    a1[t].y a1's next sender message ID at time t

    b[t].x b's maximum message ID received from the cluster until time t
    b[t].y b's next sender message ID at time t


    T0: At this time point, the last synchronization between a0 and a1
    is carried out

    T1: At this time point, the failover event occurs

    T2: At this time point, the Message ID synchronization between a1
    and b starts


    T3: At this time point, the Message ID synchronization between a1
    and b ends

    SW: the sender window size of the cluster. Let's assume SW is 5.

    
----T0--------T1----------T2----------T3-------------------------------------


    1.2 Analysis for case C in Section 2.2


    We know that:

    a1[T0].x == a0[T0].x
    a1[T0].y == a0[T0].y
    (The reason is that at T0, the synchronization between a0 and a1 is
    carried out.)

    And


    a1[T1].x == a0[T0].x
    a1[T1].y == a0[T0].y

    And

    a1[T2].x == a0[T0].x
    a1[T2].y == a0[T0].y

    (The reason is that from T0 to T2, the state data of a1 remains
    unchanged.)

    According to RFC 6311,

    "M1 is the next sender's Message ID to be used by the member. M1
    MUST be chosen so that it is larger than any value known to have
    been used. It is RECOMMENDED to increment the known value at
    least by the size of the IKE sender window."

    At T2, the new active member a1 can set M1=a1[T2].y + SW.


    And

    "M2 MUST be at least the higher of the received M1, and one more
    than the highest sender value received from the cluster. This
    includes any previous received synchronization messages."

    At T2, the peer b can set M2 = max(M1, 1 + b[T2].x).

    M1==a1[T2].y + SW => M1 == a0[T0].y + SW => M1 == a0[T0].y + 5

    Suppose some message exchanges (e.g., 10 messages) have been carried
    out from T0 to T2; it is then possible that b[T2].x + 1 > a0[T0].y + 5.

    Then the peer b sets M2=1 + b[T2].x.

    At T3, when the new active member a1 receives the Message ID
    synchronization response from the peer b, a1 sets a1[T3].y = M2.

    a1[T3].y == M2 => a1[T3].y ==1 + b[T2].x.


    At T2, a0[T2].y could be b[T2].x+6.
    (The reason is that a0's sent messages with Message IDs b[T2].x+1,
    b[T2].x+2, b[T2].x+3, b[T2].x+4, b[T2].x+5 may NOT have reached the
    peer b.)

    This means a1[T3].y < a0[T2].y.

    This means the first five messages sent by the new active member a1
    will have Message IDs b[T2].x+1,
    b[T2].x+2,b[T2].x+3,b[T2].x+4,b[T2].x+5.

    Suppose after T3, the peer receives the old active member a0's sent
    messages with Message IDs b[T2].x+1,
    b[T2].x+2,b[T2].x+3,b[T2].x+4,b[T2].x+5, and sends response messages.

    After that, the new active member a1 sends the first five request
    messages with Message IDs b[T2].x+1,
    b[T2].x+2,b[T2].x+3,b[T2].x+4,b[T2].x+5.
    After receiving these request messages, the peer b will regard these
    requests as retransmissions, and return to a1 the response messages for
    a0's requests with Message IDs b[T2].x+1,
    b[T2].x+2, b[T2].x+3, b[T2].x+4, b[T2].x+5.
    (My understanding, according to RFC 5996, is that the peer should
    treat the new active member's request messages as re-sent requests.)
    As a result, the new active member a1 receives acceptable but mismatched
    responses for its request messages with Message IDs b[T2].x+1,
    b[T2].x+2, b[T2].x+3, b[T2].x+4, b[T2].x+5.

    For a concrete example, let's assume:

    a1[T0].x == a0[T0].x = 9
    a1[T0].y == a0[T0].y = 20

    b[T0].x = 19
    b[T0].y = 10

    From T0 to T1, the old active member a0 has sent 10 request
    messages to the peer b, and 5 messages have been received and
    acknowledged by the peer b.

    This means that a0[T2].y = 30, b[T2].x = 24. Note that request messages
    with Message IDs 25, 26, 27, 28, 29 have been sent by the old active
    member a0, but have NOT reached the peer b. (The sender window size
    of the cluster is 5.)


    According to RFC 6311,

    "M1 is the next sender's Message ID to be used by the member. M1
    MUST be chosen so that it is larger than any value known to have
    been used. It is RECOMMENDED to increment the known value at
    least by the size of the IKE sender window."


    M1 == a1[T2].y + SW == 20 + 5 == 25
    (a1[T2].y == a1[T0].y == 20)

    And
    "M2 MUST be at least the higher of the received M1, and one more
    than the highest sender value received from the cluster. This
    includes any previous received synchronization messages."

    M2 == max(M1, 1 + b[T2].x) == max(25, 1 + 24) == 25

    After the new active member a1 receives M2, a1 sets a1[T3].y = 25 <
    a0[T2].y == 30.

    The Message IDs for the new active member a1 start from 25.

    The first five Message IDs are 25, 26, 27, 28, 29.

    When the new active member a1 sends request messages with Message
    IDs 25, 26, 27, 28, 29, since the peer b has processed request
    messages with the same Message IDs, the peer b will return the response
    messages for the request messages from the old active member a0.

    Then a1 receives acceptable but mismatched response messages.
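
The arithmetic of this concrete example can be replayed with a small sketch. It is illustrative only: the function and parameter names are hypothetical, and the model covers nothing beyond the M1/M2 computation and the resulting Message ID overlap.

```python
def replay_failover(a1_next_send, b_highest_received, a0_next_send, window):
    """Replay the Message ID sync from the example (illustrative only).

    a1_next_send:       a1[T2].y, stale since the last sync at T0
    b_highest_received: b[T2].x, highest request ID the peer has received
    a0_next_send:       a0[T2].y, the true next ID of the failed member
    window:             SW, the sender window size
    """
    m1 = a1_next_send + window              # RECOMMENDED increment
    m2 = max(m1, 1 + b_highest_received)    # value returned by the peer
    # Requests a0 sent that had not reached the peer when the sync ran.
    in_flight = set(range(b_highest_received + 1, a0_next_send))
    # First `window` request IDs that a1 uses after adopting M2.
    a1_ids = set(range(m2, m2 + window))
    return m1, m2, sorted(in_flight & a1_ids)


print(replay_failover(20, 24, 30, 5))
# (25, 25, [25, 26, 27, 28, 29])
```

The third element lists the Message IDs used by both a0's delayed requests and a1's new requests, which are the IDs for which mismatched responses are possible in this scenario.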


    2. Clarifications for the simultaneous failover case F in Section 2.3
    For the simultaneous failover, case F is the most devastating IMO.
    So I'd like to clarify it first.
    2.1 Notation for the simultaneous failover case F in Section 2.3


    a0: the old active cluster member of the cluster a
    a1: the new active cluster member of the cluster a
    b0: the old active cluster member of the cluster b
    b1: the new active cluster member of the cluster b


    T0: At this time point, the last synchronization between a0/b0 and
    a1/b1 is carried out,

    T1: At this time point, the simultaneous failover event occurs

    T2: At this time point, the Message ID synchronization between a1
    and b1 starts


    T3: At this time point, the Message ID synchronization between a1
    and b1 ends



    
----T0--------T1----------T2----------T3-------------------------------------


    2.2 Analysis for the simultaneous failover case F in Section 2.3


    It's possible that from T0 to T1, a0 and b0 delete the old IKE SA
    sa0 and create a new IKE SA sa1.
    But at T2, a1 and b1 do NOT know what has happened from T0 to T1,
    and do NOT know of the existence of the new IKE SA sa1, and use the old
    IKE SA sa0 to carry out Message ID synchronization.
    This may cause some more serious problems. So when simultaneous
    failover occurs, a simple two-way synchronization may not be an
    appropriate solution.





    *"Kalyani Garigipati (kagarigi)" <[email protected]>*
    Sent by: [email protected]

    2012-06-14 19:14

    To: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
    Cc:
    Subject: Re: [IPsec] Updates to the IKEv2 Extension for IKEv2/IPsec High
    Availablity




    Hi Zhang,

    Thanks for going through RFC 6311.

    I have gone through your proposed draft and felt that there is some
    confusion regarding the message ID concept of IKEv2.

    I have seen that in section 2.3 you were comparing the higher sender
    value of x2 with y2.
    That is wrong. When x2 proposes the next higher message ID to be
    used to send a request,
    then on y2 you should tally it with the next higher message ID of the
    request to be received
    (and not with the next higher message ID of the request to be sent).

    In IKEv2 the message IDs of requests to be sent are entirely
    different from the message IDs of requests to be received.
    That is why the RFC says a message ID is used four times on a given device.

    1. message id X is used while sending a request
    2. message id X is used while receiving the response
    3. message id X is used to receive a request
    4. message id X is used to send a response.
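
A minimal sketch of the two independent counters behind these four uses, and of the tally described above. The class, field and function names are hypothetical, and taking the higher of the two tallied values is an assumption made for illustration, in line with the synchronisation behaviour discussed in this thread.

```python
from dataclasses import dataclass


@dataclass
class MessageIdState:
    # Counter behind uses 1 and 2: requests this device sends,
    # and the responses it receives for them.
    next_request_to_send: int
    # Counter behind uses 3 and 4: requests this device receives,
    # and the responses it sends for them.
    next_request_to_receive: int


def tally(proposer: MessageIdState, responder: MessageIdState) -> int:
    """Tally the proposer's send counter against the responder's receive
    counter (not against the responder's own send counter)."""
    return max(proposer.next_request_to_send,
               responder.next_request_to_receive)


x2 = MessageIdState(next_request_to_send=12, next_request_to_receive=7)
y2 = MessageIdState(next_request_to_send=9, next_request_to_receive=10)

print(tally(x2, y2))  # 12: direction x2 -> y2
print(tally(y2, x2))  # 9: direction y2 -> x2, handled independently
```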


    Please find the comments for each section.

    Section 2.1: This is a known issue and that is why, using RFC 6311,
    we are synchronising the message IDs.

    Section 2.2: The peer is assumed to be a proper anchor point which has
    correct info on the message IDs of requests sent and received.
    Even when the peer is a cluster member, of the two devices one
    would be less wrong and have more accurate values of the message IDs,
    so we pick that value. I don't see any issue here.

    Section 2.3: First of all, there is no relation between M1 and P1
    on a given device.
    --- M1 is the proposed message ID of the request to be sent
    P1 is the proposed message ID of the request to be received.

    When simultaneous failover happens, x2 might have a higher value of M1
    compared to y2, but x2 might have a lower value of P1
    compared to y2.
    It doesn't matter; both are independent. What we eventually do is
    compare the M1 value across x2 and y2 and pick the higher one.
    The same process is repeated for P1.

    Cases 1 to 6 are already handled by the basic IKEv2 RFC: if we
    receive the same message ID twice, then we retransmit, or drop it if it
    is outside the window.


    Section 3: During simultaneous failover both the cluster and the
    peer member would be in an unreliable state.
    Both of them are wrong, but one of them is less wrong! So we
    want to start from that point to synchronise the message IDs.

    So we are allowing both members to announce their message IDs
    and then eventually we would synchronise to the higher number.
    I don't see any flaw here. Please explain with an example.

    With your proposal, in case of simultaneous failover, both the cluster
    and the peer will be in the UNSYNED state and
    both would end up sending and rejecting the synchronisation request.
    This would lead to repeated synchronisation efforts, and the problem
    of message synchronisation would never be solved.

    So the UNSYNED state is not required.

    Section 4:

    I feel that RFC 6311 already solves the message ID synchronisation
    issue.
    I don't think we need to increment M1 by double the window size as
    proposed by you.
    Please support your proposal with an example that uses numeric message ID
    values instead of variables,
    like M1 is 3, M2 is 4, etc.

    Numbers make it clearer.

    regards,
    kalyani

    *From:* [email protected] [mailto:[email protected]] *On Behalf Of* [email protected]
    *Sent:* Monday, June 11, 2012 7:36 AM
    *To:* [email protected]
    *Subject:* [IPsec] Updates to the IKEv2 Extension for IKEv2/IPsec
    High Availablity



    Dear All,

    I've submitted a new draft "Updates to the IKEv2 Extension for
    IKEv2/IPsec High Availablity". This draft analyzes some issues in
    RFC 6311,
    and proposes some updates. Look forward to your comments.


    BR,

    Ruishan Zhang
    _______________________________________________
    IPsec mailing list
    [email protected] <mailto:[email protected]>
    https://www.ietf.org/mailman/listinfo/ipsec
