Dhananjay Joshi wrote:

> Our app is using 1-N style SCTP.
> I am attaching sample code as well as test binary which was used to repeat 
> the problem.
> I am also attaching snoop output and stdout output during the program 
> execution.
> This time our setup had only one IP address on client and server side.
> Test client is on 192.168.1.80 and server is on 192.168.1.83.
> Server was restarted once during test using control C and once by reboot.
> When server was restarted by control C then client got COM LOST, it attempted 
> connection again and it got COM UP.
> When server system was rebooted then client got COM LOST once and it tried to 
> re-connect.
> There was nothing after this and client program was waiting on 'select' call 
> as output of the program shows.
> From snoop I can see that ICMP packet was received from server system during
> reboot
> and that seems to have stopped Init attempt from client side.


The peer 192.168.1.83 seems to generate its ABORT chunks
incorrectly.  In frame #27 of the trace, 192.168.1.83 sends
back an ABORT in response to the INIT that 192.168.1.80 sent
in frame #26, but it does not fill in the Verification Tag
correctly.  So 192..80 ignores that ABORT and retransmits
the INIT in frame #28 after the timeout, which is 3s.
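
To illustrate why an ABORT with a wrong Verification Tag is
dropped, here is a minimal sketch of the acceptance rule as I
read it in RFC 4960, section 8.5.1.  The function name and the
tag values are made up for illustration; they are not taken
from the snoop trace or from the Solaris stack.

```python
from typing import Optional


def accept_abort(vtag: int, t_bit: bool, my_tag: int,
                 peer_tag: Optional[int]) -> bool:
    """Return True if an incoming ABORT should be acted on.

    vtag     -- Verification Tag in the SCTP common header
    t_bit    -- T bit from the ABORT chunk flags
    my_tag   -- tag this endpoint chose (the Initiate Tag of its INIT)
    peer_tag -- tag the peer chose, or None if not yet known
    """
    if not t_bit:
        # Normal case: the sender must echo our own tag.
        return vtag == my_tag
    # T bit set: the sender had no TCB and reflected its own tag.
    return peer_tag is not None and vtag == peer_tag


# Hypothetical client that has sent an INIT and not yet seen an INIT-ACK,
# so it knows only its own tag.
MY_TAG = 0x1234ABCD  # made-up value

print(accept_abort(MY_TAG, False, MY_TAG, None))  # well-formed ABORT
print(accept_abort(0, False, MY_TAG, None))       # wrong tag, discarded
```

An ABORT that fails this check is silently discarded, so the
INIT timer keeps running and the INIT is retransmitted.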

I guess then the server in 192..83 is up again and it takes
the retransmitted INIT.  It responds with INIT-ACK in frame
#29 and the association establishment proceeds normally.

I think 192..83 is rebooted at around frame #74.  192..83
sends an ABORT to 192..80 to terminate the association.  In
frame #75, 192..80 tries to establish another association.
192..83 responds with an invalid ABORT (frame #76), like the
one in frame #27.

192..80 retransmits the INIT in frame #77 after 3s.  I guess
192..83 is still shutting down and so it responds with another
invalid ABORT (frame #78).  This ABORT is ignored by 192..80
as expected.

192..80 continues the retransmission of INIT in frames #79 -
#82.  So the SCTP stack indeed retransmits the INIT as
expected.  How long did you wait for the event?  Here is the
output I got from running your program.

----

^^^^Wait for some time and make another connection attempt ^^^^
[ 28112006 05:14:06.950 ] Intitiating connection
[ 28112006 05:14:06.950 ] Connection attempt successful errno = 150
[ 28112006 05:14:06.950 ] Connection attempt successful returning value 1
[ 28112006 05:14:06.950 ] Going into select loop
[ 28112006 05:16:39.950 ] Got some thing on socket and going to read it
[ 28112006 05:16:39.950 ] Solaris SCTP: Received event from address
0x4030201 on association id 0
[ 28112006 05:16:39.950 ] postNetStatChange: assoc id 1, state 1 address
0x4030201[ 28112006 05:16:39.950 ] Address 0x4030201 ureachable
[ 28112006 05:16:39.950 ] Going into select loop
[ 28112006 05:16:39.950 ] Got some thing on socket and going to read it
[ 28112006 05:16:39.950 ] Solaris SCTP: Received event from address
0x4030201 on association id 0
[ 28112006 05:16:39.950 ] assoc_change: state=4, error=0, instr=5 outstr=5
[ 28112006 05:16:39.950 ] Received COM Lost/Termination of association 1
state 4 fromAddress 0x4030201
[ 28112006 05:16:39.950 ] Posting COM Lost/Termination of association 1
state 4 fromAddress 0x4030201
^^^^Wait for some time and make another connection attempt ^^^^

----


It takes ~2.5 minutes for the INIT timeout to happen (05:14:06
to 05:16:39 in the output above).  I think if you wait that
long, your program should get the event.
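
As a rough check on that figure, here is a small sketch of
the INIT retransmission schedule.  It assumes the 3s initial
timeout mentioned above, a timeout that doubles after each
expiry, a 60s cap (the RTO.Max value suggested in RFC 4960),
and 5 retransmissions (inferred from frames #77 and #79 - #82).
The cap and the retransmission count are assumptions, not
values read from the stack's configuration.

```python
def init_timeouts(initial_rto, rto_max, max_retransmits):
    """Return the timer value used for each INIT transmission.

    The first entry is the timer for the original INIT; each later
    entry is the timer for one retransmission.  The timer doubles
    after every expiry and is capped at rto_max.
    """
    timeouts = []
    rto = initial_rto
    for _ in range(max_retransmits + 1):
        timeouts.append(rto)
        rto = min(rto * 2, rto_max)
    return timeouts


# Assumed values: 3s initial RTO, 60s cap, 5 retransmissions.
schedule = init_timeouts(3, 60, 5)
print(schedule)       # [3, 6, 12, 24, 48, 60]
print(sum(schedule))  # 153
```

Under those assumptions the attempt is given up after 153s,
which matches the gap between 05:14:06 and 05:16:39 in the
output.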


-- 

                                                K. Poon.

_______________________________________________
networking-discuss mailing list