Arnaud Brand wrote:
> On 08/02/10 23:18, James Carlson wrote:
>> Causes for RST include:
>>
>>    - peer application is intentionally setting the linger time to zero
>>      and issuing close(2), which results in TCP RST generation.
>>    
> Might be possible, but I can't see why the receiving end would do that.

No idea, but a debugger on that side might be able to detect something.
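
For reference, the linger-zero case quoted above can be reproduced on
loopback.  This is only a minimal Python sketch, not anything taken from
the peer application: the names abortive_close and demo are made up for
illustration, and the two-int layout of struct linger is an assumption
that holds on typical Unix systems.

```python
import socket
import struct


def abortive_close(sock):
    """Close sock with linger on and a zero timeout (abortive close)."""
    # struct linger { int l_onoff; int l_linger; } -> enabled, 0 seconds.
    # Assumption: two native ints, as on typical Unix systems.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    sock.close()


def demo():
    """Return True if the peer of an abortively closed socket sees a reset."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # loopback, kernel-chosen port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    server_side.settimeout(5)  # do not hang forever if nothing arrives

    abortive_close(client)
    try:
        server_side.recv(1)  # an orderly close would return b"" here
        return False
    except ConnectionResetError:
        return True
    finally:
        server_side.close()
        listener.close()


if __name__ == "__main__":
    print("peer saw reset:", demo())
```

An application that does this on purpose discards its connection state
as it sends the RST, so the connection would not stay ESTABLISHED on
that side afterwards.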

>>    - bugs in one or both peers (often related to TCP keepalive; key
>>      signature of such a problem is an apparent two-hour time limit).
>>    
> That could be it, but I doubt it, since the disconnections occurred at
> random intervals ranging from 10 minutes to 13 hours.
> It should be noted that the node sending the RST keeps the connection
> open (netstat -a shows it's still established).
> To be honest, that puzzles me.

That sounds horrible.  There's no way a node that still has state for
the connection should be sending RST.  Normal procedure is to generate
RST when you do _not_ have state for the connection or (if you're
intentionally aborting the connection) to discard the state at the same
time you send RST.

That points to either a bug in the peer's TCP/IP implementation or one
of the causes that you've dismissed (particularly either a duplicate IP
address or a firewall/NAT problem).
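
One way to narrow this down from a capture is to list every RST segment
along with the source MAC address and IP TTL it arrived with, then
compare those against the peer's ordinary traffic.  If both hosts are on
the same segment, a reset carrying a different MAC or an unexpected TTL
would hint at some other device rather than the peer's own stack.  The
following is a rough Python sketch under several assumptions: the
capture is a classic pcap file (as written by tcpdump -w) with Ethernet
link type, IPv4 only, at most one 802.1Q VLAN tag, and find_rsts is a
hypothetical helper name.

```python
import struct

TCP_RST = 0x04  # RST bit in the TCP flags byte


def find_rsts(pcap_bytes):
    """Return [(src_mac, src_ip, dst_ip, ttl), ...] for each TCP RST."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian host
    else:
        raise ValueError("not a classic (microsecond) pcap file")
    linktype = struct.unpack_from(endian + "I", pcap_bytes, 20)[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled")

    results = []
    offset = 24  # skip the global header
    while offset + 16 <= len(pcap_bytes):
        # Per-record header: ts_sec, ts_usec, captured length, wire length.
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII",
                                               pcap_bytes, offset)
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len

        if len(frame) < 14:
            continue
        src_mac = ":".join("%02x" % b for b in frame[6:12])
        ethertype = struct.unpack_from("!H", frame, 12)[0]
        l3 = 14
        if ethertype == 0x8100 and len(frame) >= 18:  # one 802.1Q tag
            ethertype = struct.unpack_from("!H", frame, 16)[0]
            l3 = 18
        if ethertype != 0x0800 or len(frame) < l3 + 20:  # IPv4 only
            continue

        ihl = (frame[l3] & 0x0F) * 4  # IP header length in bytes
        frag_offset = struct.unpack_from("!H", frame, l3 + 6)[0] & 0x1FFF
        ttl = frame[l3 + 8]
        proto = frame[l3 + 9]
        if proto != 6 or frag_offset != 0:  # TCP, first fragment only
            continue
        if len(frame) < l3 + ihl + 14:  # need the TCP flags byte
            continue

        if frame[l3 + ihl + 13] & TCP_RST:
            src_ip = ".".join(str(b) for b in frame[l3 + 12:l3 + 16])
            dst_ip = ".".join(str(b) for b in frame[l3 + 16:l3 + 20])
            results.append((src_mac, src_ip, dst_ip, ttl))
    return results
```

Usage might look like find_rsts(open("capture.pcap", "rb").read()),
where capture.pcap is a placeholder file name.  Running it against the
captures from both ends would show whether the RST seen by the receiver
was ever actually transmitted by the sender.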

>> You (at least) have to analyze the packet sequences to determine what is
>> going wrong.  Depending on the nature of the problem, it may also take
>> in-depth kernel debugging on one or both peers to locate the cause.
>>    
> I launched another transfer and I'm running tcpdump on both servers in
> the hope of finding something.
> In the meantime, I've received a beta BIOS from Tyan which provides
> support for IKVM over tagged VLANs.
> So far the Intel chips (on which the IKVM/IPMI card is piggy-backed)
> are working better than before.
> I can't tell whether it's related or not; I'm crossing my fingers.

That could EASILY be related.  That was key information to include.

IPMI, as I recall, hijacks the node's Ethernet controller to provide
low-level node control service.  From what I remember out of ARC
reviews, the architecture is pretty brutal^W"clever".

I wouldn't be surprised in the least if this is the problem.  I like the
idea of remote management, but that's the sort of thing I'd never enable
on my systems ...

> Regarding kernel debugging, I thought I would look for DTrace scripts,
> and found some, but nothing that seemed relevant in my case.
> As I am a complete beginner (read: copy-paste) in DTrace, I haven't yet
> figured out how to write one myself.

Is the remote end that's generating RST also OpenSolaris?

-- 
James Carlson         42.703N 71.076W         <[email protected]>
_______________________________________________
networking-discuss mailing list
[email protected]
