Hi folks,
My situation is the following: two computers (A and B) running
OpenSolaris b131 with Intel 82574L NICs, connected through an HP4208
switch.
Both computers are on the same network.
I have transfers running from computer A to computer B, either through
ssh or netcat.
As long as computer B is not too busy, the transfer works like a charm.
But when B is really busy (doing zfs recv from a local file in this
case), the transfer fails in an odd way after some time (tests show
somewhere between 10 minutes and 13 hours).
What's odd is that A reports that it could not read from B and closes
the connection (no sign of it in netstat on A), but B still thinks the
connection is open.
Further, running "kstat -p | grep e1000g | grep -i err" on A shows all
zeroes except for the following:
e1000g:1:statistics:Recv_Length_Errors 14
link:0:e1000g1:ierrors 14
e1000g:1:mac:ierrors 14
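
For anyone who wants to watch the same counters on their own machines, here is a minimal Python sketch of the filtering done by the pipeline above. It assumes each line of "kstat -p" output is a colon-separated key followed by whitespace and an integer value. The function name and the sample text are made up for illustration, and the oerrors line in the sample is hypothetical (added only to show a zero counter being dropped).

```python
def nonzero_error_counters(kstat_text, driver="e1000g"):
    """Return {key: value} for error counters of `driver` that are not zero.

    Mirrors `kstat -p | grep e1000g | grep -i err`, then drops zero values.
    """
    counters = {}
    for line in kstat_text.splitlines():
        # Assumed layout: "module:instance:name:statistic<whitespace>value"
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, raw_value = parts[0], parts[1].strip()
        if driver not in key or "err" not in key.lower():
            continue
        try:
            value = int(raw_value)
        except ValueError:
            # Skip non-numeric statistics (strings, timestamps, ...)
            continue
        if value != 0:
            counters[key] = value
    return counters


# The three non-zero lines are the ones reported on A; the last line is a
# hypothetical zero counter included only for the example.
SAMPLE = """\
e1000g:1:statistics:Recv_Length_Errors 14
link:0:e1000g1:ierrors 14
e1000g:1:mac:ierrors 14
e1000g:1:mac:oerrors 0
"""

if __name__ == "__main__":
    for key, value in nonzero_error_counters(SAMPLE).items():
        print(key, value)
```

Feeding it the real output of "kstat -p" at intervals during a transfer would show whether the counters keep growing or stay at 14.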
More details on the test cases are available here:
http://opensolaris.org/jive/thread.jspa?threadID=122977&tstart=0
You can see that Brent Jones mentioned the following CR, but it is
marked as a duplicate of something fixed in b131.
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6905510
I did not do any twiddling in e1000g.conf.
Both e1000g interfaces are grouped in an aggregation named trk0.
On the advice of Richard Elling, I disabled LACP and, just to be sure, I
unplugged one network cable on each machine.
If any of you has any clue or workaround to try, please share.
Thanks,
Arnaud
_______________________________________________
networking-discuss mailing list