Hi All,

Posting here as it seems the most sensible place. I am also posting here before 
logging a bug on defect.opensolaris.org because I do not feel I have enough 
information yet to create a useful bug report.

[u]Backstory[/u]
I have used Solaris/OpenSolaris for a good few years for ZFS/SMB sharing and the 
occasional FC-attached LUN target. When I moved to OpenSolaris 2009.06 (keeping 
nothing but my userdata zpool in the move), I started to experience drops in 
network connectivity when trying to access the SMB shares. A little reading 
later, it seemed 2009.06 had serious SMB issues and the solution would be to 
upgrade to the OpenSolaris dev repo.

I upgraded to the dev repo last night and am now running snv_134, but the 
network problems persist. I cannot confirm they are exactly THE SAME problems 
as in 2009.06, as the system has been left dormant for quite a while because I 
have had other priorities. Because of this, I think it is probably best to 
concentrate on the current issues and not confuse them with the past.

[u]Problem[/u]
Every 10-60 seconds (time varies) the system will stop responding to ping and 
any other network request (such as SMB/SSH etc).
The connectivity will eventually restore itself for a brief while before going 
through the same loop again.

The time spent not responding usually seems to be 4 to 6 times the time spent 
responding. However this is not always the case, sometimes only a couple of 
ICMP pings can be lost before the network comes 'back up'.
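
To put numbers on the pattern above, the ping results could be collapsed into up/down runs. The following is only a minimal sketch: it assumes the pings from the client have already been logged as (timestamp in seconds, replied) pairs, one probe per second, and the function names and sample data are hypothetical, not taken from any real capture.

```python
from itertools import groupby


def summarize_runs(samples):
    """Collapse (timestamp_seconds, replied) samples into up/down runs.

    Returns a list of (state, duration_seconds, probe_count) tuples.
    The duration spans the first to the last probe within each run.
    """
    runs = []
    for replied, group in groupby(samples, key=lambda s: s[1]):
        group = list(group)
        duration = group[-1][0] - group[0][0]
        runs.append(("up" if replied else "down", duration, len(group)))
    return runs


def down_to_up_ratio(runs):
    """Ratio of unanswered probes to answered probes across all runs."""
    up = sum(count for state, _, count in runs if state == "up")
    down = sum(count for state, _, count in runs if state == "down")
    return down / up if up else float("inf")


# Hypothetical data: 2 answered probes followed by 8 lost ones.
samples = [(t, t < 2) for t in range(10)]
runs = summarize_runs(samples)
print(runs)                    # [('up', 1, 2), ('down', 7, 8)]
print(down_to_up_ratio(runs))  # 4.0
```

A ratio that stays between 4 and 6 over a long capture would match what I describe above; a list of run lengths would also show whether the short two-ping drops follow any regular interval.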

The system has an e1000g NIC (82545GM); however, the same problem appears when 
I instead use the on-board rge0 NIC (which I usually keep disabled in the BIOS). 
This leads me to believe the issue is higher up than an individual NIC driver.

I have done plenty of reading before posting here; however, many of the other 
bugs I can find are either logged with next to no information, which makes 
comparing my issue to them impossible, or seem to relate to a stress 
condition where the NIC only drops under load, such as after a few tens or 
hundreds of GB of data transfer.

In my case, the network drop/connect/drop cycle starts while still in the 
OpenSolaris bootup splash screen and continues under almost any type of load 
on the system.

I have already ruled out:
- Network Cables
- Switch
- NIC in both opensolaris server and client initiating the pings (had a spare 
82545GM)
- Client machine

There is nothing interesting to report in /var/adm/messages.

The only 'success' I have had so far is that pinging FROM the OpenSolaris box 
(usually a headless server) to anywhere causes any current period of 
connectivity issues to cease: inbound pings start to get replies AS SOON as an 
outbound ping is set going, with 'ping -s 8.8.4.4' for example.

Also, leaving this constant ping going reduces any future stability issues from 
large periods of unresponsiveness down to two dropped pings every X seconds (I 
have observed X to vary randomly, from 10 seconds to over two minutes).

This is a short enough time that TCP sessions can ride out the drop instead of 
timing out; however, it is not really a fix.

I should also mention that nwam is disabled in SMF and e1000g0 is manually 
configured; 'ifconfig -a' looks fine.
I also tried to rule out this still somehow being an SMB-related bug, such as 
those reported in 2009.06, by running 
'svcadm disable /network/smb/server:default'; however, the problem remains.

Posting here as I know I need to do more research before logging a useful bug, 
however as more of a Linux than Unix user nowadays I could do with some 
pointers of where to go next.

Thanks in advance, hope the information included here has helped!

//TrXuk
-- 
This message posted from opensolaris.org
_______________________________________________
networking-discuss mailing list
networking-discuss@opensolaris.org
