Hi All,

I am posting here because it seems the most sensible place, and I am doing so before logging a bug on defect.opensolaris.org because I do not feel I have enough information yet to write a useful bug report.

[u]Backstory[/u]

I have used Solaris/OpenSolaris for a good few years for ZFS/SMB sharing and the occasional FC-attached LUN target. When I moved to OpenSolaris 2009.06 (keeping nothing but my userdata ZPOOL in the move), I started to experience drops in network connectivity when trying to access the SMB shares. A little reading later, it seemed that 2009.06 had serious SMB issues and that the solution would be to upgrade to the OpenSolaris dev repo.

I upgraded to the dev repo last night and am now running SNV_134, but the network problems persist. I cannot confirm that they are exactly the same problems as in 2009.06, because the system was left dormant for quite a while when I had other priorities. Given that, I think it is best to concentrate on the current issues and not confuse them with the past.

[u]Problem[/u]

Every 10-60 seconds (the time varies) the system stops responding to ping and to any other network request (SMB, SSH, etc.). Connectivity eventually restores itself for a brief while before going through the same loop again. The time spent not responding is usually 4 to 6 times the time spent responding, but this is not always the case: sometimes only a couple of ICMP pings are lost before the network comes 'back up'.

The system has an e1000g NIC (82545GM), but the same problem appears when I use the on-board rge0 NIC instead (which I usually keep disabled in the BIOS). This leads me to believe the issue is higher up than an individual NIC driver.

I have done plenty of reading before posting here. Many of the other bugs I can find are either logged with next to no information, which makes comparing my issue to them impossible, or seem to relate to a stress condition where the NIC only drops under load, such as after tens or hundreds of GB of data transfer. In my case, the drop/connect/drop cycle starts while the system is still on the OpenSolaris boot splash screen and continues under almost any type of load.

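A minimal sketch of how the up/down cycle described above could be put into numbers, which may help when comparing against existing bug reports. It assumes the ping results have already been collected as `(timestamp_seconds, replied)` pairs; that format and the function names `summarise_cycles` and `down_up_ratio` are hypothetical and not taken from any OpenSolaris tool.

```python
from typing import List, Tuple

Sample = Tuple[float, bool]   # (timestamp in seconds, True if the ping was answered)
Run = Tuple[bool, float]      # (state, duration in seconds)


def summarise_cycles(samples: List[Sample]) -> List[Run]:
    """Collapse consecutive samples with the same state into (state, duration) runs."""
    runs: List[Run] = []
    if not samples:
        return runs
    start_ts, state = samples[0]
    for ts, ok in samples[1:]:
        if ok != state:
            # State flipped: close the current run at the time of the flip.
            runs.append((state, ts - start_ts))
            start_ts, state = ts, ok
    # Close the final run at the last sample's timestamp.
    runs.append((state, samples[-1][0] - start_ts))
    return runs


def down_up_ratio(runs: List[Run]) -> float:
    """Total unresponsive time divided by total responsive time."""
    up = sum(duration for ok, duration in runs if ok)
    down = sum(duration for ok, duration in runs if not ok)
    return down / up if up else float("inf")
```

Feeding it one sample per second over a few minutes would show whether the "4 to 6 times" estimate holds and how much the cycle length really varies.
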
I have already ruled out:
- Network cables
- Switch
- NIC, in both the OpenSolaris server and the client initiating the pings (I had a spare 82545GM)
- Client machine

There is nothing interesting to report in /var/adm/messages.

The only 'success' I have had so far is that pinging FROM the OpenSolaris box (usually a headless server) to anywhere causes any current period of connectivity loss to cease: inbound pings start to get replies AS SOON as an outbound ping is set going, for example with 'ping -s 8.8.4.4'. Leaving this constant ping running also reduces subsequent outages from long periods of unresponsiveness down to two dropped pings every X seconds (X appears to be random; I have observed anything from 10 seconds to over two minutes). This is short enough that TCP sessions can ride out the drop instead of timing out, but it is not really a fix.

I should also mention that nwam is disabled in SMF and e1000g0 is manually configured; 'ifconfig -a' looks fine. I also tried to rule out this still somehow being an SMB-related bug, such as those reported in 2009.06, by doing a 'svcadm disable /network/smb/server:default', but the problem remains.

I know I need to do more research before logging a useful bug, but as more of a Linux than a Unix user nowadays I could do with some pointers on where to go next.

Thanks in advance, and I hope the information included here helps!

//TrXuk
--
This message posted from opensolaris.org
_______________________________________________
networking-discuss mailing list
networking-discuss@opensolaris.org