OpenSolaris 2009.06, Open HA Cluster 2009.06. We are using 5 direct interconnects in a 2-node cluster. After a while (anywhere from one to several days), one of the interconnects faults. If we remove that interconnect and then add it again, it still shows as faulted in the status view (clintr status).

dladm show-link reports the given NIC as up on one node, while the other node reports it as down. Running ifconfig bnxX unplumb followed by ifconfig bnxX plumb up on the node that reports it as down gives no errors, but the NIC stays down. When we do this, the log reports the copper link coming up at 100 Mbit. The same procedure on the other node gives a log message of copper link up at 1000 Mbit, which is the right speed. Trying to force 1000 Mbit on both nodes with ndd didn't help.

The only thing that helps is a full restart of the node with the downed NIC. This has happened with 4 different NICs on both nodes so far.

I'm not an (Open)Solaris expert (far from it), so there is probably something I haven't tried. Hopefully some of the experts here can help. :o)

-- This message posted from opensolaris.org