Interesting note! The 7024 is our large switch where all the hosts are connected, but I was told that we were sold the 7000D because the 7024 didn't have a subnet manager. Unfortunately, the 7000D has a different CLI and that command is not available, and I don't have the password for our 7024, so I can't log onto it. On another note, I just noticed the uptime on the 7000D is just over 1 day, so that must have been the start of the problem, but I have no idea why it rebooted or why it didn't come up working. I'm pretty sure we tested a reboot of the device during acceptance testing.

Oh, I just got your second note:

==================================
BTW, I highly recommend running the opensm on a server instead of using the sm on the switch. We found running the sm on the switch was much less reliable. I also recommend using a server dedicated to opensm only.
==================================

I will take that into consideration, but we bought this as a "turn-key" solution from Dell. They designed it and we had no experience with IB so we trusted their knowledge.

Thanks,
Mike

On Mar 24, 2010, at 11:12 AM, Meyer, Donald J wrote:

> http://www.cisco.com/en/US/docs/server_nw_virtual/7024/release_4.1/hardware/installation/guide/7024hig.pdf
>
> smControl
> Starts and stops the embedded subnet manager.
> Syntax:
> smControl start | stop | restart | status
>
> Thanks,
> Don Meyer
> Senior Network/System Engineer/Programmer
> US+ (253) 371-9532 iNet 8-371-9532
> *Other names and brands may be claimed as the property of others
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael Robbert
> Sent: Wednesday, March 24, 2010 10:00 AM
> To: Ira Weiny
> Cc: [email protected]
> Subject: Re: ibstat stuck in state initialized after reboot
>
> Ira,
> Thanks for the quick response. That is what I was afraid of. I've been
> looking through the switch documentation, but it doesn't cover starting,
> stopping, or even checking the status of the SM service. I'll look into
> opening a TAC case, but since Cisco has gotten out of the IB business I'm not
> looking forward to seeing what kind of product support they still have. I can
> tell you a little more about our topology since it is pretty simple. All of
> our hosts are connected to the single large SFS switch, then the 7000D which
> is our subnet-manager is only plugged into that larger switch.
>
> Thanks for the help and wish me luck with support!
>
> Mike
>
> On Mar 24, 2010, at 10:38 AM, Ira Weiny wrote:
>
>> On Wed, 24 Mar 2010 10:26:02 -0600
>> Michael Robbert <[email protected]> wrote:
>>
>>> I hope this is the correct place to get help with the problem I have. I have
>>> an IB fabric running on a Cisco SFS switch with a 7000D as the subnet
>>> manager and the whole thing has been running great for well over a year now,
>>> but today I noticed that after any node gets rebooted its IB link doesn't
>>> initialize. This has happened on 4 hosts now. What I see is as follows:
>>>
>>> [r...@compute-2-7 ~]# ibstat
>>> CA 'mthca0'
>>>     CA type: MT25204
>>>     Number of ports: 1
>>>     Firmware version: 1.2.917
>>>     Hardware version: 20
>>>     Node GUID: 0x0005ad00000c0990
>>>     System image GUID: 0x0005ad000100d050
>>>     Port 1:
>>>         State: Initializing
>>>         Physical state: LinkUp
>>>         Rate: 20
>>>         Base lid: 0
>>>         LMC: 0
>>>         SM lid: 0
>>>         Capability mask: 0x02510a68
>>>         Port GUID: 0x0005ad00000c0991
>>>
>>> I don't know much about subnet managers, since ours is in hardware and we've
>>> never had to configure anything on it, but I can login to the device and it
>>> isn't showing any errors. On a node that hasn't been rebooted recently and
>>> is still working I can see what appears to be a working subnet manager:
>>>
>>> [r...@compute-2-10 ~]# sminfo
>>> sminfo: sm lid 2 sm guid 0x5ad00001df2a0, activity count 2146213408
>>> priority 10 state 3 SMINFO_MASTER
>>>
>>> The same command on a non-working node shows this:
>>>
>>> [r...@compute-2-7 ~]# sminfo
>>> sminfo: sm lid 0 sm guid 0x0, activity count 0 priority 0 state 2
>>> SMINFO_STANDBY
>>>
>>> So far I have reseated all the cables involved on both ends and I have moved
>>> the cables on the switch end to new ports and none of that has made a
>>> difference even after reboots.
>>> I am hoping to find a node that I can take
>>> offline tomorrow so I can actually test the cables, but since this seems to
>>> be happening to any host that reboots it doesn't appear to be a cabling
>>> problem. Can anybody suggest where I should go from here? Is there anything
>>> I can do from a working or non-working host to diagnose the problem? Should
>>> I try rebooting the subnet manager switch? Will that affect the rest of the
>>> fabric?
>>
>> Have you spoken to Cisco about the problem? You say you can log into the
>> "device" (the SM switch?) if so talk to Cisco about how you may be able to
>> restart the SM there.
>>
>> It does sound like the SM on the switch is failing to transition the links.
>> If you can restart the SM on the switch I would try that first. Otherwise yes
>> rebooting the switch is probably your best bet, and yes it will affect the
>> fabric, although I can't say how much without knowing the topology.
>>
>> Ira
>>
>>>
>>> Thanks,
>>> Mike Robbert
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>>
>> --
>> Ira Weiny
>> Math Programmer/Computer Scientist
>> Lawrence Livermore National Lab
>> 925-423-8008
>> [email protected]
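
For reference, the symptom in the quoted ibstat output (Physical state LinkUp, but State Initializing with Base lid 0 and SM lid 0) is what a port looks like when the link is up and no subnet manager has configured it yet. Below is a minimal Python sketch for spotting such ports in saved ibstat output. It assumes the field labels exactly as they appear in the output quoted in this thread; the names parse_ibstat and waiting_for_sm are hypothetical helpers written for illustration, not part of any IB tool.

```python
import re

# Sample taken from the ibstat output quoted in this thread.
SAMPLE = """\
CA 'mthca0'
    CA type: MT25204
    Number of ports: 1
    Firmware version: 1.2.917
    Hardware version: 20
    Node GUID: 0x0005ad00000c0990
    System image GUID: 0x0005ad000100d050
    Port 1:
        State: Initializing
        Physical state: LinkUp
        Rate: 20
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02510a68
        Port GUID: 0x0005ad00000c0991
"""


def parse_ibstat(text):
    """Hypothetical helper: turn ibstat text into a list of per-port dicts."""
    ports = []
    ca = None
    port = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '(.+)'$", line)
        if m:
            # New channel adapter section; ports that follow belong to it.
            ca = m.group(1)
            port = None
            continue
        m = re.match(r"Port (\d+):$", line)
        if m:
            port = {"ca": ca, "port": int(m.group(1))}
            ports.append(port)
            continue
        if port is not None and ":" in line:
            # "Key: value" line inside a port section.
            key, _, value = line.partition(":")
            port[key.strip()] = value.strip()
    return ports


def waiting_for_sm(port):
    """Hypothetical check: link is physically up but no LID was assigned."""
    return (
        port.get("Physical state") == "LinkUp"
        and port.get("State") != "Active"
        and port.get("Base lid") == "0"
        and port.get("SM lid") == "0"
    )


if __name__ == "__main__":
    for p in parse_ibstat(SAMPLE):
        if waiting_for_sm(p):
            print(f"{p['ca']} port {p['port']}: link up, no SM has configured it")
```

To run this against a live node, the SAMPLE string could be replaced with the captured output of the ibstat command on that node. Real output may be indented with tabs, which the sketch tolerates because each line is stripped before matching.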
