Interesting note! The 7024 is our large switch where all the hosts are 
connected, but I was told that we were sold the 7000D because the 7024 didn't 
have a subnet manager. Unfortunately, the 7000D has a different CLI in which 
that command is not available, and I don't have the password for our 7024, so 
I can't log on to it. 
On another note, I just noticed that the uptime on the 7000D is just over 1 
day, so that reboot must have been the start of the problem, but I have no 
idea why it rebooted or why it didn't come up working. I'm pretty sure we 
tested a reboot of the device during acceptance testing.

Oh, I just got your second note:
==================================
BTW, I highly recommend running the opensm on a server instead of using the sm 
on the switch.  We found running the sm on the switch was much less reliable.  
I also recommend using a server dedicated to opensm only.
==================================

I will take that into consideration, but we bought this as a "turn-key" 
solution from Dell. They designed it and we had no experience with IB so we 
trusted their knowledge. 

Thanks,
Mike


On Mar 24, 2010, at 11:12 AM, Meyer, Donald J wrote:

> http://www.cisco.com/en/US/docs/server_nw_virtual/7024/release_4.1/hardware/installation/guide/7024hig.pdf
> 
> smControl
> Starts and stops the embedded subnet manager.
> Syntax:
> smControl start | stop | restart | status
> 
> Thanks,
> Don Meyer
> Senior Network/System Engineer/Programmer
> US+ (253) 371-9532 iNet 8-371-9532
> *Other names and brands may be claimed as the property of others
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Michael Robbert
> Sent: Wednesday, March 24, 2010 10:00 AM
> To: Ira Weiny
> Cc: [email protected]
> Subject: Re: ibstat stuck in state initialized after reboot
> 
> Ira,
> Thanks for the quick response. That is what I was afraid of. I've been 
> looking through the switch documentation, but it doesn't cover starting, 
> stopping, or even checking the status of the SM service. I'll look into 
> opening a TAC case, but since Cisco has gotten out of the IB business I'm not 
> looking forward to seeing what kind of product support they still have. I can 
> tell you a little more about our topology since it is pretty simple. All of 
> our hosts are connected to the single large SFS switch, and the 7000D, which 
> is our subnet manager, is plugged only into that larger switch. 
> 
> Thanks for the help and wish me luck with support!
> 
> Mike
> 
> On Mar 24, 2010, at 10:38 AM, Ira Weiny wrote:
> 
>> On Wed, 24 Mar 2010 10:26:02 -0600
>> Michael Robbert <[email protected]> wrote:
>> 
>>> I hope this is the correct place to get help with the problem I have. I have
>>> an IB fabric running on a Cisco SFS switch with a 7000D as the subnet
>>> manager and the whole thing has been running great for well over a year now,
>>> but today I noticed that after any node gets rebooted its IB link doesn't
>>> initialize. This has happened on 4 hosts now. What I see is as follows:
>>> 
>>> [r...@compute-2-7 ~]# ibstat
>>> CA 'mthca0'
>>>      CA type: MT25204
>>>      Number of ports: 1
>>>      Firmware version: 1.2.917
>>>      Hardware version: 20
>>>      Node GUID: 0x0005ad00000c0990
>>>      System image GUID: 0x0005ad000100d050
>>>      Port 1:
>>>              State: Initializing
>>>              Physical state: LinkUp
>>>              Rate: 20
>>>              Base lid: 0
>>>              LMC: 0
>>>              SM lid: 0
>>>              Capability mask: 0x02510a68
>>>              Port GUID: 0x0005ad00000c0991
>>> 
>>> I don't know much about subnet managers, since ours is in hardware and we've
>>> never had to configure anything on it, but I can login to the device and it
>>> isn't showing any errors. On a node that hasn't been rebooted recently and
>>> is still working I can see what appears to be a working subnet manager:
>>> 
>>> [r...@compute-2-10 ~]# sminfo 
>>> sminfo: sm lid 2 sm guid 0x5ad00001df2a0, activity count 2146213408 priority 10 state 3 SMINFO_MASTER
>>> 
>>> The same command on a non-working node shows this:
>>> 
>>> [r...@compute-2-7 ~]# sminfo 
>>> sminfo: sm lid 0 sm guid 0x0, activity count 0 priority 0 state 2 SMINFO_STANDBY
>>> 
>>> So far I have reseated all the cables involved on both ends and I have moved
>>> the cables on the switch end to new ports and none of that has made a
>>> difference even after reboots. I am hoping to find a node that I can take
>>> offline tomorrow so I can actually test the cables, but since this seems to
>>> be happening to any host that reboots it doesn't appear to be a cabling
>>> problem. Can anybody suggest where I should go from here? Is there anything
>>> I can do from a working or non-working host to diagnose the problem? Should
>>> I try rebooting the subnet manager switch? Will that affect the rest of the
>>> fabric? 
>> 
>> Have you spoken to Cisco about the problem?  You say you can log into the
>> "device" (the SM switch?).  If so, talk to Cisco about how you may be able to
>> restart the SM there.
>> 
>> It does sound like the SM on the switch is failing to transition the links.
>> If you can restart the SM on the switch I would try that first.  Otherwise,
>> yes, rebooting the switch is probably your best bet, and yes, it will affect
>> the fabric, although I can't say how much without knowing the topology.
>> 
>> Ira
>> 
>>> 
>>> Thanks,
>>> Mike Robbert
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>> the body of a message to [email protected]
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> 
>> 
>> 
>> -- 
>> Ira Weiny
>> Math Programmer/Computer Scientist
>> Lawrence Livermore National Lab
>> 925-423-8008
>> [email protected]
> 

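For anyone who hits the same symptom: the pattern reported in this thread is a port whose physical state is LinkUp while its state stays at Initializing with Base lid 0 and SM lid 0, which suggests that no subnet manager has configured the port. The following is a minimal Python sketch, not a definitive tool, that parses ibstat-style text and flags such ports. The function names parse_ibstat and waiting_for_sm are hypothetical, the sample text is copied from the ibstat output quoted above, and the parser assumes the output layout looks like that sample.

```python
import re

# Sample copied from the ibstat output quoted in this thread.
SAMPLE = """\
CA 'mthca0'
     CA type: MT25204
     Number of ports: 1
     Firmware version: 1.2.917
     Hardware version: 20
     Node GUID: 0x0005ad00000c0990
     System image GUID: 0x0005ad000100d050
     Port 1:
             State: Initializing
             Physical state: LinkUp
             Rate: 20
             Base lid: 0
             LMC: 0
             SM lid: 0
             Capability mask: 0x02510a68
             Port GUID: 0x0005ad00000c0991
"""


def parse_ibstat(text):
    """Return one dict per port found in ibstat-style text (hypothetical helper)."""
    ports = []
    ca = None
    port = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = re.match(r"CA '(.+)'$", line)
        if m:
            # A new channel adapter section starts; no port is open yet.
            ca = m.group(1)
            port = None
            continue
        m = re.match(r"Port (\d+):$", line)
        if m:
            port = {"CA": ca, "Port": int(m.group(1))}
            ports.append(port)
            continue
        if port is not None and ":" in line:
            # "Key: value" lines that follow a "Port N:" header belong to that port.
            key, _, value = line.partition(":")
            port[key.strip()] = value.strip()
    return ports


def waiting_for_sm(port):
    """True when the link is physically up but the port has no LID and no SM.

    Assumption: the field names and values match the sample above.
    """
    return (
        port.get("Physical state") == "LinkUp"
        and port.get("State") == "Initializing"
        and port.get("Base lid") == "0"
        and port.get("SM lid") == "0"
    )


if __name__ == "__main__":
    for p in parse_ibstat(SAMPLE):
        if waiting_for_sm(p):
            print(f"{p['CA']} port {p['Port']}: link up but no LID assigned; "
                  "check the subnet manager")
```

On a real node, the text captured from running ibstat could be passed to parse_ibstat in place of SAMPLE. A port flagged this way points at the subnet manager rather than at cabling, which is consistent with Ira's diagnosis above.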
