On Tue, Nov 25, 2008 at 9:39 AM, Robert Dunkley <[EMAIL PROTECTED]> wrote:
> Hi Eric,
>
> Thanks for the response. OpenSM is running and set to start on bootup on
> MachineB:
> ps aux | grep open
> root 5616 0.0 0.1 142004 1396 ? Sl 13:39 0:00
> /usr/sbin/opensm -t 200 -f /var/log/opensm.log -g 0
>
> The log on Machine B just logs this every 10 seconds:
> Nov 25 14:34:21 148541 [477A7940] 0x01 ->
> __osm_sm_state_mgr_signal_error: ERR 3207: Invalid signal
> OSM_SM_SIGNAL_DISCOVER in state IB_SMINFO_STATE_DISCOVERING
> Nov 25 14:34:31 153173 [477A7940] 0x80 -> SM port is down
>
> Ibstat confirms port is in polling state on MachineB.

Is the port in init or down?

> MachineA however is in a bad state,

Any additional details on this? Can you kill/unload all the IB stuff and
reload it? That would be gentler than rebooting.

-- Hal

> I tried the openibd restart command. It accepted the command, but after
> 5 minutes it shows no progress and is just sitting at the cursor. Is
> some sort of forced restart of openibd possible?
>
> Thanks,
>
> Rob
>
>
> -----Original Message-----
> From: Baur, Eric [mailto:[EMAIL PROTECTED]
> Sent: 25 November 2008 14:31
> To: Robert Dunkley
> Subject: RE: [ofa-general] Mellanox Gen3,Linux and ibpanic - "Resource
> Temporarily unavailable"
>
> Robert-
>
> Is OpenSM set to start on boot?
> chkconfig --list | grep opensmd
>
> If not: chkconfig opensmd on
> and: /etc/init.d/opensmd start
>
> You can also restart openib without rebooting the machines.
> /etc/init.d/openibd restart
>
> -Eric
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Robert
> Dunkley
> Sent: Tuesday, November 25, 2008 9:21 AM
> To: [email protected]
> Subject: [ofa-general] Mellanox Gen3,Linux and ibpanic - "Resource
> Temporarily unavailable"
>
> Hi everyone,
>
> I'm using a setup of two machines (let's call them A and B) directly
> connected by 1 cable. Each machine has a Mellanox MT25204 (Gen3 Mellanox
> PCI-E InfiniBand card) and uses IPoIB. They run CentOS 5.2 with OFED 1.3
> installed. Machine B runs OpenSM.
>
> All was working fine. I shut down Machine A, did some maintenance and
> then powered it on again, and everything was OK again. I then shut down
> Machine B (the one running OpenSM), and this seemed to really upset
> Machine A. After booting Machine B again, Machine B looks OK with the
> port down and in polling state.
> Machine A however gives the following error if I run
> ibstat: ibpanic: [11406] main: stat of IB device 'mthca0' failed:
> (Resource temporarily unavailable)
>
> I don't want to reboot Machine A as it must synch data with Machine B
> over the Infiniband link first. Does anyone have any idea how to fix
> machine A?
>
> Thanks,
>
> Rob
>
> The SAQ Group
>
> Registered Office: 18 Chapel Street, Petersfield, Hampshire GU32 3DZ
> SEMTEC Limited Trading as SAQ is Registered in England & Wales
> Company Number: 06481952
>
> http://www.saqnet.co.uk AS29219
>
> SAQ Group Delivers high quality, honestly priced communication and I.T.
> services to UK Business.
>
> DSL : Domains : Email : Hosting : CoLo : Servers : Racks : Transit :
> Backups : Managed Networks : Remote Support.
>
> Find us in http://www.thebestof.co.uk/petersfield
>
> _______________________________________________
> general mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
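
On the "init or down" question: ibstat reports a logical state and a
physical state per port, and "polling" is a physical state, so the two are
worth reading side by side. The following is only a rough sketch and was not
part of the thread. port_states is a hypothetical helper name, and it assumes
the usual ibstat layout in which each "Port N:" block has a "State:" line
followed by a "Physical state:" line.

```shell
#!/bin/sh
# port_states: hypothetical helper (not part of OFED).
# Reads ibstat-style text on stdin and prints one line per port:
#   port <N>: <logical state> / <physical state>
# Assumes "State:" appears before "Physical state:" within each port block.
port_states() {
    awk '
        /^[ \t]*Port [0-9]+:/     { port = $2; sub(/:$/, "", port) }
        /^[ \t]*State:/           { state = $2 }
        /^[ \t]*Physical state:/  { print "port " port ": " state " / " $3 }
    '
}

# Usage on a live system (assumes ibstat is installed and the HCA responds):
#   ibstat | port_states
```

On Machine B this should print something like "port 1: Down / Polling" if
the logical state is Down. On Machine A, where ibstat itself fails with
ibpanic, it would most likely print nothing.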
