On Thu, 2 Sep 2010 11:11:13 -0700 Chuck Hartley <[email protected]> wrote:
> Sure, here is the output:
> Note this is with the switch we swapped in, so the port numbers don't
> match the ibchecknet output in the original message.
>
> # ibstat
> CA 'mlx4_0'
>     CA type: MT26428
>     Number of ports: 2
>     Firmware version: 2.6.0
>     Hardware version: a0
>     Node GUID: 0x0002c90300032de0
>     System image GUID: 0x0002c90300032de3
>     Port 1:
>         State: Active
>         Physical state: LinkUp
>         Rate: 40
>         Base lid: 6
>         LMC: 0
>         SM lid: 6

Well the SM lid is set here.  Is it set on the other nodes?

I don't run ibchecknet usually but I am getting the same errors here on a
working fabric...

ibwarn: [13629] dump_perfcounters: PortXmitWait not indicated so ignore this counter
#warn: Lid is not configured lid 37 port 2
#warn: SM Lid is not configured
Port check lid 37 port 2: FAILED

Looking at this output I don't think this is an error.

13:17:14 > smpquery nodeinfo 37
# Node info: Lid 37
BaseVers:........................1
ClassVers:.......................1
NodeType:........................Switch
NumPorts:........................24
...

On switch external Ports the Lid and SMLid are not used.

Hal, would you concur?

Chuck, Is it just that IPoIB is not working for you?
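
To make the point about switch external ports concrete, here is a minimal Python sketch that sorts the "Lid is not configured" warnings into ones that look benign and ones that deserve a closer look. It is only an illustration: the function name classify_lid_warnings and the switch_lids parameter are hypothetical, the set of switch LIDs has to be collected separately (for example by checking NodeType with smpquery nodeinfo), and treating port 0 as the only switch port that carries a LID is an assumption of the sketch.

```python
import re

# Matches the warning format seen in the ibchecknet output in this thread.
WARN_RE = re.compile(r"#warn: Lid is not configured lid (\d+) port (\d+)")


def classify_lid_warnings(ibchecknet_output, switch_lids):
    """Split 'Lid is not configured' warnings into (benign, suspect).

    switch_lids is a set of LIDs already known to belong to switches
    (hypothetical input, gathered by hand or by another script).
    Each returned entry is a (lid, port) tuple.
    """
    benign, suspect = [], []
    for match in WARN_RE.finditer(ibchecknet_output):
        lid, port = int(match.group(1)), int(match.group(2))
        # Assumption: on a switch only port 0 carries a LID, so the
        # warning on an external port (1..N) is expected.
        if lid in switch_lids and port != 0:
            benign.append((lid, port))
        else:
            suspect.append((lid, port))
    return benign, suspect


# Warning lines copied from the outputs quoted in this thread.
sample = """\
#warn: Lid is not configured lid 37 port 2
#warn: SM Lid is not configured
Port check lid 37 port 2: FAILED
#warn: Lid is not configured lid 3 port 7
"""

benign, suspect = classify_lid_warnings(sample, switch_lids={3, 37})
print("benign:", benign)
print("suspect:", suspect)
```

Anything left in the suspect list (a warning on a CA port, or on a LID not known to be a switch) would still be worth checking by hand.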
Ira

>         Capability mask: 0x0251086a
>         Port GUID: 0x0002c90300032de1
>     Port 2:
>         State: Down
>         Physical state: Polling
>         Rate: 10
>         Base lid: 0
>         LMC: 0
>         SM lid: 0
>         Capability mask: 0x02510868
>         Port GUID: 0x0002c90300032de2
> CA 'mthca0'
>     CA type: MT25204
>     Number of ports: 1
>     Firmware version: 1.2.0
>     Hardware version: a0
>     Node GUID: 0x003048c64c0c0000
>     System image GUID: 0x003048c64c0c0003
>     Port 1:
>         State: Down
>         Physical state: Polling
>         Rate: 10
>         Base lid: 0
>         LMC: 0
>         SM lid: 0
>         Capability mask: 0x02510a68
>         Port GUID: 0x003048c64c0c0001
>
> # iblinkinfo
> Switch 0x0002c9020041a7a0 Infiniscale-IV Mellanox Technologies:
>    1  1[ ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  5  1[ ] " HCA-1" ( )
>    1  2[ ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  6  1[ ] "linux70 HCA-1" ( )
>    1  3[ ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  7  1[ ] "linux71 HCA-1" ( )
>    1  4[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1  5[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1  6[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1  7[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1  8[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1  9[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 10[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 11[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 12[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 13[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 14[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 15[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 16[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 17[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 18[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 19[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 20[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 21[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 22[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 23[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 24[ ] ==( 4X  5.0 Gbps Active/  LinkUp)==>  9  1[ ] " HCA-1" ( )
>    1 25[ ] ==( 4X  5.0 Gbps Active/  LinkUp)==>  8  1[ ] " HCA-1" ( )
>    1 26[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 27[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 28[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 29[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 30[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 31[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 32[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 33[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 34[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 35[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>    1 36[ ] ==( 4X  2.5 Gbps   Down/ Polling)==>      [ ] "" ( )
>
> On Thu, Sep 2, 2010 at 12:03 PM, Ira Weiny <[email protected]> wrote:
> > On Thu, 2 Sep 2010 06:56:50 -0700
> > Chuck Hartley <[email protected]> wrote:
> >
> >> We swapped in a different switch and see the same errors.  The opensm
> >> logfile does not show any errors:
> >
> > Could you run "ibstat" on the node with OpenSM running?
> >
> > And "iblinkinfo" on the same node?
> >
> > Send that output.
> >
> > Ira
> >
> >>
> >> -------------------------------------------------
> >> OpenSM 3.3.5
> >> Command Line Arguments:
> >>  Daemon mode
> >>  Log File: /var/log/opensm.log
> >> -------------------------------------------------
> >> OpenSM 3.3.5
> >>
> >> Sep 02 05:56:29 933684 [B53B8700] 0x80 -> OpenSM 3.3.5
> >> Entering DISCOVERING state
> >>
> >> Sep 02 05:56:29 934931 [B53B8700] 0x02 -> osm_vendor_init: 1000
> >> pending umads specified
> >> Sep 02 05:56:29 935079 [B53B8700] 0x80 -> Entering DISCOVERING state
> >> Using default GUID 0x2c90300032de1
> >> Entering MASTER state
> >>
> >> Sep 02 05:56:29 953763 [B53B8700] 0x02 -> osm_vendor_bind: Binding to
> >> port 0x2c90300032de1
> >> Sep 02 05:56:29 990146 [B53B8700] 0x02 -> osm_vendor_bind: Binding to
> >> port 0x2c90300032de1
> >> Sep 02 05:56:29 990240 [B53B8700] 0x02 -> osm_opensm_bind: Setting
> >> IS_SM on port 0x0002c90300032de1
> >> Sep 02 05:56:30 009040 [AF1DB710] 0x80 -> Entering MASTER state
> >> SUBNET UP
> >>
> >> Sep 02 05:56:30 009885 [AF1DB710] 0x02 -> osm_ucast_mgr_process:
> >> minhop tables configured on all switches
> >> Sep 02 05:56:30 014593 [AF1DB710] 0x80 -> SUBNET UP
> >>
> >>
> >> On Thu, Sep 2, 2010 at 8:56 AM, Hal Rosenstock <[email protected]>
> >> wrote:
> >> > On Thu, Sep 2, 2010 at 8:34 AM, Chuck Hartley <[email protected]>
> >> > wrote:
> >> >> Hello,
> >> >>
> >> >> We installed 1.5.1 and are having problems getting the IB fabric
> >> >> working.  ibv_devinfo shows the HCAs ports are ok and ibdiagnet reports
> >> >> no errors.  However, ibchecknet shows that the switch ports are not
> >> >> being configured.  We have never seen this before and are at a loss as
> >> >> to where the problem might be - would someone please point us in the
> >> >> right direction to look?  Could it be a problem with the switch
> >> >> itself?  Output from ibchecknet below.
> >> >>
> >> >>
> >> >> # ibchecknet
> >> >> Error check on lid 3 (Infiniscale-IV Mellanox Technologies) port all:
> >> >> FAILED
> >> >> ibwarn: [26732] dump_perfcounters: PortXmitWait not indicated so
> >> >> ignore this counter
> >> >> #warn: Lid is not configured lid 3 port 7
> >> >> #warn: SM Lid is not configured
> >> >
> >> > Is there an SM running on your subnet?  If not, I think that the lack
> >> > of an SM could account for all of the issues mentioned here.
> >> >
> >> > -- Hal
> >> >
> >
> >
> > --
> > Ira Weiny
> > Math Programmer/Computer Scientist
> > Lawrence Livermore National Lab
> > 925-423-8008
> > [email protected]
>

--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
[email protected]
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
