I checked another working fabric here and also see the same warnings, so it looks like the warnings are not really a problem.
Well, I assume that it is just IPoIB that isn't working. Since ibping works, I believe that says the IB part is ok. Of course, I can't run any of the perftools since they all need IPoIB to resolve the host IP.

Do you have any suggestions of what to check to diagnose the IPoIB problem? Specifically, can you think of any interaction with the "normal" networking stuff in the kernel that might be misconfigured? The reason I mention that is because I rebuilt/installed OFED (no errors/warnings) and it is in its default configuration, which is running well on other similar fabrics here. Therefore I assume the problem must be with the non-OFED stuff.

Previously, whenever this kind of problem cropped up it has always been because opensm was not running. I did check that iptables was off, so it isn't a firewall issue.

- Chuck

On Thu, Sep 2, 2010 at 4:16 PM, Ira Weiny <[email protected]> wrote:
> On Thu, 2 Sep 2010 11:11:13 -0700
> Chuck Hartley <[email protected]> wrote:
>
>> Sure, here is the output:
>> Note this is with the switch we swapped in, so the port numbers don't
>> match the ibchecknet output in the original message.
>>
>> # ibstat
>> CA 'mlx4_0'
>>     CA type: MT26428
>>     Number of ports: 2
>>     Firmware version: 2.6.0
>>     Hardware version: a0
>>     Node GUID: 0x0002c90300032de0
>>     System image GUID: 0x0002c90300032de3
>>     Port 1:
>>         State: Active
>>         Physical state: LinkUp
>>         Rate: 40
>>         Base lid: 6
>>         LMC: 0
>>         SM lid: 6
>
> Well the SM lid is set here. Is it set on the other nodes?
>
> I don't run ibchecknet usually but I am getting the same errors here on a
> working fabric...
>
> ibwarn: [13629] dump_perfcounters: PortXmitWait not indicated so ignore this
> counter
> #warn: Lid is not configured lid 37 port 2
> #warn: SM Lid is not configured
> Port check lid 37 port 2: FAILED
>
> Looking at this output I don't think this is an error.
>
> 13:17:14 > smpquery nodeinfo 37
> # Node info: Lid 37
> BaseVers:........................1
> ClassVers:.......................1
> NodeType:........................Switch
> NumPorts:........................24
> ...
>
> On switch external Ports the Lid and SMLid are not used.
>
> Hal, would you concur?
>
> Chuck,
> Is it just that IPoIB is not working for you?
>
> Ira
>
>
>>         Capability mask: 0x0251086a
>>         Port GUID: 0x0002c90300032de1
>>     Port 2:
>>         State: Down
>>         Physical state: Polling
>>         Rate: 10
>>         Base lid: 0
>>         LMC: 0
>>         SM lid: 0
>>         Capability mask: 0x02510868
>>         Port GUID: 0x0002c90300032de2
>> CA 'mthca0'
>>     CA type: MT25204
>>     Number of ports: 1
>>     Firmware version: 1.2.0
>>     Hardware version: a0
>>     Node GUID: 0x003048c64c0c0000
>>     System image GUID: 0x003048c64c0c0003
>>     Port 1:
>>         State: Down
>>         Physical state: Polling
>>         Rate: 10
>>         Base lid: 0
>>         LMC: 0
>>         SM lid: 0
>>         Capability mask: 0x02510a68
>>         Port GUID: 0x003048c64c0c0001
>>
>> # iblinkinfo
>> Switch 0x0002c9020041a7a0 Infiniscale-IV Mellanox Technologies:
>>   1   1[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==>   5   1[ ] " HCA-1" ( )
>>   1   2[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==>   6   1[ ] "linux70 HCA-1" ( )
>>   1   3[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==>   7   1[ ] "linux71 HCA-1" ( )
>>   1   4[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1   5[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1   6[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1   7[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1   8[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1   9[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  10[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  11[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  12[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  13[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  14[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  15[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  16[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  17[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  18[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  19[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  20[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  21[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  22[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  23[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  24[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==>   9   1[ ] " HCA-1" ( )
>>   1  25[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==>   8   1[ ] " HCA-1" ( )
>>   1  26[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  27[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  28[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  29[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  30[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  31[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  32[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  33[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  34[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  35[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>   1  36[ ] ==( 4X 2.5 Gbps Down/ Polling)==>   [ ] "" ( )
>>
>> On Thu, Sep 2, 2010 at 12:03 PM, Ira Weiny <[email protected]> wrote:
>> > On Thu, 2 Sep 2010 06:56:50 -0700
>> > Chuck Hartley <[email protected]> wrote:
>> >
>> >> We swapped in a different switch and see the same errors. The opensm
>> >> logfile does not show any errors:
>> >
>> > Could you run "ibstat" on the node with OpenSM running?
>> >
>> > And "iblinkinfo" on the same node?
>> >
>> > Send that output.
>> >
>> > Ira
>> >
>> >>
>> >> -------------------------------------------------
>> >> OpenSM 3.3.5
>> >> Command Line Arguments:
>> >>  Daemon mode
>> >>  Log File: /var/log/opensm.log
>> >> -------------------------------------------------
>> >> OpenSM 3.3.5
>> >>
>> >> Sep 02 05:56:29 933684 [B53B8700] 0x80 -> OpenSM 3.3.5
>> >> Entering DISCOVERING state
>> >>
>> >> Sep 02 05:56:29 934931 [B53B8700] 0x02 -> osm_vendor_init: 1000
>> >> pending umads specified
>> >> Sep 02 05:56:29 935079 [B53B8700] 0x80 -> Entering DISCOVERING state
>> >> Using default GUID 0x2c90300032de1
>> >> Entering MASTER state
>> >>
>> >> Sep 02 05:56:29 953763 [B53B8700] 0x02 -> osm_vendor_bind: Binding to
>> >> port 0x2c90300032de1
>> >> Sep 02 05:56:29 990146 [B53B8700] 0x02 -> osm_vendor_bind: Binding to
>> >> port 0x2c90300032de1
>> >> Sep 02 05:56:29 990240 [B53B8700] 0x02 -> osm_opensm_bind: Setting
>> >> IS_SM on port 0x0002c90300032de1
>> >> Sep 02 05:56:30 009040 [AF1DB710] 0x80 -> Entering MASTER state
>> >> SUBNET UP
>> >>
>> >> Sep 02 05:56:30 009885 [AF1DB710] 0x02 -> osm_ucast_mgr_process:
>> >> minhop tables configured on all switches
>> >> Sep 02 05:56:30 014593 [AF1DB710] 0x80 -> SUBNET UP
>> >>
>> >>
>> >> On Thu, Sep 2, 2010 at 8:56 AM, Hal Rosenstock <[email protected]>
>> >> wrote:
>> >> > On Thu, Sep 2, 2010 at 8:34 AM, Chuck Hartley <[email protected]>
>> >> > wrote:
>> >> >> Hello,
>> >> >>
>> >> >> We installed 1.5.1 and are having problems getting the IB fabric
>> >> >> working. ibv_devinfo shows the HCAs ports are ok and ibdiagnet reports
>> >> >> no errors. However, ibchecknet shows that the switch ports are not
>> >> >> being configured. We have never seen this before and are at a loss as
>> >> >> to where the problem might be - would someone please point us in the
>> >> >> right direction to look? Could it be a problem with the switch
>> >> >> itself? Output from ibchecknet below.
>> >> >>
>> >> >>
>> >> >> # ibchecknet
>> >> >> Error check on lid 3 (Infiniscale-IV Mellanox Technologies) port all:
>> >> >> FAILED
>> >> >> ibwarn: [26732] dump_perfcounters: PortXmitWait not indicated so
>> >> >> ignore this counter
>> >> >> #warn: Lid is not configured lid 3 port 7
>> >> >> #warn: SM Lid is not configured
>> >> >
>> >> > Is there an SM running on your subnet? If not, I think that the lack
>> >> > of an SM could account for all of the issues mentioned here.
>> >> >
>> >> > -- Hal
>> >> >
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> >> the body of a message to [email protected]
>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> >>
>> >
>> >
>> > --
>> > Ira Weiny
>> > Math Programmer/Computer Scientist
>> > Lawrence Livermore National Lab
>> > 925-423-8008
>> > [email protected]
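
Ira's question about whether the SM lid is set on the other nodes can be checked mechanically. Below is a minimal sketch, not something anyone in the thread posted: it parses text in the `ibstat` layout quoted above and lists the ports that are Active and have a nonzero SM lid. The function names `parse_ibstat` and `active_ports_with_sm` and the abbreviated `SAMPLE` text are assumptions made for illustration. Such a port is only a precondition for IPoIB; as this thread shows, IPoIB can still fail when the check passes.

```python
import re

# Abbreviated from the ibstat output quoted in the thread.
SAMPLE = """\
CA 'mlx4_0'
    CA type: MT26428
    Number of ports: 2
    Port 1:
        State: Active
        Physical state: LinkUp
        Base lid: 6
        SM lid: 6
        Port GUID: 0x0002c90300032de1
    Port 2:
        State: Down
        Physical state: Polling
        Base lid: 0
        SM lid: 0
CA 'mthca0'
    Number of ports: 1
    Port 1:
        State: Down
        Base lid: 0
        SM lid: 0
"""


def parse_ibstat(text):
    """Return {ca_name: {port_number: {field: value}}} from ibstat-style text."""
    cas = {}
    ca = None
    port = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '(.+)'$", line)
        if m:
            # A new adapter starts; forget the previous port context.
            ca = cas.setdefault(m.group(1), {})
            port = None
            continue
        m = re.match(r"Port (\d+):$", line)
        if m and ca is not None:
            port = ca.setdefault(int(m.group(1)), {})
            continue
        # Only per-port "key: value" lines are kept; adapter-level fields are skipped.
        if port is not None and ":" in line:
            key, _, value = line.partition(":")
            port[key.strip()] = value.strip()
    return cas


def active_ports_with_sm(cas):
    """List (ca_name, port_number) pairs that are Active with a nonzero SM lid."""
    found = []
    for ca_name, ports in cas.items():
        for number, fields in ports.items():
            if fields.get("State") == "Active" and fields.get("SM lid", "0") != "0":
                found.append((ca_name, number))
    return found


if __name__ == "__main__":
    for ca_name, number in active_ports_with_sm(parse_ibstat(SAMPLE)):
        print(f"{ca_name} port {number}: Active, SM lid set")
```

To use it on a live node, one would feed it the captured output of `ibstat` instead of `SAMPLE` and run it on each host in the fabric.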
