On Thu, 2 Sep 2010 11:11:13 -0700
Chuck Hartley <[email protected]> wrote:

> Sure, here is the output:
> Note this is with the switch we swapped in, so the port numbers don't
> match the ibchecknet output in the original message.
> 
> # ibstat
> CA 'mlx4_0'
>       CA type: MT26428
>       Number of ports: 2
>       Firmware version: 2.6.0
>       Hardware version: a0
>       Node GUID: 0x0002c90300032de0
>       System image GUID: 0x0002c90300032de3
>       Port 1:
>               State: Active
>               Physical state: LinkUp
>               Rate: 40
>               Base lid: 6
>               LMC: 0
>               SM lid: 6

Well, the SM lid is set here.  Is it also set on the other nodes?
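
If it helps to check several nodes quickly, here is a rough Python sketch that parses saved ibstat output and lists Active ports whose SM lid is still 0. The function names are made up for illustration, and treating "SM lid: 0" on an Active port as "not set" is my assumption; the sample text is trimmed from the output quoted above.

```python
import re

def parse_ibstat(text):
    """Return a list of (ca_name, port_number, fields) tuples from ibstat text."""
    ports = []
    ca = None
    fields = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '(.+)'$", line)
        if m:
            # New adapter section; CA-level fields are skipped until a port starts.
            ca = m.group(1)
            fields = None
            continue
        m = re.match(r"Port (\d+):$", line)
        if m:
            fields = {}
            ports.append((ca, int(m.group(1)), fields))
            continue
        if fields is not None and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return ports

def ports_missing_sm(ports):
    """Active ports whose SM lid is 0 (assumed here to mean 'no SM recorded')."""
    return [(ca, num) for ca, num, f in ports
            if f.get("State") == "Active" and f.get("SM lid") == "0"]

SAMPLE = """\
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Port 1:
                State: Active
                Physical state: LinkUp
                Base lid: 6
                SM lid: 6
        Port 2:
                State: Down
                Physical state: Polling
                Base lid: 0
                SM lid: 0
"""

print(ports_missing_sm(parse_ibstat(SAMPLE)))  # prints []
```

An empty list means every Active port in that output has an SM lid recorded; Down ports are ignored on purpose.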

I don't usually run ibchecknet, but I get the same errors here on a
working fabric:

ibwarn: [13629] dump_perfcounters: PortXmitWait not indicated so ignore this counter
#warn: Lid is not configured lid 37 port 2
#warn: SM Lid is not configured
Port check lid 37 port 2:  FAILED 

Looking at this output, I don't think this is an error.

13:17:14 > smpquery nodeinfo 37
# Node info: Lid 37
BaseVers:........................1
ClassVers:.......................1
NodeType:........................Switch
NumPorts:........................24
...

On a switch's external ports, the Lid and SMLid fields are not used.
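
To separate these expected warnings from ones worth chasing, something like the following Python sketch could be run over saved ibchecknet output. It is only an illustration: the function name is hypothetical, the set of switch LIDs has to be supplied by hand (for example from "smpquery nodeinfo" reporting NodeType Switch), and treating every non-zero port on a switch LID as ignorable is an assumption based on the point above.

```python
import re

# Matches lines such as "#warn: Lid is not configured lid 37 port 2".
WARN_RE = re.compile(r"#warn: Lid is not configured lid (\d+) port (\d+)")

def classify_warnings(output, switch_lids):
    """Split 'Lid is not configured' warnings into (ignorable, suspicious).

    A warning is treated as ignorable when the LID belongs to a known switch
    and the port number is non-zero, i.e. an external switch port.
    """
    ignorable, suspicious = [], []
    for m in WARN_RE.finditer(output):
        lid, port = int(m.group(1)), int(m.group(2))
        if lid in switch_lids and port != 0:
            ignorable.append((lid, port))
        else:
            suspicious.append((lid, port))
    return ignorable, suspicious

SAMPLE_OUT = """\
#warn: Lid is not configured lid 37 port 2
#warn: SM Lid is not configured
Port check lid 37 port 2:  FAILED
"""

ignorable, suspicious = classify_warnings(SAMPLE_OUT, switch_lids={37})
print(ignorable, suspicious)  # prints [(37, 2)] []
```

Anything left in the second list would be a warning on a LID that is not a known switch, which is where I would look first.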

Hal, would you concur?

Chuck,
Is it just that IPoIB is not working for you?

Ira


>               Capability mask: 0x0251086a
>               Port GUID: 0x0002c90300032de1
>       Port 2:
>               State: Down
>               Physical state: Polling
>               Rate: 10
>               Base lid: 0
>               LMC: 0
>               SM lid: 0
>               Capability mask: 0x02510868
>               Port GUID: 0x0002c90300032de2
> CA 'mthca0'
>       CA type: MT25204
>       Number of ports: 1
>       Firmware version: 1.2.0
>       Hardware version: a0
>       Node GUID: 0x003048c64c0c0000
>       System image GUID: 0x003048c64c0c0003
>       Port 1:
>               State: Down
>               Physical state: Polling
>               Rate: 10
>               Base lid: 0
>               LMC: 0
>               SM lid: 0
>               Capability mask: 0x02510a68
>               Port GUID: 0x003048c64c0c0001
> 
> # iblinkinfo
> Switch 0x0002c9020041a7a0 Infiniscale-IV Mellanox Technologies:
>            1    1[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       5    1[  ] " HCA-1" ( )
>            1    2[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       6    1[  ] "linux70 HCA-1" ( )
>            1    3[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       7    1[  ] "linux71 HCA-1" ( )
>            1    4[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1    5[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1    6[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1    7[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1    8[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1    9[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   10[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   11[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   12[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   13[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   14[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   15[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   16[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   17[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   18[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   19[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   20[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   21[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   22[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   23[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   24[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>       9    1[  ] " HCA-1" ( )
>            1   25[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>       8    1[  ] " HCA-1" ( )
>            1   26[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   27[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   28[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   29[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   30[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   31[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   32[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   33[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   34[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   35[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>            1   36[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
> 
> On Thu, Sep 2, 2010 at 12:03 PM, Ira Weiny <[email protected]> wrote:
> > On Thu, 2 Sep 2010 06:56:50 -0700
> > Chuck Hartley <[email protected]> wrote:
> >
> >> We swapped in a different switch and see the same errors. The opensm
> >> logfile does not show any errors:
> >
> > Could you run "ibstat" on the node with OpenSM running?
> >
> > And "iblinkinfo" on the same node?
> >
> > Send that output.
> >
> > Ira
> >
> >>
> >> -------------------------------------------------
> >> OpenSM 3.3.5
> >> Command Line Arguments:
> >>  Daemon mode
> >>  Log File: /var/log/opensm.log
> >> -------------------------------------------------
> >> OpenSM 3.3.5
> >>
> >> Sep 02 05:56:29 933684 [B53B8700] 0x80 -> OpenSM 3.3.5
> >> Entering DISCOVERING state
> >>
> >> Sep 02 05:56:29 934931 [B53B8700] 0x02 -> osm_vendor_init: 1000
> >> pending umads specified
> >> Sep 02 05:56:29 935079 [B53B8700] 0x80 -> Entering DISCOVERING state
> >> Using default GUID 0x2c90300032de1
> >> Entering MASTER state
> >>
> >> Sep 02 05:56:29 953763 [B53B8700] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300032de1
> >> Sep 02 05:56:29 990146 [B53B8700] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300032de1
> >> Sep 02 05:56:29 990240 [B53B8700] 0x02 -> osm_opensm_bind: Setting IS_SM on port 0x0002c90300032de1
> >> Sep 02 05:56:30 009040 [AF1DB710] 0x80 -> Entering MASTER state
> >> SUBNET UP
> >>
> >> Sep 02 05:56:30 009885 [AF1DB710] 0x02 -> osm_ucast_mgr_process: minhop tables configured on all switches
> >> Sep 02 05:56:30 014593 [AF1DB710] 0x80 -> SUBNET UP
> >>
> >>
> >> On Thu, Sep 2, 2010 at 8:56 AM, Hal Rosenstock <[email protected]> wrote:
> >> > On Thu, Sep 2, 2010 at 8:34 AM, Chuck Hartley <[email protected]> wrote:
> >> >> Hello,
> >> >>
> >> >> We installed 1.5.1 and are having problems getting the IB fabric
> >> >> working. ibv_devinfo shows the HCA ports are ok and ibdiagnet reports
> >> >> no errors. However, ibchecknet shows that the switch ports are not
> >> >> being configured.  We have never seen this before and are at a loss as
> >> >> to where the problem might be - would someone please point us in the
> >> >> right direction to look?  Could it be a problem with the switch
> >> >> itself? Output from ibchecknet below.
> >> >>
> >> >>
> >> >> # ibchecknet
> >> >> Error check on lid 3 (Infiniscale-IV Mellanox Technologies) port all:  FAILED
> >> >> ibwarn: [26732] dump_perfcounters: PortXmitWait not indicated so ignore this counter
> >> >> #warn: Lid is not configured lid 3 port 7
> >> >> #warn: SM Lid is not configured
> >> >
> >> > Is there an SM running on your subnet? If not, I think that the lack
> >> > of an SM could account for all of the issues mentioned here.
> >> >
> >> > -- Hal
> >> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> >> the body of a message to [email protected]
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >
> >
> > --
> > Ira Weiny
> > Math Programmer/Computer Scientist
> > Lawrence Livermore National Lab
> > 925-423-8008
> > [email protected]
> >
> 


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
[email protected]