The reason is:
Jan 01 01:46:17 321555 [58F3E280] -> osm_vendor_set_sm: ERR 5431: setting
IS_SM capability mask failed; errno 2

From the code it looks like /dev/infiniband/issm<umad_port> needs to be
created, and I did that. But the SM with the higher GUID still seems to become
the master whenever it does a sweep. The logs are too detailed, so I am sending
snippets.
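
For reference, here is a rough sketch of the election rule as Hal describes it
in the quoted text further down (equal priority, so the lower GUID should
win), applied to the two port GUIDs from the logs in this message. The
function name and the priority value 0 are placeholders for illustration, not
OpenSM code or the values actually configured:

```python
def elect_master(candidates):
    """Pick the expected master from (priority, port_guid) pairs.

    Higher priority wins; on a tie, the lower port GUID wins.
    """
    return max(candidates, key=lambda c: (c[0], -c[1]))


# Priority 0 is an assumed placeholder; both SMs run at the same priority.
local = (0, 0x2C901097682D1)   # local port, higher GUID
remote = (0, 0x2C90109765DA1)  # remote port, lower GUID

winner = elect_master([local, remote])
print(hex(winner[1]))  # 0x2c90109765da1, i.e. the remote port
```

So with equal priorities the remote port would be the expected master.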

Local port (with the higher GUID):
Jan 01 02:49:56 332142 [5873E280] -> osm_pi_rcv_process: Discovered port num
0x1 with GUID = 0x2c901097682d1 for parent node GUID = 0x2c901097682d0, TID
= 0x1236
Jan 01 02:49:56 332197 [5873E280] -> PortInfo dump:
                               port number.............0x1
                               node_guid...............0x0002c901097682d0
                               port_guid...............0x0002c901097682d1
                               m_key...................0x0000000000000000
                               subnet_prefix...........0xfe80000000000000
                               base_lid................0x1
                               master_sm_base_lid......0x2
                               capability_mask.........0x2510A68
                               diag_code...............0x0
                               m_key_lease_period......0x0
                               local_port_num..........0x1
                               link_width_enabled......0x3
                               link_width_supported....0x3
                               link_width_active.......0x2
                               link_speed_supported....0x1
                               port_state..............ACTIVE
                               state_info2.............0x52
                               m_key_protect_bits......0x0
                               lmc.....................0x0
                               link_speed..............0x11
                               mtu_smsl................0x40
                               vl_cap_init_type........0x40
                               vl_high_limit...........0x0
                               vl_arb_high_cap.........0x8
                               vl_arb_low_cap..........0x8
                               init_rep_mtu_cap........0x4
                               vl_stall_life...........0xFF
                               vl_enforce..............0x40
                               m_key_violations........0x0
                               p_key_violations........0x0
                               q_key_violations........0x0
                               guid_cap................0x20
                               client_reregister.......0x0
                               subnet_timeout..........0x12
                               resp_time_value.........0x10
                               error_threshold.........0x88
Jan 01 02:49:56 332337 [5873E280] -> Capabilities Mask:
                               IB_PORT_CAP_HAS_TRAP
                               IB_PORT_CAP_HAS_AUTO_MIG
                               IB_PORT_CAP_HAS_SL_MAP
                               IB_PORT_CAP_HAS_LED_INFO
                               IB_PORT_CAP_HAS_SYS_IMG_GUID
                               IB_PORT_CAP_HAS_COM_MGT
                               IB_PORT_CAP_HAS_VEND_CLS
                               IB_PORT_CAP_HAS_CAP_NTC
                               IB_PORT_CAP_HAS_CLIENT_REREG

Remote Port which hosts the SM:
Jan 01 02:49:56 500638 [5AF3E280] -> osm_pi_rcv_process: Discovered port num
0x1 with GUID = 0x2c90109765da1 for parent node GUID = 0x2c90109765da0, TID
= 0x123b
Jan 01 02:49:56 500690 [5AF3E280] -> PortInfo dump:
                               port number.............0x1
                               node_guid...............0x0002c90109765da0
                               port_guid...............0x0002c90109765da1
                               m_key...................0x0000000000000000
                               subnet_prefix...........0xfe80000000000000
                               base_lid................0x2
                               master_sm_base_lid......0x2
                               capability_mask.........0x2510A68
                               diag_code...............0x0
                               m_key_lease_period......0x0
                               local_port_num..........0x1
                               link_width_enabled......0x3
                               link_width_supported....0x3
                               link_width_active.......0x2
                               link_speed_supported....0x1
                               port_state..............ACTIVE
                               state_info2.............0x52
                               m_key_protect_bits......0x0
                               lmc.....................0x0
                               link_speed..............0x11
                               mtu_smsl................0x40
                               vl_cap_init_type........0x40
                               vl_high_limit...........0x0
                               vl_arb_high_cap.........0x8
                               vl_arb_low_cap..........0x8
                               init_rep_mtu_cap........0x4
                               vl_stall_life...........0xFF
                               vl_enforce..............0x40
                               m_key_violations........0x0
                               p_key_violations........0x0
                               q_key_violations........0x0
                               guid_cap................0x20
                               client_reregister.......0x0
                               subnet_timeout..........0x12
                               resp_time_value.........0x10
                               error_threshold.........0x88
Jan 01 02:49:56 500831 [5AF3E280] -> Capabilities Mask:
                               IB_PORT_CAP_HAS_TRAP
                               IB_PORT_CAP_HAS_AUTO_MIG
                               IB_PORT_CAP_HAS_SL_MAP
                               IB_PORT_CAP_HAS_LED_INFO
                               IB_PORT_CAP_HAS_SYS_IMG_GUID
                               IB_PORT_CAP_HAS_COM_MGT
                               IB_PORT_CAP_HAS_VEND_CLS
                               IB_PORT_CAP_HAS_CAP_NTC
                               IB_PORT_CAP_HAS_CLIENT_REREG
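
As a cross-check on the dumps above, here is a small sketch that decodes the
capability_mask value 0x2510A68 reported for both ports. The bit positions are
my reading of the PortInfo CapabilityMask layout (they reproduce the nine
names OpenSM printed); the table is deliberately partial and the helper name
is made up:

```python
# Partial bit table for PortInfo CapabilityMask (assumed layout; only the
# bits relevant to this log are listed).
CAP_BITS = {
    1: "IB_PORT_CAP_IS_SM",
    3: "IB_PORT_CAP_HAS_TRAP",
    5: "IB_PORT_CAP_HAS_AUTO_MIG",
    6: "IB_PORT_CAP_HAS_SL_MAP",
    9: "IB_PORT_CAP_HAS_LED_INFO",
    11: "IB_PORT_CAP_HAS_SYS_IMG_GUID",
    16: "IB_PORT_CAP_HAS_COM_MGT",
    20: "IB_PORT_CAP_HAS_VEND_CLS",
    22: "IB_PORT_CAP_HAS_CAP_NTC",
    25: "IB_PORT_CAP_HAS_CLIENT_REREG",
}


def decode_capability_mask(mask):
    """Return the names of the known capability bits set in mask."""
    return [name for bit, name in sorted(CAP_BITS.items()) if mask & (1 << bit)]


caps = decode_capability_mask(0x2510A68)
for name in caps:
    print(name)

# Bit 1 (IS_SM) is clear in this value.
print("IS_SM set:", "IB_PORT_CAP_IS_SM" in caps)
```

If that reading is right, neither port is advertising IS_SM, which would be
consistent with the ERR 5431 message at the top.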

Please let me know if I should look at some specific portion.

Thanks
Ganesh



On 16 May 2007 21:57:27 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:

Hi again Ganesh,

On Wed, 2007-05-16 at 21:42, Ganesh Sadasivan wrote:
> Hi Hal,
>
>  Please see inline.
>
> On 16 May 2007 19:22:00 -0400, Hal Rosenstock <[EMAIL PROTECTED]>
> wrote:
>         Hi Ganesh,
>
>         On Wed, 2007-05-16 at 19:00, Ganesh Sadasivan wrote:
>         > Hi,
>         >
>         >    I have a setup with 2 HCAs connected back to back and am
>         > running opensm (ofed1.1, running at the same priority) on both of
>         > them. Is there any utility to see who is the master?
>
> Even with priority differences I am seeing the same behavior. Am I
> missing any option? I am setting "opensm -s 30" and "opensm -s 60" on
> the respective sides.

Why not use the default (10 secs) or at least the same on both sides ?

>         sminfo will show the SM state for a LID/GUID.
>
>
> Thanks.
>
>         >   The smlid in ibv_devinfo seems to be changing whenever an
>         > SM does a sweep. Is this expected?
>
>         Nope. If they are both at the same priority, the lower GUID
>         should win the SM election.
>
>         Not sure what is going wrong in your (back to back HCA)
>         subnet. Do your ports stay active ?
>
>
> Yes both ports are active.

And they stay active (no LED color changes) ?

If not, can you run both OpenSMs in verbose mode (-V) and see if there
is anything interesting/relevant in the logs ?

-- Hal

> Thanks
> Ganesh
>
>         -- Hal
>
>         > Thanks
>         > Ganesh
>         >
>         >
>
______________________________________________________________________
>         > _______________________________________________
>         > general mailing list
>         > [email protected]
>         >
>         http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>         >
>         > To unsubscribe, please visit
>         http://openib.org/mailman/listinfo/openib-general
>
>


