We have a Linux cluster running RH5.3 with OFED 1.4, using Mellanox MT25418 HCAs. The cluster is attached to a Sun Thumper box running Solaris 10.7. The Thumper exports a ZFS filesystem via NFS, and the Linux clients mount it over IPoIB.
 
Under filesystem I/O load, the subnet manager receives repeated path record requests from the Solaris box. This can bring down the SM and the fabric. Has anyone else had issues with Solaris IB <-> Linux IB? Any insight into what could be causing this? An excerpt from the SM log and the ibdiagnet output are below.
 
Thanks,
Mahmoud
 
----

Oct 15 19:37:59 952368 [41E02960] 0x08 -> PathRecord dump:
    service_id ..............0x0000000000000000
    dgid ....................0xfe80000000000000 : 0x00237dffff949819
    sgid ....................0xfe80000000000000 : 0x0003ba000100d0a5
    dlid ....................0
    slid ....................0
    hop_flow_raw ............0x0
    tclass ..................0x0
    num_path_revers .........0x81
    pkey ....................0x0
    qos_class ...............0x0
    sl ......................0x0
    mtu .....................0x0
    rate ....................0x0
    pkt_life ................0x0
    preference ..............0x0
    resv2 ...................0x0
    resv3 ...................0x0

Oct 15 19:37:59 952376 [41E02960] 0x08 -> osm_pr_rcv_process: Unicast destination requested
Oct 15 19:37:59 952382 [41E02960] 0x08 -> _osm_pr_rcv_get_port_pair_paths: Src port 0x0003ba000100d0a5, Dst port 0x00237dffff949819
Oct 15 19:37:59 952388 [41E02960] 0x08 -> _osm_pr_rcv_get_port_pair_paths: Src LIDs [2-2], Dest LIDs [67-67]
Oct 15 19:37:59 952393 [41E02960] 0x08 -> _osm_pr_rcv_get_lid_pair_path: Src LID 2, Dest LID 67
Oct 15 19:37:59 952399 [41E02960] 0x08 -> _osm_pr_rcv_get_path_parms: Path min MTU = 4, min rate = 6
Oct 15 19:37:59 952408 [41E02960] 0x08 -> _osm_pr_rcv_get_path_parms: Path params: mtu = 4, rate = 6, packet lifetime = 18, pkey = 0xFFFF, sl = 0
Oct 15 19:37:59 952417 [41E02960] 0x08 -> _osm_pr_rcv_get_path_parms: Path min MTU = 4, min rate = 6
Oct 15 19:37:59 952423 [41E02960] 0x08 -> _osm_pr_rcv_get_path_parms: Path params: mtu = 4, rate = 6, packet lifetime = 18, pkey = 0xFFFF, sl = 0
Oct 15 19:37:59 952428 [41E02960] 0x08 -> osm_sa_respond: Returning 1 records
Oct 15 19:37:59 952433 [41E02960] 0x08 -> osm_vendor_get: Acquiring UMAD for p_madw = 0x2a9567f2c8, size = 120
Oct 15 19:37:59 952439 [41E02960] 0x08 -> osm_vendor_get: Acquired UMAD 0x2a9567f390, size = 120
Oct 15 19:37:59 952455 [41E02960] 0x08 -> osm_vendor_put: Retiring UMAD 0x2a9567f390
Oct 15 19:37:59 952460 [41E02960] 0x08 -> osm_vendor_send: Completed sending response or unsolicited p_madw = 0x2a9567f2b0
Oct 15 19:37:59 952466 [41E02960] 0x08 -> osm_vendor_put: Retiring UMAD 0x724520
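
To see which port pairs account for the repeated requests, the SM log can be tallied by source and destination port GUID. Below is a minimal Python sketch, assuming the log contains the "Src port ..., Dst port ..." message in the form shown above. The function name and the embedded sample text are illustrative only and are not part of OpenSM.

```python
import re
from collections import Counter

# Matches the "Src port <guid>, Dst port <guid>" message from the log excerpt.
# \s+ also tolerates the line wrapping that mail clients introduce.
PAIR_RE = re.compile(
    r"Src\s+port\s+(0x[0-9a-fA-F]+),\s+Dst\s+port\s+(0x[0-9a-fA-F]+)"
)


def count_path_record_pairs(log_text):
    """Return a Counter mapping (src_guid, dst_guid) to the number of requests."""
    return Counter(
        (src.lower(), dst.lower()) for src, dst in PAIR_RE.findall(log_text)
    )


# Illustrative sample: the same message repeated, once with a wrapped line.
sample = (
    "Src port 0x0003ba000100d0a5, Dst port 0x00237dffff949819\n"
    "Src port 0x0003ba000100d0a5, Dst port\n\n0x00237dffff949819\n"
    "Src port 0x0003ba000100d0a5, Dst port 0x00237dffff949819\n"
)

counts = count_path_record_pairs(sample)
for (src, dst), n in counts.most_common(10):
    # One line per port pair, busiest first.
    print(f"{n:8d}  {src} -> {dst}")
```

For a real run, the sample string would be replaced by the contents of the SM log file (its location depends on the OpenSM configuration). A single pair dominating the count would point at one client/server pair as the source of the request storm.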


===============

Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
-W- Topology file is not specified.
    Reports regarding cluster links will use direct routes.
Loading IBDM from: /usr/lib64/ibdm1.2
-I- Using port 1 as the local port.
-I- Discovering ... 103 nodes (7 Switches & 96 CA-s) discovered.

-I---------------------------------------------------
-I- Bad Guids/LIDs Info
-I---------------------------------------------------
-I- No bad Guids were found

-I---------------------------------------------------
-I- Links With Logical State = INIT
-I---------------------------------------------------
-I- No bad Links (with logical state = INIT) were found

-I---------------------------------------------------
-I- PM Counters Info
-I---------------------------------------------------
-I- No illegal PM counters values were found

-I---------------------------------------------------
-I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
-I---------------------------------------------------
-I-    PKey:0x7fff Hosts:97 full:97 partial:0

-I---------------------------------------------------
-I- IPoIB Subnets Check
-I---------------------------------------------------
-I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Suboptimal rate for group. Lowest member rate:20Gbps > group-rate:10Gbps

-I---------------------------------------------------
-I- Bad Links Info
-I- Errors have occurred on the following links
    (for errors details, look in log file /tmp/ibdiagnet.log):
-I---------------------------------------------------
Link at the end of direct route "1,11,23"

----------------------------------------------------------------

-I- Stages Status Report:
    STAGE                          Errors  Warnings
    Bad GUIDs/LIDs Check           0       0
    Link State Active Check        0       0
    Performance Counters Report    0       0
    Partitions Check               0       0
    IPoIB Subnets Check            0       1
    Link Errors Check              0       0

Please see /tmp/ibdiagnet.log for complete log

-I- Done. Run time was 6 seconds.


_______________________________________________
ewg mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
