On Fri, Oct 16, 2009 at 2:20 PM, Mahmoud Hanafi <[email protected]> wrote:
> We have a linux cluster running RH5.3 with ofed1.4 using Mellanox MT25418.
> The cluster is attached to a sun solaris10.7 thumper box. The thumper box
> export a zfs filesystem via NFS. linux clients mount the filesystem via
> IPoIB.
>
> Under filesystem I/O load the subnet manager gets repeated path record
> requests from the sun solaris box.

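One rough way to see whether the repeated path record requests all involve the same source/destination pair is to tally them from the OpenSM log. The following is only a sketch: it assumes the log keeps the "Src port ..., Dst port ..." line format that appears in the trace quoted in this thread, and the function name `tally_path_requests` is made up for illustration.

```python
import re
from collections import Counter

# Matches the "Src port ..., Dst port ..." line OpenSM logs for a PathRecord
# lookup at this verbosity (format assumed from the trace quoted in this thread).
PAIR_RE = re.compile(r"Src port (0x[0-9a-fA-F]+), Dst port (0x[0-9a-fA-F]+)")


def tally_path_requests(lines):
    """Count PathRecord lookups per (source GUID, destination GUID) pair."""
    counts = Counter()
    for line in lines:
        match = PAIR_RE.search(line)
        if match:
            # Normalize case so the same GUID is not counted twice.
            pair = (match.group(1).lower(), match.group(2).lower())
            counts[pair] += 1
    return counts


# Sample input: one line copied from the quoted trace, one unrelated line.
sample = [
    "Oct 15 19:37:59 952382 [41E02960] 0x08 -> _osm_pr_rcv_get_port_pair_paths: "
    "Src port 0x0003ba000100d0a5, Dst port 0x00237dffff949819",
    "Oct 15 19:37:59 952428 [41E02960] 0x08 -> osm_sa_respond: Returning 1 records",
]

for (src, dst), n in tally_path_requests(sample).most_common():
    print(f"{n:6d}  {src} -> {dst}")
```

Any iterable of lines works, so an open log file can be passed in place of `sample`. If a single pair dominates the counts, the requests are repeats of one query rather than many different ones.
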
Do the path records all look the same, or do they differ in their
destinations (and sources)?

Is the source GUID (0x0003ba000100d0a5) the Solaris thumper port GUID
(00-03-BA (hex) Sun Microsystems Inc.)? The destination appears to be
some HP device (00-23-7D (hex) Hewlett Packard).

> This can bring the SM and the fabric down.

Are you referring to the load due to path requests or to something else?
Running OpenSM at the logging level you appear to be using would
certainly slow things down greatly, so I presume that was only done to
look further into what was going on.

> Any any one else had issue with solaris IB <-> Linux IB?

I haven't run Solaris <-> Linux IB in several years now. It used to
work, but there have been a lot of changes since then.

> Any insight into what could be causing the issue?

Could you elaborate on the below? I see one PathRecord response trace
and an ibdiagnet run which shows a bad link at direct route 1,11,23 from
where that was run. You might want to debug the issue with that link.

-- Hal

>
> Thanks,
> Mahmoud
>
> ----
>
> Oct 15 19:37:59 952368 [41E02960] 0x08 -> PathRecord dump:
>     service id .............. 0x0000000000000000
>     dgid .................... 0xfe80000000000000 : 0x00237dffff949819
>     sgid .................... 0xfe80000000000000 : 0x0003ba000100d0a5
>     dlid .................... 0
>     slid .................... 0
>     hop_flow_raw ............ 0x0
>     tclass .................. 0x0
>     num_path_revers ......... 0x81
>     pkey .................... 0x0
>     qos_class ............... 0x0
>     sl ...................... 0x0
>     mtu ..................... 0x0
>     rate .................... 0x0
>     pkt_life ................ 0x0
>     preference .............. 0x0
>     resv2 ................... 0x0
>     resv3 ................... 0x0
> Oct 15 19:37:59 952376 [41E02960] 0x08 -> osm_pr_rcv_process: Unicast destination requested
> Oct 15 19:37:59 952382 [41E02960] 0x08 -> _osm_pr_rcv_get_port_pair_paths: Src port 0x0003ba000100d0a5, Dst port 0x00237dffff949819
> Oct 15 19:37:59 952388 [41E02960] 0x08 -> _osm_pr_rcv_get_port_pair_paths: Src LIDs [2-2], Dest LIDs [67-67]
> Oct 15 19:37:59 952393 [41E02960] 0x08 -> _osm_pr_rcv_get_lid_pair_path: Src LID 2, Dest LID 67
> Oct 15 19:37:59 952399 [41E02960] 0x08 -> _osm_pr_rcv_get_path_parms: Path min MTU = 4, min rate = 6
> Oct 15 19:37:59 952408 [41E02960] 0x08 -> _osm_pr_rcv_get_path_parms: Path params: mtu = 4, rate = 6, packet lifetime = 18, pkey = 0xFFFF, sl = 0
> Oct 15 19:37:59 952417 [41E02960] 0x08 -> _osm_pr_rcv_get_path_parms: Path min MTU = 4, min rate = 6
> Oct 15 19:37:59 952423 [41E02960] 0x08 -> _osm_pr_rcv_get_path_parms: Path params: mtu = 4, rate = 6, packet lifetime = 18, pkey = 0xFFFF, sl = 0
> Oct 15 19:37:59 952428 [41E02960] 0x08 -> osm_sa_respond: Returning 1 records
> Oct 15 19:37:59 952433 [41E02960] 0x08 -> osm_vendor_get: Acquiring UMAD for p_madw = 0x2a9567f2c8, size = 120
> Oct 15 19:37:59 952439 [41E02960] 0x08 -> osm_vendor_get: Acquired UMAD 0x2a9567f390, size = 120
> Oct 15 19:37:59 952455 [41E02960] 0x08 -> osm_vendor_put: Retiring UMAD 0x2a9567f390
> Oct 15 19:37:59 952460 [41E02960] 0x08 -> osm_vendor_send: Completed sending response or unsolicited p_madw = 0x2a9567f2b0
> Oct 15 19:37:59 952466 [41E02960] 0x08 -> osm_vendor_put: Retiring UMAD 0x724520
> ===============
> Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
> -W- Topology file is not specified.
>     Reports regarding cluster links will use direct routes.
> Loading IBDM from: /usr/lib64/ibdm1.2
> -I- Using port 1 as the local port.
> -I- Discovering ... 103 nodes (7 Switches & 96 CA-s) discovered.
> -I---------------------------------------------------
> -I- Bad Guids/LIDs Info
> -I---------------------------------------------------
> -I- No bad Guids were found
> -I---------------------------------------------------
> -I- Links With Logical State = INIT
> -I---------------------------------------------------
> -I- No bad Links (with logical state = INIT) were found
> -I---------------------------------------------------
> -I- PM Counters Info
> -I---------------------------------------------------
> -I- No illegal PM counters values were found
> -I---------------------------------------------------
> -I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
> -I---------------------------------------------------
> -I- PKey:0x7fff Hosts:97 full:97 partial:0
> -I---------------------------------------------------
> -I- IPoIB Subnets Check
> -I---------------------------------------------------
> -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
> -W- Suboptimal rate for group. Lowest member rate:20Gbps > grouprate:10Gbps
> -I---------------------------------------------------
> -I- Bad Links Info
> -I- Errors have occurred on the following links
>     (for errors details, look in log file /tmp/ibdiagnet.log):
> -I---------------------------------------------------
> Link at the end of direct route "1,11,23"
> ----------------------------------------------------------------
> -I- Stages Status Report:
>     STAGE                           Errors Warnings
>     Bad GUIDs/LIDs Check            0      0
>     Link State Active Check         0      0
>     Performance Counters Report     0      0
>     Partitions Check                0      0
>     IPoIB Subnets Check             0      1
>     Link Errors Check               0      0
>
> Please see /tmp/ibdiagnet.log for complete log
> -I- Done. Run time was 6 seconds.
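
For reading the trace above: the vendor guesses in the reply come from the top 24 bits (the OUI) of each port GUID, and the path parameters in the log are encoded values rather than raw units. The sketch below shows both decodings. It is only an illustration: the helper names are invented here, the two vendor strings are the ones given in this thread, and the MTU and rate tables list only the common InfiniBand encodings.

```python
# Encoded PathRecord values as defined by the InfiniBand specification
# (common encodings only; this is not an exhaustive table).
MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}
RATE_GBPS = {2: 2.5, 3: 10, 4: 30, 5: 5, 6: 20, 7: 40, 8: 60, 9: 80, 10: 120}

# Only the two OUIs mentioned in this thread; not a full registry.
KNOWN_OUIS = {
    "00-03-BA": "Sun Microsystems Inc.",
    "00-23-7D": "Hewlett Packard",
}


def guid_oui(guid):
    """Return the top 24 bits of a 64-bit GUID as an 'XX-XX-XX' string."""
    value = int(guid, 16) if isinstance(guid, str) else guid
    oui = (value >> 40) & 0xFFFFFF
    return "-".join(f"{(oui >> shift) & 0xFF:02X}" for shift in (16, 8, 0))


def packet_lifetime_seconds(encoded):
    """Decode a packet lifetime exponent: 4.096 microseconds * 2**encoded."""
    return 4.096e-6 * (2 ** encoded)


# Values taken from the quoted trace.
for guid in ("0x0003ba000100d0a5", "0x00237dffff949819"):
    oui = guid_oui(guid)
    print(guid, oui, KNOWN_OUIS.get(oui, "unknown"))

print("mtu", MTU_BYTES[4], "bytes")
print("rate", RATE_GBPS[6], "Gb/s")
print("packet lifetime", round(packet_lifetime_seconds(18), 3), "s")
```

Decoded this way, mtu = 4 is 2048 bytes, rate = 6 is 20 Gb/s, and packet lifetime = 18 is roughly 1.07 seconds. The first two agree with the MTU and lowest member rate that ibdiagnet reports for the IPoIB subnet.
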
>
> _______________________________________________
> ewg mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
