On Dec 16, 2012, at 10:48 PM, Hal Rosenstock wrote:

> On 12/16/2012 8:39 AM, Jens Domke wrote:
>> Hi,
>> 
>> On Dec 16, 2012, at 9:32 PM, Hal Rosenstock wrote:
>> 
>>> Hi,
>>> 
>>> On 12/16/2012 7:03 AM, Jens Domke wrote:
>>>> Hello Hal,
>>>> 
>>>> On Dec 15, 2012, at 5:44 AM, Hal Rosenstock wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> On 12/14/2012 3:32 PM, Jens Domke wrote:
>>>>>> Hello Hal,
>>>>>> 
>>>>>> On Dec 15, 2012, at 3:58 AM, Hal Rosenstock wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> On 12/14/2012 1:24 PM, Jens Domke wrote:
>>>>>>>> Hello Hal,
>>>>>>>> 
>>>>>>>> On Dec 15, 2012, at 1:42 AM, Hal Rosenstock wrote:
>>>>>>>> 
>>>>>>>>> Hi again,
>>>>>>>>> 
>>>>>>>>> On 12/14/2012 10:17 AM, Jens Domke wrote:
>>>>>>>>>> Hello Hal,
>>>>>>>>>> 
>>>>>>>>>> thank you for the fast response. I will try to clarify some points.
>>>>>>>>>> 
>>>>>>>>>>>> d) OpenMPI runs are executed with "--mca 
>>>>>>>>>>>> btl_openib_ib_path_record_service_level 1"
>>>>>>>>>>> 
>>>>>>>>>>> I'm not familiar with what DFSSSP does to figure out SLs exactly but
>>>>>>>>>>> there should be no need to set this. The proper SL for querying the SA
>>>>>>>>>>> for PathRecords, etc. is always in PortInfo.SMSL. In the case of DFSSSP
>>>>>>>>>>> (and other QoS based routing algorithms), it calculates that and the SM
>>>>>>>>>>> pushes this into each port. That should be used. It's possible that SL1
>>>>>>>>>>> is not a valid SL for port <-> SA querying using DFSSSP.
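Hal's point is that the SL for SA traffic comes from PortInfo.SMSL (MasterSMSL), a 4-bit field. The sketch below reads and updates that field, assuming the layout used by OpenSM's `ib_port_info_t`, where NeighborMTU (high nibble) and MasterSMSL (low nibble) share one byte called `mtu_smsl`. The helper names are hypothetical, not OpenSM's API:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: NeighborMTU in bits 7..4, MasterSMSL in bits 3..0 of the
 * PortInfo byte that OpenSM names mtu_smsl. */
static uint8_t port_info_master_smsl(uint8_t mtu_smsl)
{
    return (uint8_t)(mtu_smsl & 0x0F);
}

static uint8_t port_info_neighbor_mtu(uint8_t mtu_smsl)
{
    return (uint8_t)((mtu_smsl & 0xF0) >> 4);
}

/* Store a new SMSL (as the SM would push it) while keeping NeighborMTU. */
static uint8_t port_info_set_master_smsl(uint8_t mtu_smsl, uint8_t smsl)
{
    assert(smsl < 16); /* SL is a 4-bit field */
    return (uint8_t)((mtu_smsl & 0xF0) | smsl);
}
```

For example, a `mtu_smsl` byte of 0x46 would mean NeighborMTU code 4 (2048 bytes) and SMSL 6, the SL the requester used in this thread.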
>>>>>>>>>> The OpenMPI parameter btl_openib_ib_path_record_service_level does not
>>>>>>>>>> specify the SL for querying the PathRecords; it just enables the
>>>>>>>>>> functionality. The OMPI processes use PortInfo.SMSL to send the request.
>>>>>>>>>> In the test, every SL from 0 to 7 was used for the "port -> SA" request,
>>>>>>>>>> and the SA received the requests.
>>>>>>>>>>> 
>>>>>>>>>>>> e) kernel 2.6.32-220.13.1.el6.x86_64
>>>>>>>>>>>> 
>>>>>>>>>>>> As far as I understand the whole system:
>>>>>>>>>>>> 1. the OMPI processes are sending MAD requests 
>>>>>>>>>>>> (SubnAdmGet:PathRecord) to the OpenSM
>>>>>>>>>>>> 2. the SA receives the request on QP1
>>>>>>>>>>> 
>>>>>>>>>>> There is the SL in the query itself. This should be the SMSL that the SM
>>>>>>>>>>> set for that port.
>>>>>>>>>> Hmm, there you might have a point. I think I saw that the query itself
>>>>>>>>>> had SL=0 specified. In fact, OpenMPI sets everything to 0 except for
>>>>>>>>>> slid and dlid.
>>>>>>>>>>> 
>>>>>>>>>>>> 3. SA asks the routing algorithm (like LASH, DFSSSP or Torus_2QoS) 
>>>>>>>>>>>> about a special service level for the slid/dlid path
>>>>>>>>>>> 
>>>>>>>>>>> This is a (potentially) different SL (for MPI<->MPI port communication)
>>>>>>>>>>> than the one the query used and is the one returned inside the
>>>>>>>>>>> PathRecord attribute/data.
>>>>>>>>>> Yes, it can be different, but DFSSSP sets the same SL because the SM is
>>>>>>>>>> running on a port which is also used for MPI communication.
>>>>>>>>> 
>>>>>>>>> With DFSSSP, are all SLs the same from a source port to any destination?
>>>>>>>> No, not necessarily. In general DFSSSP does not enforce SL(LID1->LID2) 
>>>>>>>> == SL(LID2->LID1) or SL(LID1->LID2) == SL(LID1->LID3).
>>>>>>> 
>>>>>>> If SL(LID1->LID2) != SL(LID2->LID1), that's not a reversible path.
>>>>>> True. But I don't think that the SA asks the DFSSSP routing about the SL
>>>>>> for the reversible path. So, the SA could use any valid SL, even if
>>>>>> DFSSSP would recommend another SL.
>>>>>> 
>>>>>> I just read the IB Specs, and they say that the "SL specified in the
>>>>>> received packet is used as the SL in the response packet" for MAD packets.
>>>>>> So, it's most likely that there is a mismatch between the way OMPI sets
>>>>>> up the PathRequest and the way the SA builds the response packet.
>>>>>> OMPI always specifies SL=0 (let's say SL_a) inside the PathRequest
>>>>>> packet,
>>>>> 
>>>>> So CompMask in the query has the SL bit on and SL is set to 0 inside the
>>>>> SubnAdmGet of PathRecord?
>>>> 
>>>> No, the CompMask didn't have the SL bit set, and the SL was set to 0.
>>> 
>>> That means the SL in the request is wildcarded so the SA/SM fills in a
>>> valid one in the response.
>> Ok.
>>> 
>>>> I tried to follow the path of the SL bit (IB_PR_COMPMASK_SL), and the only
>>>> reference I found was in osm_sa_path_record.c.
>>>> The SA just treats the SL in the PathRequest as "I would like to use this
>>>> SL" if the SL bit is set.
>>>> But the routing engine can overwrite the requested SL before the reply is
>>>> sent.
>>>> 
>>>> Nevertheless, I have changed the OMPI code so that it sets the SL bit in
>>>> the CompMask and sets the SL to SMSL for the PathRequest, so that
>>>> SL_a == SL_b.
>>>> Sadly, the reply sent by the SA does not leave the node (for SL_b>0). Only
>>>> if I change the SL to 0 in the MAD right before the SA calls umad_send is
>>>> the packet able to leave the node and reach the OMPI process.
>>> 
>>> Are you sure the response doesn't leave the SA node, rather than not
>>> being received at the requester (OMPI node)?
>> No, I'm not sure. Is there any way to check that? As far as I know, ibdump
>> does not show MAD packets that leave a port; it only shows the packets
>> when they are received on the other end.
>>> 
>>>> 
>>>>> 
>>>>>> and sends the packet on SL_b (PortInfo.SMSL).
>>>>> 
>>>>> Good.
>>>>> 
>>>>>> The SA uses p_mad_addr->addr_type.gsi.service_level, which is SL_b, for 
>>>>>> the response.
>>>>>> If SL_b is not 0, then the packet can't reach the OMPI process. Right?
>>>>> 
>>>>> Depends. It may be that both SLs work but maybe not.
>>>>> 
>>>>>> If I analyse this correctly, then there are two bugs. One is in OMPI: it
>>>>>> does not specify the SL within the PathRequest in an appropriate way
>>>>>> (which would be an SL suggested by DFSSSP for the reversible path).
>>>>>> The second bug is that the SA uses the SL on which the PathRequest
>>>>>> packet was sent, and not the SL specified within the packet.
>>>>>> What do you think?
>>>>> 
>>>>> Yes, it might be better to wildcard the SL in the query. The only
>>>>> scenario that would fail with the query you are making is if there's no
>>>>> SL 0 path between the src/dest LIDs or GIDs in the OMPI PathRecord query.
>>>>> If that's the case, SA should return MAD status 0xc (status code 3 -
>>>>> ERR_NO_RECORDS). But the response doesn't make it back to the requester
>>>>> OMPI node so it's not even getting that far.
>>>> 
>>>> Yes, exactly. So, do you have an idea why the response hangs in the SA
>>>> node?
>>>> I have no insight into the underlying layers (kernel driver and firmware).
>>>> Maybe there are some implementations that prevent the SA from sending
>>>> MADs back on SL>0?
>>> 
>>> If you're sure this response doesn't get out of the SA node, please
>>> contact Mellanox support with the details.
>> Ok, I can do this, if it turns out to be true.
>>> 
>>>>> 
>>>>>> I can try to change the PathRequest of OMPI tomorrow so that it matches
>>>>>> addr_type.gsi.service_level.
>>>>>> Maybe with this change the packets of the SA will reach the OMPI
>>>>>> process on an SL>0.
>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 4. SA sends the PathRecord back to the OMPI process via umad_send 
>>>>>>>>>>>> in libvendor/osm_vendor_ibumad.c
>>>>>>>>>>> 
>>>>>>>>>>> By the response reversibility rule, I think this is returned on the SL
>>>>>>>>>>> of the original query but haven't verified this in the code base yet.
>>>>>>>>>> Ok, I was not aware of that rule. But if this is true, then the SA 
>>>>>>>>>> should also be able to send via SL>0.
>>>>>>>>> 
>>>>>>>>> I double checked and indeed the SA response does use the SL that the
>>>>>>>>> incoming request was received on.
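The rule Hal confirms matches the `umad_set_addr_net()` call quoted from `osm_vendor_send()` (dest_lid, remote_qp, service_level, IB_QP1_WELL_KNOWN_Q_KEY). A simplified sketch of that address choice, using hypothetical host-order types rather than OpenSM's own; 0x80010000 is the well-known QP1 Q_Key. Note that the SL inside the PathRecord payload plays no part here:

```c
#include <assert.h>
#include <stdint.h>

#define QP1_WELL_KNOWN_QKEY 0x80010000u /* Q_Key for QP1 (GSI) traffic */

/* Hypothetical, host-order view of a received GSI MAD's source address. */
struct gsi_addr {
    uint16_t lid; /* requester's LID */
    uint32_t qpn; /* requester's QP number */
    uint8_t  sl;  /* SL the request arrived on */
};

/* Response address: back to the requester's LID/QP, on the same SL the
 * request was received on, using the QP1 well-known Q_Key. */
static void sa_response_addr(const struct gsi_addr *rcv,
                             struct gsi_addr *out, uint32_t *qkey)
{
    assert(rcv->sl < 16);
    *out = *rcv;
    *qkey = QP1_WELL_KNOWN_QKEY;
}
```

With this rule, a request arriving on SL 6 is answered on SL 6, which is why the reply's fate depends on SL 6 being usable on the SA-to-requester path.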
>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> The osm_vendor_send() function builds the MAD packet with the 
>>>>>>>>>>>> following attributes:
>>>>>>>>>>>>   /* GS classes */
>>>>>>>>>>>>   umad_set_addr_net(p_vw->umad, p_mad_addr->dest_lid,
>>>>>>>>>>>>                     p_mad_addr->addr_type.gsi.remote_qp,
>>>>>>>>>>>>                     p_mad_addr->addr_type.gsi.service_level,
>>>>>>>>>>>>                     IB_QP1_WELL_KNOWN_Q_KEY);
>>>>>>>>>>>> So, the SL is the same as the one used by the OMPI process. The Q_Key
>>>>>>>>>>>> matches the Q_Key on the OMPI process, and remote_qp and dest_lid are
>>>>>>>>>>>> correct, too.
>>>>>>>>>>>> Afterwards umad_send(…) is used to send the reply with the 
>>>>>>>>>>>> PathRecord, and this send does not work (except for SL=0).
>>>>>>>>>>> 
>>>>>>>>>>> By not working, what do you mean? Do you mean it's not received at the
>>>>>>>>>>> requester with no message in the OpenSM log, or not received at the
>>>>>>>>>>> OpenSM, or something else? It could be due to the wrong SL being used in
>>>>>>>>>>> the original request (forcing it to SL 1). That could cause it not to be
>>>>>>>>>>> received at the SM, or the response not to make it back to the requester
>>>>>>>>>>> from the SA if the SL used is not "reversible".
>>>>>>>>>> By "not working" I mean that the MPI process does not receive any
>>>>>>>>>> response from the SA.
>>>>>>>>>> I get messages from the MPI process like the following:
>>>>>>>>>> [rc011][[14851,1],1][connect/btl_openib_connect_sl.c:301:get_pathrecord_info]
>>>>>>>>>>  No response from SA after 20 retries
>>>>>>>>>> The OpenSM log shows that the SA received the PathRequest query, dumps
>>>>>>>>>> the query into the log, and sends the reply back.
>>>>>>>>>> And I think I saw some messages in the log about "…1 outstanding
>>>>>>>>>> MAD…".
>>>>>>>>>>> 
>>>>>>>>>>>> If I look into the MAD before it is sent, then it looks like this:
>>>>>>>>>>>> Breakpoint 2, umad_send (fd=9, agentid=2, umad=0x7fffe8012530, length=120, timeout_ms=0, retries=3)
>>>>>>>>>>>>     at src/umad.c:791
>>>>>>>>>>>> 791             if (umaddebug > 1)
>>>>>>>>>>>> (gdb) p *mad
>>>>>>>>>>>> $1 = {agent_id = 2, status = 0, timeout_ms = 0, retries = 3, length = 0,
>>>>>>>>>>>>   addr = {qpn = 1325427712, qkey = 384, lid = 4096, sl = 6 '\006',
>>>>>>>>>>>>     path_bits = 0 '\000', grh_present = 0 '\000', gid_index = 0 '\000',
>>>>>>>>>>>>     hop_limit = 0 '\000', traffic_class = 0 '\000', gid = '\000' <repeats 15 times>,
>>>>>>>>>>>>     flow_label = 0, pkey_index = 0, reserved = "\000\000\000\000\000"},
>>>>>>>>>>>>   data = 0x7fffe8012530 "\002"}
>>>>>>>>>>> 
>>>>>>>>>>> Is this the PathRecord query on the OpenMPI side or the response on the
>>>>>>>>>>> OpenSM side? SL is 6 rather than 1 here.
>>>>>>>>>> This is the response on the OpenSM side (inside the umad_send function,
>>>>>>>>>> right before it is written to the device with write(fd, …)).
>>>>>>>>>> SL=6 indicates that the MPI process sent the request on SL 6.
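The gdb dump prints the umad address fields as host integers, but they hold network-byte-order values. Assuming a little-endian x86_64 host, swapping the bytes shows that they agree with the ibdump capture of the request in this thread: `qkey = 384` becomes 0x80010000 (2147549184, the DETH Queue Key there, i.e. the QP1 Q_Key) and `lid = 4096` becomes 16 (the requester's source LID there). A minimal sketch with hypothetical helper names; real code would simply use `ntohl()`/`ntohs()`:

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

So the response was addressed to LID 16 with the QP1 Q_Key on SL 6, which is consistent with the addressing itself being correct.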
>>>>>>>>> 
>>>>>>>>> What is SMSL for the requester? Was it SL 6?
>>>>>>>> Yes, it was SL 6.
>>>>>>>> Here is the content of a similar packet received by the SA. I used
>>>>>>>> ibdump on the port where OpenSM was running:
>>>>>>>> ======================================================================================
>>>>>>>> No.     Time        Source                Destination           Protocol Length Info
>>>>>>>> 785 14.352168   LID: 384              LID: 4140             InfiniBand 290    UD Send Only SubnAdmGet(PathRecord)
>>>>>>>> 
>>>>>>>> Frame 785: 290 bytes on wire (2320 bits), 290 bytes captured (2320 bits)
>>>>>>>> Arrival Time: Dec 13, 2012 18:09:44.437633332 JST
>>>>>>>> Epoch Time: 1355389784.437633332 seconds
>>>>>>>> [Time delta from previous captured frame: 4.332020528 seconds]
>>>>>>>> [Time delta from previous displayed frame: 4.332020528 seconds]
>>>>>>>> [Time since reference or first frame: 14.352168681 seconds]
>>>>>>>> Frame Number: 785
>>>>>>>> Frame Length: 290 bytes (2320 bits)
>>>>>>>> Capture Length: 290 bytes (2320 bits)
>>>>>>>> [Frame is marked: False]
>>>>>>>> [Frame is ignored: False]
>>>>>>>> [Protocols in frame: erf:infiniband]
>>>>>>>> Extensible Record Format
>>>>>>>> [ERF Header]
>>>>>>>>     Timestamp: 0x50c99b587008bcf2
>>>>>>>>     [Header type]
>>>>>>>>         .001 0101 = type: INFINIBAND (21)
>>>>>>>>         0... .... = Extension header present: 0
>>>>>>>>     0000 0100 = flags: 4
>>>>>>>>         .... ..00 = capture interface: 0
>>>>>>>>         .... .1.. = varying record length: 1
>>>>>>>>         .... 0... = truncated: 0
>>>>>>>>         ...0 .... = rx error: 0
>>>>>>>>         ..0. .... = ds error: 0
>>>>>>>>         00.. .... = reserved: 0
>>>>>>>>     record length: 306
>>>>>>>>     loss counter: 0
>>>>>>>>     wire length: 290
>>>>>>>> InfiniBand
>>>>>>>> Local Route Header
>>>>>>>>     0110 .... = Virtual Lane: 0x06
>>>>>>>>     .... 0000 = Link Version: 0
>>>>>>>>     0110 .... = Service Level: 6
>>>>>>>>     .... 00.. = Reserved (2 bits): 0
>>>>>>>>     .... ..10 = Link Next Header: 0x02
>>>>>>>>     Destination Local ID: 19
>>>>>>>>     0000 0... .... .... = Reserved (5 bits): 0
>>>>>>>>     .... .000 0100 1000 = Packet Length: 72
>>>>>>>>     Source Local ID: 16
>>>>>>>> Base Transport Header
>>>>>>>>     Opcode: 100
>>>>>>>>     1... .... = Solicited Event: True
>>>>>>>>     .1.. .... = MigReq: True
>>>>>>>>     ..00 .... = Pad Count: 0
>>>>>>>>     .... 0000 = Header Version: 0
>>>>>>>>     Partition Key: 65535
>>>>>>>>     Reserved (8 bits): 0
>>>>>>>>     Destination Queue Pair: 0x000001
>>>>>>>>     0... .... = Acknowledge Request: False
>>>>>>>>     .000 0000 = Reserved (7 bits): 0
>>>>>>>>     Packet Sequence Number: 0
>>>>>>>> DETH - Datagram Extended Transport Header
>>>>>>>>     Queue Key: 2147549184
>>>>>>>>     Reserved (8 bits): 0
>>>>>>>>     Source Queue Pair: 0x00380050
>>>>>>>> MAD Header - Common Management Datagram
>>>>>>>>     Base Version: 0x01
>>>>>>>>     Management Class: 0x03
>>>>>>>>     Class Version: 0x02
>>>>>>>>     Method: Get() (0x01)
>>>>>>>>     Status: 0x0000
>>>>>>>>     Class Specific: 0x0000
>>>>>>>>     Transaction ID: 0x0010000f38005000
>>>>>>>>     Attribute ID: 0x0035
>>>>>>>>     Reserved: 0x0000
>>>>>>>>     Attribute Modifier: 0x00000000
>>>>>>>>     MAD Data Payload: 000000000000000000000000000000000000000000000000...
>>>>>>>>  Illegal RMPP Type (0)! 
>>>>>>>>     RMPP Type: 0x00
>>>>>>>>     RMPP Type: 0x00
>>>>>>>>     0000 .... = R Resp Time: 0x00
>>>>>>>>     .... 0000 = RMPP Flags: Unknown (0x00)
>>>>>>>>     RMPP Status:  (Normal) (0x00)
>>>>>>>>     RMPP Data 1: 0x00000000
>>>>>>>>     RMPP Data 2: 0x00000000
>>>>>>>> SMASubnAdmGet(PathRecord)
>>>>>>>>     SM_Key (Verification Key): 0x0000000000000000
>>>>>>>>     Attribute Offset: 0x0000
>>>>>>>>     Reserved: 0x0000
>>>>>>>>     Component Mask: 0x0000003000000000
>>>>>>>>     Attribute (PathRecord)
>>>>>>>>         PathRecord
>>>>>>>>             DGID: :: (::)
>>>>>>>>             SGID: ::0.15.0.16 (::0.15.0.16)
>>>>>>>>             DLID: 0x0000
>>>>>>>>             SLID: 0x0000
>>>>>>>>             0... .... = RawTraffic: 0x00
>>>>>>>>             .... 0000 0000 0000 0000 0000 = FlowLabel: 0x000000
>>>>>>>>             HopLimit: 0x00
>>>>>>>>             TClass: 0x00
>>>>>>>>             0... .... = Reversible: 0x00
>>>>>>>>             .000 0000 = NumbPath: 0x00
>>>>>>>>             P_Key: 0x0000
>>>>>>>>             .... .... .... 0000 = SL: 0x0000
>>>>>>>>             00.. .... = MTUSelector: 0x00
>>>>>>>>             ..00 0000 = MTU: 0x00
>>>>>>>>             00.. .... = RateSelector: 0x00
>>>>>>>>             ..00 0000 = Rate: 0x00
>>>>>>>>             00.. .... = PacketLifeTimeSelector: 0x00
>>>>>>>>             ..00 0000 = PacketLifeTime: 0x00
>>>>>>>>             Preference: 0x00
>>>>>>>> Variant CRC: 0xad4e
>>>>>>>> ======================================================================================
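The Local Route Header in the capture above shows the request arriving on VL 6 with SL 6, which fits an SL2VL mapping of SL6 to VL6. A minimal sketch of how those fields sit in the first two LRH bytes, assuming the IBA LRH bit layout. The input bytes 0x60 0x62 are reconstructed from the fields wireshark decodes (VL 6, LVer 0, SL 6, LNH 2), not taken from a raw hex dump, and the names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* First two LRH bytes:
 *   byte 0: VL (bits 7..4) | LVer (bits 3..0)
 *   byte 1: SL (bits 7..4) | reserved (bits 3..2) | LNH (bits 1..0) */
struct lrh_fields {
    uint8_t vl, lver, sl, lnh;
};

static struct lrh_fields lrh_decode(const uint8_t hdr[2])
{
    struct lrh_fields f;
    f.vl   = (uint8_t)(hdr[0] >> 4);
    f.lver = (uint8_t)(hdr[0] & 0x0F);
    f.sl   = (uint8_t)(hdr[1] >> 4);
    f.lnh  = (uint8_t)(hdr[1] & 0x03); /* 2 = IBA local: BTH follows, no GRH */
    return f;
}
```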
>>>>>>> 
>>>>>>> And the SubnAdmGetResp(PathRecord) is not seen? If not, it doesn't get
>>>>>>> out of that machine and the issue is internal to that machine. It could be
>>>>>>> because of the underlying issue which hangs OpenSM when some IB program
>>>>>>> tried to unregister from the MAD layer but there were outstanding work
>>>>>>> completions. That's based on your original email earlier this AM.
>>>>>> No, the SubnAdmGetResp does not show up when I use ibdump on the OMPI
>>>>>> side and the SA uses an SL>0.
>>>>> 
>>>>> Can ibdump be used to capture output on the SM port?
>>>> 
>>>> Yes, that works quite well, despite the warning in the ibdump manual.
>>>> But I started ibdump before opensm; maybe that makes a difference, I'm
>>>> not sure.
>>>> 
>>>> Regards,
>>>> Jens
>>>> 
>>>> PS: I have seen a small bug. I'm not sure if it's a bug in wireshark or
>>>> ibdump, but the response received by the OMPI node isn't shown correctly.
>>>> The PathRecord contains an offset which is either missing in the dump or
>>>> not handled correctly by wireshark. Either way, it causes wireshark to
>>>> show the PathRecord data with wrong values.
>>>> Maybe you could forward this to the developer of ibdump, so that he can
>>>> check/fix it.
>>> 
>>> Are you referring to the fields after the SA AttributeOffset or
>>> something else ?
>> Yes, after the SMASubnAdmGet Attribute Offset. Here is an example:
>> I get on the OMPI side:
>>    SMASubnAdmGetResp(PathRecord)
>>        SM_Key (Verification Key): 0x0000000000000000
>>        Attribute Offset: 0x0008
>>        Reserved: 0x0000
>>        Component Mask: 0x0000803000000000
>>        Attribute (PathRecord)
>>            PathRecord
>>                DGID: ::8:f104:399:ebb5:fe80:0 (::8:f104:399:ebb5:fe80:0)
>>                SGID: ::8:f104:399:ecd5:4:8 (::8:f104:399:ecd5:4:8)
>>                DLID: 0x0000
>>                SLID: 0x0000
>>                0... .... = RawTraffic: 0x00
>>                .... 0000 1000 0000 1111 1111 = FlowLabel: 0x0080ff
>>                HopLimit: 0xff
>>                TClass: 0x00
>>                0... .... = Reversible: 0x00
>>                .000 0011 = NumbPath: 0x03
>>                P_Key: 0x8486
>>                .... .... .... 0000 = SL: 0x0000
>>                00.. .... = MTUSelector: 0x00
>>                ..00 0000 = MTU: 0x00
>>                00.. .... = RateSelector: 0x00
>>                ..00 0000 = Rate: 0x00
>>                00.. .... = PacketLifeTimeSelector: 0x00
>>                ..00 0000 = PacketLifeTime: 0x00
>>                Preference: 0x00
>> 
>> But it should show (note the differences in SLID, DLID and SL, which are
>> now correct):
>>    SMASubnAdmGetResp(PathRecord)
>>        SM_Key (Verification Key): 0x0000000000000000
>>        Attribute Offset: 0x0008
>>        Reserved: 0x0000
>>        Component Mask: 0x0000803000000000
>>        Attribute (PathRecord)
>>            PathRecord
>>                DGID: ::8:f104:399:ebb5 (::8:f104:399:ebb5)
>>                SGID: fe80::8:f104:399:ecd5 (fe80::8:f104:399:ecd5)
>>                DLID: 0x0004
>>                SLID: 0x0008
>>                0... .... = RawTraffic: 0x00
>>                .... 0000 0000 0000 0000 0000 = FlowLabel: 0x000000
>>                HopLimit: 0x00
>>                TClass: 0x00
>>                1... .... = Reversible: 0x01
>>                .000 0000 = NumbPath: 0x00
>>                P_Key: 0xffff
>>                .... .... .... 0011 = SL: 0x0003
>>                10.. .... = MTUSelector: 0x02
>>                ..00 0100 = MTU: 0x04
>>                10.. .... = RateSelector: 0x02
>>                ..00 0110 = Rate: 0x06
>>                10.. .... = PacketLifeTimeSelector: 0x02
>>                ..01 0010 = PacketLifeTime: 0x12
>>                Preference: 0x00
> 
> 
> I think everything after AttributeOffset is off by 2 bytes. DGID doesn't
> look right to me (no subnet prefix fe80:: in front of GUID).

Yes, I made a small mistake with the hex editor: I started the shift after the
subnet prefix.
Sorry for the confusion.

Thank you for the hint with smpquery and saquery, I will check that tomorrow.

Jens
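When checking a dissector's field placement by hand, as Hal does for the fields after AttributeOffset, it helps to know where the SA payload starts. A sketch of the offsets, assuming the IBA SA MAD format (24-byte common MAD header, 12-byte RMPP header, 20-byte SA header) and the standard 64-byte PathRecord layout; the constant names are hypothetical, so please verify them against the specification. AttributeOffset counts 8-byte words, so the 0x0008 seen in the responses corresponds to 64 bytes, one PathRecord:

```c
#include <assert.h>
#include <stddef.h>

/* Byte offsets within a 256-byte SA MAD. */
enum {
    MAD_COMMON_HDR_LEN = 24, /* base MAD header */
    RMPP_HDR_LEN       = 12, /* RMPP header, present in SA MADs */
    SA_SMKEY_OFF       = 36, /* SM_Key, 8 bytes */
    SA_ATTR_OFFSET_OFF = 44, /* AttributeOffset, 2 bytes (+2 reserved) */
    SA_COMP_MASK_OFF   = 48, /* ComponentMask, 8 bytes */
    SA_DATA_OFF        = 56, /* first attribute record */
    MAD_LEN            = 256
};

/* Selected PathRecord field offsets, relative to the record start. */
enum {
    PR_DGID_OFF = 8,  /* after the 8-byte ServiceID */
    PR_SGID_OFF = 24,
    PR_DLID_OFF = 40,
    PR_SLID_OFF = 42,
    PR_QOS_SL_OFF = 52 /* 16-bit QoSClass/SL word, SL in the low nibble */
};

/* AttributeOffset is expressed in 8-byte words. */
static size_t sa_record_bytes(unsigned attr_offset)
{
    return (size_t)attr_offset * 8;
}
```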

> 
> -- Hal
> 
>> 
>> Regards,
>> Jens
>> 
>>> 
>>> -- Hal
>>> 
>>>>> 
>>>>> -- Hal
>>>>> 
>>>>>>> 
>>>>>>>>> 
>>>>>>>>> One would need to walk the SLToVLMappingTables from requester (OMPI
>>>>>>>>> port) to SA and back to see whether SL6 would even have a chance of
>>>>>>>>> working (not dropping), aside from whether it's really the correct SL
>>>>>>>>> to use.
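Hal's suggestion to walk the SL2VL tables hop by hop can be sketched as follows, under simplifying assumptions: each hop is reduced to one SL2VL table plus the number of operational data VLs on that link (real switches keep SL2VL tables per input/output port pair), and `struct hop` is hypothetical. Mapping an SL to VL15 is treated as a drop, following the IBA convention that data packets mapped to VL15 are discarded:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SLS 16

/* Hypothetical per-hop view: egress SL2VL table and usable data VLs. */
struct hop {
    uint8_t  sl2vl[NUM_SLS];
    unsigned data_vls;
};

/* Returns 1 if `sl` maps to a usable data VL on every hop, else 0. */
static int sl_usable_on_path(const struct hop *hops, unsigned n, uint8_t sl)
{
    assert(sl < NUM_SLS);
    for (unsigned i = 0; i < n; i++) {
        uint8_t vl = hops[i].sl2vl[sl];
        if (vl == 15 || vl >= hops[i].data_vls)
            return 0; /* dropped (VL15) or VL not operational on this link */
    }
    return 1;
}
```

Both directions (requester to SA and SA back to requester) would need to pass this check for SL 6 to work for the query and its response.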
>>>>>>>> All SL2VL tables look the same. I checked the output of OpenSM.
>>>>>>>>        SL: |  0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 |
>>>>>>>>        VL: | 0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
>>>>>>>> But this is also as expected, because I have set the QoS in the opensm 
>>>>>>>> config as follows:
>>>>>>>>        qos_sl2vl 0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7
>>>>>>>> This was set for "default", "CA" and "Switch external ports". I have
>>>>>>>> not touched the config for "Switch Port 0" and "Router ports"; they
>>>>>>>> remained: qos_[sw0 | rtr]_sl2vl (null)
>>>>>>> 
>>>>>>> That works as long as all links have (at least) 8 data VLs (VLCap 4).
>>>>>> Yes, all VL_CAP show 4 in the OpenSM log file.
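Hal's condition (every SL must land on one of the 8 data VLs that VLCap 4 provides) can be checked mechanically. A minimal sketch, assuming the `qos_sl2vl` value is 16 comma-separated VL numbers as in the OpenSM config quoted above and the usual VLCap encoding (1 = VL0, 2 = VL0-1, 3 = VL0-3, 4 = VL0-7, 5 = VL0-14). The parser and function names are hypothetical, not OpenSM code:

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_SLS 16

/* Parse "v0,v1,...,v15" into table[]. Returns 0 on success, -1 otherwise. */
static int parse_sl2vl(const char *s, unsigned table[NUM_SLS])
{
    for (int sl = 0; sl < NUM_SLS; sl++) {
        char *end;
        long vl = strtol(s, &end, 10);
        if (end == s || vl < 0 || vl > 15)
            return -1;
        table[sl] = (unsigned)vl;
        if (sl < NUM_SLS - 1) {
            if (*end != ',')
                return -1; /* too few entries */
            s = end + 1;
        } else if (*end != '\0') {
            return -1; /* trailing garbage or too many entries */
        }
    }
    return 0;
}

/* Number of data VLs for a PortInfo VLCap value. */
static unsigned vlcap_data_vls(unsigned vl_cap)
{
    assert(vl_cap >= 1 && vl_cap <= 5);
    return vl_cap == 5 ? 15u : 1u << (vl_cap - 1);
}

/* Every SL must map to a data VL the port actually supports. */
static int sl2vl_fits(const unsigned table[NUM_SLS], unsigned vl_cap)
{
    unsigned n = vlcap_data_vls(vl_cap);
    for (int sl = 0; sl < NUM_SLS; sl++)
        if (table[sl] >= n)
            return 0;
    return 1;
}
```

With the `0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7` mapping, VLCap 4 passes, while a link limited to VLCap 3 (VL0-3) would not carry SL 4-7 or SL 12-15.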
>>>>>> 
>>>>>> Regards
>>>>>> Jens
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> -- Hal
>>>>>>> 
>>>>>>>> Regards
>>>>>>>> Jens
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -- Hal
>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> The output of OpenMPI or OpenSM's log file don't show any useful 
>>>>>>>>>>>> information for this problem, even with higher debug levels.
>>>>>>>>>>> 
>>>>>>>>>>> So nothing interesting logged relative to the PathRecord queries ?
>>>>>>>>>> In the OpenSM log, only that it was received, what the request looks
>>>>>>>>>> like, and that it was sent back.
>>>>>>>>>> And a few "outstanding MADs" messages a few lines later in the log.
>>>>>>>>>>> 
>>>>>>>>>>>> So, right now I'm stuck and have no idea whether there is an error in
>>>>>>>>>>>> the kernel driver, the HCA firmware or something completely
>>>>>>>>>>>> different, or whether umad_send basically does not support SL>0.
>>>>>>>>>>>> A workaround for the moment is to set the SL in the
>>>>>>>>>>>> umad_set_addr_net(...) call to 0.
>>>>>>>>>>> 
>>>>>>>>>>> So SL 0 works between all nodes and SA for querying/responses. Wonder if
>>>>>>>>>>> that's how SMSL is set by DFSSSP.
>>>>>>>>>> No, the SMSL set by DFSSSP is different from 0; I have checked this.
>>>>>>>>>> In our case (OpenSM running on a compute node), it sets the same SL
>>>>>>>>>> that is used for MPI<->MPI traffic, to ensure deadlock freedom.
>>>>>>>>>> 
>>>>>>>>>> Regards
>>>>>>>>>> Jens
>>>>>>>>>> 
>>>>>>>>>> --------------------------------
>>>>>>>>>> Dipl.-Math. Jens Domke
>>>>>>>>>> Researcher - Tokyo Institute of Technology
>>>>>>>>>> Satoshi MATSUOKA Laboratory
>>>>>>>>>> Global Scientific Information and Computing Center
>>>>>>>>>> 2-12-1-E2-7 Ookayama, Meguro-ku, 
>>>>>>>>>> Tokyo, 152-8550, JAPAN
>>>>>>>>>> Tel/Fax: +81-3-5734-3876
>>>>>>>>>> E-Mail: [email protected]
>>>>>>>>>> --------------------------------
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> 
> 

