On 8/24/09, Ira Weiny <[email protected]> wrote:
> If I send a combined DR path with a start lid but an empty (0 length) DR
> path.

Hop Count 0 ?

> What is the expected behavior?

Not sure what you mean by expected here. Are you referring to expectation
based on the spec ?

> I know this could be specified with LID routing, but I don't see anywhere
> in the specification which says this is an error.

I don't think it should be an error (certainly not for the form you are
using: LID routed part followed by a DR part) but a null DR part is a
little funny/odd.

> I do however seem to have 2
> different implementations on 2 different switches. For example:
>
> I have Switch A (Lid 1) and Switch B (Lid 7). I attempt to query PortInfo
> of Port 1 of each switch using the LID followed by an empty DR path.
>
> 17:55:22 > ./smpquery -c portinfo 1 0 1
> ibwarn: [21005] mad_rpc: _do_madrpc failed; dport (Lid 1)
> ./smpquery: iberror: failed: operation portinfo: port info query failed

Is this a timeout ?

> 17:55:31 > ./smpquery -c portinfo 7 0 1
> # Port info: Lid 7 port 1
> Mkey:............................0x0000000000000000
> GidPrefix:.......................0x0000000000000000
> ...
> <normal output snipped>
>
> Detecting this special case in libibmad and turning the packet into a LID
> routed one

Ugh... Is this special case really needed ? I don't think the underlying
issue is understood sufficiently yet.

> succeeds but I wonder if this is an error in the SMI?

Switch SMI ? Is this a proprietary implementation ?

> I also notice this is an error on the HCA I am running from (lid 2).

Is this HCA node OpenIB based ?

> 17:57:42 > ./smpquery -c portinfo 2 0 1
> ibwarn: [21008] mad_rpc: _do_madrpc failed; dport (Lid 2)
> ./smpquery: iberror: failed: operation portinfo: port info query failed

Is this also a timeout ? Also, does the result differ based on where you
source these from (locally v. remotely) ?

> Running with a simple DR path works,

You're referring to the same DR path here that fails in the combined route
examples above, right ?

> I guess because this is the loopback case mentioned on page 805.

Yes but that's the high level requirement rather than the SMI rules which
make that work.

> 17:58:16 > ./smpquery -D portinfo 0 1
> # Port info: DR path slid 65535; dlid 65535; 0 port 1
> Mkey:............................0x0000000000000000
> GidPrefix:.......................0x2007000000000000
> ...
> <snip>
>
> I guess that the comment "Since each part may be empty, there are eight
> combinations, although only four are really useful:" on line 36 Page 805
> can be interpreted to mean that only those 4 combinations need to be
> supported. Is this true?

Not all 4 combinations are supported/known to work. When this was added for
ibportstate, the only combined routing form that was important was LID
routed part followed by a DR part.

> On the other hand I think strictly this should be supported.

In an ideal world yes but are they all required or is it just the one form
most heavily used ?

> Item 4 of C14-9
> (line 24 page 810) requires the SMI to handle the packet if the HopPointer
> equals HopCount + 1, which it is in my case (HopCount == 0, HopPointer == 1)

By handle, this means "The SMI *shall* output the packet on the port whose
number is in the entry indexed by Hop Pointer in the Initial Path. If that
port number is invalid, the SMI *shall* discard the SMP." Are you sure the
Hop Pointer is 1 ? Where do you see this ? If so, what's the initial path at
this point (or more specifically index 1 of the initial path) ? I think that
needs to be port 0 (if a switch) but this is a little weird as I would think
it should be handed to the SMA, which is a different case in the spec.

> Then after processing by the SMA and doing the required returning
> initialization the SMI should return the packet as specified in C14-13
> item 3 on line 9 page 812.

I'm not sure it would use this case in the case of an empty DR path on
return.

> Am I wrong?
>
> In the end it does not matter as I have to make the software work
> for all the hardware I have; so I will change the software.

IMO it does matter as to where the problem lies (SMI or otherwise) and how
the layers are composed in the implementation.

> However, I wonder
> where exactly the spec falls on this, because I think it will influence
> where the fix resides. If the spec does not allow this then I think it is
> fine to have libibmad return an error since the user specified an invalid
> combined DR path. However, if this should be legal I think libibmad should
> work around the bad hardware out there.

Is it hardware or firmware that needs fixing ? I think whether a workaround
is acceptable may depend on the specific workaround, as it might harm
something else or might violate the spec.

-- Hal

> Thoughts?
> Ira
>
> --
> Ira Weiny
> Math Programmer/Computer Scientist
> Lawrence Livermore National Lab
> 925-423-8008
> [email protected]
> _______________________________________________
> general mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
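
To make the C14-9 question in this thread concrete, here is a rough sketch in C of how an SMI might classify an outgoing (request) directed-route SMP when it is received. It is only a simplified model under stated assumptions: the HopPointer == HopCount + 1 case (item 4) is the one cited in the thread, while the other branches follow an assumed reading of C14-9 that should be checked against the spec text. All names (sketch_dr_smp, sketch_smi_recv_request, the SKETCH_* constants) are hypothetical and are not part of libibmad or any switch firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LID_PERMISSIVE 0xFFFFu /* permissive LID (65535 in the smpquery output) */
#define SKETCH_MAX_HOPS 64            /* assumed size of the path arrays */

/* Hypothetical outcome labels, one per case in this model. */
enum sketch_action {
    SKETCH_DISCARD = 0,
    SKETCH_INTERMEDIATE_HOP, /* still inside the DR part, pass it on */
    SKETCH_END_OF_DR_PART,   /* last entry of the DR part reached */
    SKETCH_TO_SMA            /* HopPointer == HopCount + 1: hand to the SMA */
};

/* Hypothetical, reduced view of the DR fields of an SMP. */
struct sketch_dr_smp {
    uint8_t hop_ptr;
    uint8_t hop_cnt;
    uint16_t dr_slid;
    uint16_t dr_dlid;
    uint8_t initial_path[SKETCH_MAX_HOPS];
    uint8_t return_path[SKETCH_MAX_HOPS];
};

/* Classify a received outgoing DR SMP (assumed reading of C14-9). */
enum sketch_action sketch_smi_recv_request(struct sketch_dr_smp *smp,
                                           int is_switch,
                                           uint8_t in_port,
                                           uint8_t num_ports)
{
    assert(smp != NULL);

    uint8_t hop_ptr = smp->hop_ptr;
    uint8_t hop_cnt = smp->hop_cnt;

    /* Guard the array indexing below; not a spec rule. */
    if (hop_cnt >= SKETCH_MAX_HOPS)
        return SKETCH_DISCARD;

    /* The sender should already have advanced the hop pointer. */
    if (hop_cnt != 0 && hop_ptr == 0)
        return SKETCH_DISCARD;

    /* Intermediate hop: only a switch can pass the SMP along. */
    if (hop_ptr != 0 && hop_ptr < hop_cnt) {
        if (!is_switch)
            return SKETCH_DISCARD;
        smp->return_path[hop_ptr] = in_port;
        return smp->initial_path[hop_ptr + 1] <= num_ports
                   ? SKETCH_INTERMEDIATE_HOP
                   : SKETCH_DISCARD;
    }

    /* End of the DR part of the route. */
    if (hop_ptr == hop_cnt) {
        if (hop_cnt != 0)
            smp->return_path[hop_ptr] = in_port;
        return (is_switch || smp->dr_dlid == SKETCH_LID_PERMISSIVE)
                   ? SKETCH_END_OF_DR_PART
                   : SKETCH_DISCARD;
    }

    /* Item 4 as cited in the thread; anything else is unreasonable. */
    return hop_ptr == hop_cnt + 1 ? SKETCH_TO_SMA : SKETCH_DISCARD;
}
```

In this model, an SMP with HopCount == 0 and HopPointer == 1 falls into the last branch and is handed to the SMA, which is the behavior Ira expects. Whether a given switch or HCA implements the same rule, and whether the Hop Pointer really is 1 on arrival, is the open question in the thread.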
