On Wed, 26 Aug 2009 10:55:41 -0400 Hal Rosenstock <[email protected]> wrote:
> On 8/25/09, Ira Weiny <[email protected]> wrote:
> >
> > On Tue, 25 Aug 2009 19:15:19 -0400
> > Hal Rosenstock <[email protected]> wrote:
> >
> > > On 8/24/09, Ira Weiny <[email protected]> wrote:
> > > [snip]
> > >
> > > Not all 4 combinations are supported/known to work. When this was added
> > > for ibportstate, the only combined routing form that was important was
> > > LID routed part followed by a DR part.
> >
> > When you say "known to work" you mean implemented with the diags? Or known
> > to work in all hardware?
>
> The former with most hardware up to some time ago. Note there is no
> compliance testing of combined routing and heavy reliance on this makes some
> a little nervous.

Ok, Good to know. With this, and the rest of your response, in mind I went
ahead and created a patch to libibnetdisc which will go back to LID routing
when the Hop Count is returned to 0. Patch to follow.

> > > > On the other hand I think strictly this should be supported.
> > >
> > > In an ideal world yes but are they all required or is it just the one
> > > form most heavily used ?
> >
> > That is what I am unclear on. Does the spec require that all 8
> > combinations are required to work? I don't see a specific compliance which
> > says that and I am not sure if C14-9 and C14-13 cover all 8 combinations.
>
> I don't think there's any compliance on this. It all appears to be
> informative text. Perhaps a shortcoming of the spec. So there's nothing
> definitive. It just says there are 8 combinations (2**3 as there are 3 parts
> with 2 possibilities in each part) and that only 4 are really useful.

Well I agree that only 4 are "useful". It is just the algorithm which
libibnetdisc used which resulted in this "weird" case.

[snip]

> > > If so, what's the initial path at this point (or more specifically index
> > > 1 of the initial path) ?
> > > I think that needs to be port 0 (if a switch) but this is a little weird
> > > as I would think it should be handed to the SMA which is different cases
> > > in the spec.
> >
> > Yes I think I was wrong on the case. But still wouldn't the SMI detect
> > that this is the end of the DRPath and simply hand it to the SMA.
>
> Yes, that's what should happen.

I am going to take this up with the switch vendors and see what their
interpretation is. For the time being I think my patch will fix libibnetdisc
(iblinkinfo).

Thanks again!
Ira

> > > > Then after processing by the SMA and doing the required returning
> > > > initialization the SMI should return the packet as specified in C14-13
> > > > item 3 on line 9 page 812.
> > >
> > > I'm not sure it would use this case in the case of an empty DR path on
> > > return.
> >
> > Actually I think it will use this. C14-9 item 3) states "the Hop Pointer
> > shall be incremented by 1" Therefore when the response is handed back to
> > the SMI the Hop pointer will be 1 and the hop count 0. And the SMI uses
> > the DRSLID to send the packet back to the requester.
>
> It goes up to the SMA and then when the response is to be made it goes
> through returning SMI initialization and handling.
>
> -- Hal
>
> > > > Am I wrong? In the end it does not matter as I have to make the
> > > > software work for all the hardware I have; so I will change the
> > > > software.
> > >
> > > IMO it does matter as to where the problem lies (SMI or otherwise) and
> > > how the layers are comprised in the implementation.
> >
> > Agreed. I am mainly confused because I have 2 different implementations of
> > this. My "old" switches seem to handle this case just fine. My "new"
> > switches do not. So I am really wondering what is going on.
> >
> > Here is the above output for the same query which works with an "old"
> > switch.
> >
> > 17:28:04 > ./smpquery -e -c portinfo 7 0 1
> > ...
> > trid 1a4329de; HopCount 0; HopPointer 0; slid 2; dlid 65535; 0, drpath->cnt 0
> > ...
> >
> > Aug 25 17:46:40 woprjr0 Madeye:sent SMP
> > Aug 25 17:46:40 woprjr0 MAD version....0x1
> > Aug 25 17:46:40 woprjr0 Class..........0x81 (Directed route SMP)
> > Aug 25 17:46:40 woprjr0 Class version..0x1
> > Aug 25 17:46:40 woprjr0 Method.........0x1 (Get)
> > Aug 25 17:46:40 woprjr0 Status.........0x00
> > Aug 25 17:46:40 woprjr0 Hop pointer....0x0
> > Aug 25 17:46:40 woprjr0 Hop counter....0x0
> > Aug 25 17:46:40 woprjr0 Trans ID.......0x1ba01a4329de
> > Aug 25 17:46:40 woprjr0 Attr ID........0x15 (port info)
> > Aug 25 17:46:40 woprjr0 Attr modifier..0x0001
> > Aug 25 17:46:40 woprjr0 Mkey...........0x0
> > Aug 25 17:46:40 woprjr0 DR SLID........0x02
> > Aug 25 17:46:40 woprjr0 DR DLID........0xffff
> > Aug 25 17:46:40 woprjr0 Madeye:recv SMP
> > Aug 25 17:46:40 woprjr0 MAD version....0x1
> > Aug 25 17:46:40 woprjr0 Class..........0x81 (Directed route SMP)
> > Aug 25 17:46:40 woprjr0 Class version..0x1
> > Aug 25 17:46:40 woprjr0 Method.........0x81 (Get response)
> > Aug 25 17:46:40 woprjr0 Status.........0x8000
> > Aug 25 17:46:40 woprjr0 Hop pointer....0x0
> > Aug 25 17:46:40 woprjr0 Hop counter....0x0
> > Aug 25 17:46:40 woprjr0 Trans ID.......0x1ba01a4329de
> > Aug 25 17:46:40 woprjr0 Attr ID........0x15 (port info)
> > Aug 25 17:46:40 woprjr0 Attr modifier..0x0001
> > Aug 25 17:46:40 woprjr0 Mkey...........0x0
> > Aug 25 17:46:40 woprjr0 DR SLID........0x02
> > Aug 25 17:46:40 woprjr0 DR DLID........0xffff
> >
> > Hop Pointer and Count are both 0 and things work just fine...
> >
> > > > However, I wonder where exactly the spec falls on this, because I
> > > > think it will influence where the fix resides. If the spec does not
> > > > allow this then I think it is fine to have libibmad return an error
> > > > since the user specified an invalid combined DR path.
> > > > However, if this should be legal I think libibmad should work around
> > > > the bad hardware out there.
> > >
> > > Is it hardware or firmware that needs fixing ? I think it may depend on
> > > the specific workaround for this as to whether it is acceptable as it
> > > might harm something else or might violate the spec.
> >
> > I agree, however, if the switch hardware needs fixing I fear it is too
> > late for the ones I have. Firmware might be upgradable although I have had
> > issues with un-managed switches in the past.
> >
> > So where do we put the fix in software?
> >
> > Ira
> >
> > > -- Hal
> > >
> > > > Thoughts?
> > > > Ira

--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
[email protected]
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
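
For illustration only, here is a minimal sketch of the fallback idea discussed in this thread: go back to LID routing when the DR part of a combined route is empty (Hop Count 0). This is not the libibnetdisc patch. `PortId` and `choose_mgmt_class` are hypothetical names, and Python is used only for brevity. The value 0x81 is the directed-route class shown in the madeye output; 0x01 is the LID-routed subnet management class.

```python
from dataclasses import dataclass, field
from typing import List

# Management class values carried in the SMP header.
SMP_CLASS_LID_ROUTED = 0x01       # LID-routed subnet management class
SMP_CLASS_DIRECTED_ROUTE = 0x81   # "Directed route SMP" in the madeye log


@dataclass
class PortId:
    """Hypothetical stand-in for a destination address (not the libibmad type)."""
    lid: int = 0                                      # LID-routed part (0 = none)
    drpath: List[int] = field(default_factory=list)   # egress ports of the DR part


def choose_mgmt_class(dest: PortId) -> int:
    """Pick the SMP class for a query to `dest`.

    Assumption for this sketch: if a LID is known and the DR part is empty,
    a plain LID-routed SMP reaches the same port, so the combined form with
    Hop Count 0 is never put on the wire.
    """
    hop_count = len(dest.drpath)
    if dest.lid != 0 and hop_count == 0:
        return SMP_CLASS_LID_ROUTED
    return SMP_CLASS_DIRECTED_ROUTE


# The address from the smpquery example: LID 7 with an empty DR path.
print(hex(choose_mgmt_class(PortId(lid=7))))
# The same LID followed by one DR hop out of port 1 stays directed route.
print(hex(choose_mgmt_class(PortId(lid=7, drpath=[1]))))
```

Under these assumptions the query from the smpquery example would go out LID routed, so no directed-route SMP with Hop Count 0 would be sent to the "new" switches; whether that is the right layer for the fix is the open question in the thread.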
