Hey Sasha, Thoughts below.
On Sun, 2008-09-28 at 23:26 +0300, Sasha Khapyorsky wrote: > Hi Al, > > Sorry about delay. It took some time to review this patch series in > deep. > > On 12:20 Mon 15 Sep , Al Chu wrote: > > > > As we've discussed before, we wanted to put routing chaining into > > opensm. > > This is great. Thanks! > > > osm_ucast defaults to minhop - The current code automatically > > defaulted to minhop if anything in the selected routing engine failed. > > Naturally this had to be changed for routing chaining. I moved minhop > > out of the ucast_mgr code to make it its own routing engine instead. > > I fully agree that minhop should be implemented as regular routing > engine. However I don't think we need to move > osm_ucast_mgr_build_lid_matrices() and ucast_mgr_build_lfts() and make > it minhop routing engine specific. osm_ucast_mgr_build_lid_matrices() > implements generic lid matrix generation code and ucast_mgr_build_lfts() > has generic balancer, both are useful by other routing engines. > > > osm_ucast assumption on routing failures - The current code defaulted > > to minhop if anything in the selected routing engine failed. Because > > of this some routing engines (most notably "file" routing) > > intentionally "failed" when it wanted default to some portion of > > minhop behavior. All routing behavior had to be moved into routing > > engines to have the routing engines fully fail/succeed on their own. > > Maybe we can use simpler solution - to use method's return status. It > can return negative value on failure, zero on success and positive if > method fallback is requested. This is fine as well. I had considered doing it this way as well, but I guess it comes down to personal programming style. > This will help to keep routing engine method to be optional (potentially > we can add multicast routing methods there) and to keep this simple > enough when single method fallback is desired. > > > updn routing - currently utilizes the minhop build_fwd_tables but > > minhop's code assumes if build_lid_matrices is not-null, it is in > > "up/dn routing mode" instead of "minhop mode". Perfectly fine when > > you can specify max of one routing engine, but needs to be abstracted > > out of minhop so up/dn is independent in its routing "attempt" in the > > chain. > > Agree. But actually usage of this check is odd IMHO. I think it is fine > to just remove this and make unconditional debug warning instead. > > > minhop routing assumed to never fail - Currently minhop routing cannot > > "fail". So if someone wanted to put minhop into the middle of a > > routing chain, it makes no sense. I assume this was based on legacy, > > when the minhop algorithm did not have options like > > "guid_routing_order_file" that could be parsed incorrectly. > > Sure it is legacy, but not for only this reason. > > In some already old days OpenSM state machine was design so that > "managers" (such as ucast_mgr, lid_mgr, etc.) were needed to return > value (osm_signal_t) which indicates sending MADs instead of its actual > execution status. > > Fortunately it is not the case anymore, so we can rework all managers > (including ucast_mgr) to return its real "status". > > > So, lots of rearchitecture were done and lots of cleanup was done as > > well. Some bug fixes along the way too. > > Finally this patch series leaves as with us: > > 21 files changed, 1538 insertions(+), 698 deletions(-) > > , which I think is pretty big for routing chaining :(. > > Assuming that we want this important feature to be included in OFED-1.4 Although it'd be nice to get into OFED 1.4, I know that we (the lab) aren't in too much of a hurry to see it in OFED 1.4. Of course, we backport new stuff into our local tree whenever we want. That's not most people. Al > and that OFED release cycle is already in "RC" phase I reworked this > as single and smaller patch which is based on your original patch series > (so obviously authorship is preserved). It includes some thoughts > mentioned above and also works with ibsim (still test it although). I > will post it to the list shortly. Let me know how it looks? > > Ideally it would be nice to have it integrated before the next RC release > (6 Oct). > > Sasha -- Albert Chu [EMAIL PROTECTED] Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory _______________________________________________ general mailing list [email protected] http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
