Hey Sasha,

As we've discussed before, we wanted to put routing chaining into opensm. Here is a patch series to support it.

For others on the list, routing chaining is the ability to configure the order in which routing algorithms are applied in opensm, e.g.

-R ftree,updn,minhop

Try ftree routing first. If ftree fails, try updn. If updn fails, try minhop.

To get this done, some of the routing code had to be rearchitected, because there is no longer an assumption that only one routing engine can be specified. Here's a summary of the overall rearchitecture.

osm_ucast defaults to minhop - The current code automatically defaulted to minhop if anything in the selected routing engine failed. Naturally this had to be changed for routing chaining. I moved minhop out of the ucast_mgr code to make it its own routing engine instead.

osm_ucast assumption on routing failures - Because the current code defaulted to minhop on any failure, some routing engines (most notably "file" routing) intentionally "failed" when they wanted to default to some portion of minhop behavior. All routing behavior had to be moved into the routing engines so that each one fully fails or succeeds on its own.

updn routing - Currently utilizes minhop's build_fwd_tables, but minhop's code assumes that if build_lid_matrices is non-NULL, it is in "up/dn routing mode" instead of "minhop mode". That is perfectly fine when you can specify at most one routing engine, but it needs to be abstracted out of minhop so that up/dn is independent in its routing "attempt" in the chain.

dor routing "dependency" on ucast_mgr - The is_dor flag was checked/determined inside the ucast_mgr. Dor routing had to be split out of the ucast manager so that its routing engine is independent of another routing engine's "attempt" in the chain.

minhop routing assumed to never fail - Currently minhop routing cannot "fail", so putting minhop into the middle of a routing chain makes no sense.
I assume this is a legacy of the time when the minhop algorithm did not have options like "guid_routing_order_file" that could be parsed incorrectly. So I made changes to allow minhop to have options passed to it that let it either "fail" or "move on no matter what". Subsequently, if all routing chaining inputs from the user fail, a bare-bones "move on no matter what" minhop is executed. If no routing algorithm is specified, we still use minhop by default.

So, a lot of rearchitecture and a lot of cleanup were done, with some bug fixes along the way too. Naturally, there may be some style differences and some code efficiencies I just don't see right now. I may have missed something in the routing rearchitecture in part 2. But at the core, it seems to work :-)

I've currently only tested against ibsim, not a real cluster. Hope to do that later on.

Please let me know what you think.

Al

--
Albert Chu
[EMAIL PROTECTED]
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
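
For anyone who wants a concrete picture of the chaining behavior described in this mail, here is a rough sketch in Python. It is not the patch and not opensm code: parse_chain, run_chain, the engine table, and the toy success conditions are all hypothetical names made up for illustration. It assumes each engine simply reports success or failure, that an unknown engine name counts as a failure, and that the final bare-bones minhop always succeeds.

```python
from typing import Callable, Dict, List

# Illustrative only: every name here is hypothetical and is not taken from
# the opensm sources. Each engine returns True on success, False on failure.
Engine = Callable[[dict], bool]


def parse_chain(option: str) -> List[str]:
    """Split a '-R' style value such as 'ftree,updn,minhop' into engine names.

    An empty value yields ['minhop'], mirroring the stated default.
    """
    names = [n.strip() for n in option.split(",") if n.strip()]
    return names or ["minhop"]


def run_chain(names: List[str], engines: Dict[str, Engine], fabric: dict) -> str:
    """Try each engine in order and return a label for the one that was used."""
    for name in names:
        engine = engines.get(name)
        # Assumption: an unknown engine name is treated like a failed attempt.
        if engine is not None and engine(fabric):
            return name
    # Every requested engine failed: run the bare-bones "move on no matter
    # what" minhop, which is assumed here to always succeed.
    return "minhop (fallback)"


if __name__ == "__main__":
    # Toy stand-in predicates, not the real failure conditions of any engine.
    engines = {
        "ftree": lambda fabric: fabric.get("is_fat_tree", False),
        "updn": lambda fabric: bool(fabric.get("root_guids")),
        "minhop": lambda fabric: True,
    }
    chain = parse_chain("ftree,updn,minhop")
    print(run_chain(chain, engines, {"root_guids": [1]}))
```

The point of the sketch is only the control flow: each engine's attempt is independent, the first success ends the chain, and the bare-bones minhop runs only when everything the user asked for has failed.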
