Al Chu wrote:
Hey Yevgeny,

This looks like a great idea.  But is there a reason it's only supported
for LMC=0?  Since the caching is handled at the ucast-mgr level (rather
than in the routing algorithm code), I don't quite see why LMC=0
matters.

No particular reason - I'll enhance it for LMC>0, I just haven't found the
time to do it yet. The cached topology model is based on LIDs, so I just
need to check that LMC>0 doesn't break anything.
I also had a more complex topology and routing model where I wasn't relying
on LIDs: I had what I called "Virtual LIDs", and at every heavy sweep
the topology model was built and the Virtual LIDs were matched to LIDs to
create a VLID <-> LID mapping, so that the cache wouldn't depend on fabric
LIDs. There I had some problems with LMC (can't remember what exactly),
but that model proved to be useless.

Maybe it is because of the future incremental routing on your to-do list?
If that's the case, instead of only caching when LMC=0, perhaps initial
incremental routing should only work under LMC=0. Later on, incremental
routing for LMC > 0 could be added.

Agreed, that is what I should eventually do.

-- Yevgeny

Al

On Sun, 2008-05-04 at 13:08 +0300, Yevgeny Kliteynik wrote:
One thing I need to add here: ucast cache is currently supported
for LMC=0 only.

-- Yevgeny

Yevgeny Kliteynik wrote:
Hi Sasha,

The following series of 4 patches implements a unicast routing cache
in OpenSM.

None of the current routing engines is scalable when we're talking
about big clusters. On a ~5K-node cluster with ~1.3K switches, it takes
about two minutes to calculate the routing. The problem is that each
time the routing is calculated from scratch.

Incremental routing (which is on my to-do list) aims to address this
problem when there is some "local" change in the fabric (e.g. a single
switch failure, a single link failure, a link added, etc.).
In such cases we can use the routing that was already calculated in
the previous heavy sweep, and just modify it according to the change.

For instance, if some switch has disappeared from the fabric, we can
use the routing that existed with this switch, take a step back from
it, and see whether it is possible to route all the LIDs that were
routed through it some other way (which is usually the case).
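
Just to make the "step back" idea concrete, here is a rough sketch with
hypothetical types and names (nothing from the patches, and not how the
incremental routing will necessarily be implemented): on a neighbor of the
dead switch, every LID whose forwarding entry points toward the dead port is
redirected to another port that still reaches that LID, using the hop counts
from the lid matrices.

  #include <stdint.h>

  /* Hypothetical sketch: sw->lft[lid] is the output port for a LID,
   * sw->hops[lid][port] the hop count to that LID via that port
   * (as in the lid matrices); 0xFF means unreachable. */
  struct sw {
          uint8_t  lft[49152];          /* unicast LID -> out port  */
          uint8_t  hops[49152][36];     /* LID x port  -> hop count */
          unsigned num_ports;
  };

  /* Returns 0 if every LID previously routed via dead_port could be
   * rerouted through some other port, -1 if a full recalculation is
   * needed after all. */
  static int reroute_around(struct sw *neighbor, unsigned dead_port,
                            unsigned max_lid)
  {
          for (unsigned lid = 1; lid <= max_lid; lid++) {
                  if (neighbor->lft[lid] != dead_port)
                          continue;     /* not affected by the failure */

                  unsigned best = dead_port;
                  for (unsigned p = 1; p < neighbor->num_ports; p++) {
                          if (p == dead_port ||
                              neighbor->hops[lid][p] == 0xFF)
                                  continue;
                          if (best == dead_port ||
                              neighbor->hops[lid][p] <
                              neighbor->hops[lid][best])
                                  best = p;
                  }
                  if (best == dead_port)
                          return -1;    /* no alternative path for this LID */
                  neighbor->lft[lid] = best;
          }
          return 0;
  }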

To implement incremental routing, we need to create some kind of unicast
routing cache, which is what these patches implement. In addition to being
a step toward incremental routing, the routing cache is useful by itself.

This cache can save us a routing calculation when the change is in the leaf
switches or in the hosts. For instance, if some node is rebooted, OpenSM
starts a heavy sweep with a full routing recalculation when the HCA goes
down, and another one when the HCA comes back up, when in fact both of these
routing calculations can be replaced by using the unicast routing cache.

Unicast routing cache comprises the following (a rough sketch of the data
follows the list):
 - Topology: a data structure with all the switches and CAs of the fabric
 - LFTs: each switch has its LFT cached
 - Lid matrices: each switch has its lid matrices cached, which are needed
   for multicast routing (which is not cached).
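
As an illustration only, a rough sketch of what such a cache could hold -
hypothetical types, not the actual structures from the patches:

  #include <stdint.h>

  /* Hypothetical sketch of the cached data. */
  struct cached_switch {
          uint64_t guid;                /* node GUID of the switch      */
          uint8_t  num_ports;
          uint64_t remote_guid[36];     /* neighbor GUID per port       */
          uint16_t lft_size;
          uint8_t *lft;                 /* cached LFT: LID -> out port  */
          uint8_t *lid_matrix;          /* cached lid matrices, needed
                                           to rebuild multicast routing */
  };

  struct cached_ca {
          uint64_t port_guid;
          uint64_t remote_sw_guid;      /* switch the CA is attached to */
          uint8_t  remote_port;
          uint16_t base_lid;
  };

  struct ucast_cache {
          unsigned              num_switches;
          struct cached_switch *switches;
          unsigned              num_cas;
          struct cached_ca     *cas;
          int                   valid;  /* cleared on unsupported change */
  };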

There is a topology matching function that compares the current topology
with the cached one to find out whether the cache is usable (valid) or not.
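
As an illustration of the matching idea (hypothetical helpers built on the
sketch above, not the function from the patches): the cache stays valid only
when nothing new appears in the discovered fabric and the only things missing
from it are CAs or leaf switches.

  /* Hypothetical sketch of cache validation; the lookup helpers and
   * struct fabric are assumptions, not real OpenSM functions. */
  struct fabric {
          unsigned              num_switches;
          struct cached_switch *switches;      /* discovered switches */
          unsigned              num_cas;
          struct cached_ca     *cas;           /* discovered CAs      */
  };

  const struct cached_switch *cache_find_switch(const struct ucast_cache *c,
                                                uint64_t guid);
  const struct cached_ca *cache_find_ca(const struct ucast_cache *c,
                                        uint64_t port_guid);
  const struct cached_switch *fabric_find_switch(const struct fabric *f,
                                                 uint64_t guid);
  int links_match(const struct cached_switch *a, const struct cached_switch *b);
  int cache_switch_is_leaf(const struct ucast_cache *c,
                           const struct cached_switch *sw);

  enum cache_verdict { CACHE_VALID, CACHE_INVALID };

  static enum cache_verdict match_topologies(const struct ucast_cache *cache,
                                             const struct fabric *discovered)
  {
          unsigned i;

          /* Anything discovered that the cache has never seen, or any
           * changed link between switches, invalidates the cache. */
          for (i = 0; i < discovered->num_switches; i++) {
                  const struct cached_switch *cs =
                      cache_find_switch(cache, discovered->switches[i].guid);
                  if (!cs || !links_match(cs, &discovered->switches[i]))
                          return CACHE_INVALID;
          }
          for (i = 0; i < discovered->num_cas; i++)
                  if (!cache_find_ca(cache, discovered->cas[i].port_guid))
                          return CACHE_INVALID;

          /* Things in the cache but missing from the fabric are tolerated
           * only if they are CAs (always) or leaf switches. */
          for (i = 0; i < cache->num_switches; i++)
                  if (!fabric_find_switch(discovered,
                                          cache->switches[i].guid) &&
                      !cache_switch_is_leaf(cache, &cache->switches[i]))
                          return CACHE_INVALID;

          return CACHE_VALID;
  }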

The cache is used the following way (a condensed sketch of this flow follows
the list):
 - the SM starts and performs the first routing calculation
 - the calculated routing is stored in the cache
 - at some point a new heavy sweep is triggered
 - the unicast manager checks whether the cache can be used instead
   of a new routing calculation.
   The cached routing can be used in one of the following cases:
    + there is no topology change
    + one or more CAs disappeared (they exist in the cached topology
      model, but are missing from the newly discovered fabric)
    + one or more leaf switches disappeared
   In these cases the cached routing is written to the switches as is
   (skipping any switch that no longer exists).
   If there is any other topology change:
     - the existing cache is invalidated
     - the topology is cached
     - the routing is calculated as usual
     - the routing is cached
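
A condensed sketch of that decision, with hypothetical helper names (not the
actual ucast manager code from the patches), reusing the sketches above:

  /* Assumed helpers, for illustration only. */
  void write_cached_lfts(const struct ucast_cache *cache, struct fabric *fabric);
  void cache_topology(struct ucast_cache *cache, const struct fabric *fabric);
  void run_routing_engine(struct fabric *fabric);
  void cache_routing(struct ucast_cache *cache, const struct fabric *fabric);

  static void ucast_mgr_route(struct ucast_cache *cache, struct fabric *fabric)
  {
          if (cache->valid &&
              match_topologies(cache, fabric) == CACHE_VALID) {
                  /* No change, or only CAs / leaf switches disappeared:
                   * write the cached LFTs to the switches that still exist. */
                  write_cached_lfts(cache, fabric);
                  return;
          }

          /* Any other change: invalidate, re-cache, recalculate, re-cache. */
          cache->valid = 0;
          cache_topology(cache, fabric);        /* cache the new topology  */
          run_routing_engine(fabric);           /* routing as usual        */
          cache_routing(cache, fabric);         /* cache LFTs/lid matrices */
          cache->valid = 1;
  }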

My simulations show that where the usual routing phase of a heavy
sweep on the topology mentioned above takes ~2 minutes,
cached routing reduces this time to 6 seconds (which is nice, if you
ask me...).

Of all the cases where the cache is valid, the most painful and most
complained-about one is when a compute node reboot (which happens pretty
often) causes two heavy sweeps with two full routing calculations.
The unicast routing cache is aimed at solving this problem (again, in
addition to being a step toward incremental routing).

-- Yevgeny