On 18:41 Thu 20 Dec, Yevgeny Kliteynik wrote:
> Sasha Khapyorsky wrote:
> > On 09:40 Wed 19 Dec, Yevgeny Kliteynik wrote:
> >> Sasha Khapyorsky wrote:
> >>> Hi Yevgeny,
> >>> On 15:33 Mon 17 Dec, Yevgeny Kliteynik wrote:
> >>>> If a heavy sweep requested during idle queue processing, OSM continues
> >>>> to process it till the end and only then notices the heavy sweep
> >>>> request.
> >>>> In some cases this might leave a topology change unhandled for several
> >>>> minutes.
> >>> Could you provide more details about such cases?
> >>> As far as I know the idle queue is used only for multicast re-routing.
> >>> If so, it is interesting by itself why it takes minutes and where. Is
> >>> where MCG join/leave storm?
> >> Exactly. The problem was discovered on a big cluster with hundreds of
> >> mcast groups,
> >> when there is some massive change in the subnet (like rebooting hundreds
> >> of nodes).
> > Ok, then proposed patch looks like half solution for me.
> > During mcast join/leave storm idle queue will be filled with requests to
> > rebuild mcast routing. OpenSM will process it one by one (and this will
> > take a lot of time) instead of process all pended mcast groups in one
> > run. I think it is first improvement needed here.
> > Even with such improvement we will not be able to control the order of
> > heavy sweep/mcast join requests, so basically idea of breaking idle
> > queue processing looks fine for me, but it is not all what should be
> > done here. Heavy sweep by itself recalculates mcast routing for all
> > existing groups, it should invalidate all pended mcast rerouting
> > requests instead of continuing idle queue processing after heavy
> > sweep. Make sense?
>
> OK, makes sense.
> So bottom line, when breaking the idle queue processing because of immediate
> sweep request, state manager should just purge the whole idle queue and then
> start the new heavy sweep.
Yes, that is one patch. Another patch I expect is needed to improve how
OpenSM handles mcast join request/node reboot storms: it should
recalculate mcast routing for more than one mcast group in a single run.
(Actually, I think the requested mcast groups should be queued in a list
and the mcast re-routing requests merged, plus some trivial processor
function in osm_mcast_mgr.c.) Maybe the whole idle queue mechanism can
then be removed as useless, in which case this will affect the heavy
sweep related patch.

Sasha
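
To make the idea above a bit more concrete, here is a rough sketch of a
merged list of pending mcast re-routing requests that is processed in
one run and purged when a heavy sweep is requested. This is not OpenSM
code: pending_mcast, pending_add, pending_purge, pending_process,
count_reroute and MAX_PENDING are all hypothetical names made up for
illustration, and the heavy sweep flag is simply passed in as a pointer
rather than read from the real state manager.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64	/* hypothetical bound, for this sketch only */

/* Hypothetical list of multicast groups waiting for re-routing. */
struct pending_mcast {
	uint16_t mlid[MAX_PENDING];
	size_t count;
};

/* Queue a group for re-routing; duplicate requests are merged. */
bool pending_add(struct pending_mcast *p, uint16_t mlid)
{
	size_t i;

	for (i = 0; i < p->count; i++)
		if (p->mlid[i] == mlid)
			return true;	/* already queued, nothing to do */
	if (p->count == MAX_PENDING)
		return false;	/* list is full, caller must cope */
	p->mlid[p->count++] = mlid;
	return true;
}

/* Drop all pending requests: a heavy sweep re-routes every group anyway. */
void pending_purge(struct pending_mcast *p)
{
	p->count = 0;
}

/*
 * Process all pending groups in one run. If a heavy sweep has been
 * requested, stop and throw away whatever is left. The list is empty
 * on return in both cases. Returns the number of groups re-routed.
 */
size_t pending_process(struct pending_mcast *p,
		       const bool *heavy_sweep_requested,
		       void (*reroute)(uint16_t mlid))
{
	size_t done = 0;

	while (done < p->count) {
		if (*heavy_sweep_requested)
			break;
		reroute(p->mlid[done]);
		done++;
	}
	pending_purge(p);
	return done;
}

/* Example callback: only counts calls; real code would rebuild routing. */
size_t rerouted_groups;

void count_reroute(uint16_t mlid)
{
	(void)mlid;
	rerouted_groups++;
}
```

With something like this, a join/leave storm for the same group costs
one re-route instead of one per request, and the caller starts the
heavy sweep right after pending_process() returns early.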
