> Quoting Hal Rosenstock <[EMAIL PROTECTED]>:
> Subject: Re: multicast join failed for...
>
> On Wed, 2007-04-11 at 23:38, Michael S. Tsirkin wrote:
> > > Quoting Hal Rosenstock <[EMAIL PROTECTED]>:
> > > Subject: Re: multicast join failed for...
> > >
> > > On Wed, 2007-04-11 at 15:47, Michael S. Tsirkin wrote:
> > > > > Quoting Hal Rosenstock <[EMAIL PROTECTED]>:
> > > > > Subject: Re: multicast join failed for...
> > > > >
> > > > > On Wed, 2007-04-11 at 14:12, Michael S. Tsirkin wrote:
> > > > > > > > If yes, I'm actually not too happy with this.
> > > > > > > >
> > > > > > > > Would something like the following heuristic work better?
> > > > > > > > - select the max rate between all participants
> > > > > > >
> > > > > > > The issue is that one doesn't know all the participants in a
> > > > > > > group as they are joined dynamically.
> > > > > > >
> > > > > > > (I think we've been over this aspect on the list several times in
> > > > > > > the past.)
> > > > > >
> > > > > > That's why I suggest the fix, so that the rate is adapted
> > > > > > dynamically.
> > > > > >
> > > > > > > > - when a host with lower rate joins, destroy the group
> > > > > > >
> > > > > > > I don't think a group can be destroyed like this "underneath" its
> > > > > > > existing members.
> > > > > > >
> > > > > >
> > > > > > Of course it can. That's what happens when SM is restarted.
> > > > >
> > > > > Client reregistration ? I don't like using that big hammer as a
> > > > > solution to this. Seems a little harsh to me.
> > > >
> > > > I think it's not too bad
> > >
> > > It requires all subscriptions to reregister. This affects more things
> > > than just multicast or even the groups affected which might not be all
> > > of the multicast groups. Hence BIG hammer.
> >
> > Changing an option in opensm config requires restarting
> > opensm. Isn't that right?
>
> Yes but that doesn't have to be the case going forward in terms of
> OpenSM reconfig.
> >
> > So it's an even bigger hammer.
>
> Restarting opensm is a slightly bigger hammer right now (than client
> reregistration) in the case the admin wants it "dynamic" but I suspect
> this only needs to be done once.

I think you forgot that currently one has to edit the config file; just
restarting opensm isn't enough :). Letting the user decide for us is a *HUGE*
hammer - it usually solves all problems, but at what cost?

> > > There could be a more
> > > graceful way to deal with this. I don't like using client reregister
> > > unless absolutely needed.
> >
> > What are the other options that have the same functionality?
>
> Perhaps a spec enhancement is possible to make this better.

Sure. Meanwhile, opensm will have to support legacy networks too, so I think
we can start with the reregister solution.

> > > > - previously we had some client failing join
> > > > which is worse.
> > >
> > > Maybe not. Maybe that's what the admin wants (to keep the higher rate
> > > rather than degrade the group due to some link issue).
> >
> > Rate could be an option, but I think generally people prefer
> > things working even if at a slower rate.
>
> I think it's a coin flip.

I disagree. I think people who want the join to fail basically just want to
make debugging easy. We can help them without failing joins.

> I've seen it both ways and either way there
> are support questions.

I think we can solve this relatively easily: compare the bcast group rate
with the local rate and have IPoIB produce a warning in the log if these do
not match. This is similar to what we have with a USB 2.0 device in an older
USB slot; people seem to be happy.

> In the current scenario, it is join failures. In
> the proposed scenario, it is more subtle: performance implications and
> perhaps SA network storms.

I don't believe we'll see network storms: the rate has to drop from DDR to SDR
only once.

--
MST
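
To make the proposal above concrete, here is a rough sketch of the two pieces
discussed in this thread: lowering the group rate when a slower host joins,
and warning when the local rate does not match the group rate. It is only a
toy model under stated assumptions, not opensm or IPoIB code. The names
`Group`, `join`, `rate_mismatch` and `RATE_GBPS` are made up for
illustration, and "rebuild" stands in for destroying the group and having
clients reregister.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("mcast_sketch")

# Illustrative ordering of link rates (4x signalling rate in Gb/s).
RATE_GBPS = {"SDR": 10, "DDR": 20}


@dataclass
class Group:
    rate: str                                  # current group rate
    members: set = field(default_factory=set)
    rebuilds: int = 0                          # times the group was recreated


def join(group: Group, host: str, host_rate: str) -> None:
    """Admit host; if it is slower than the group, rebuild at its rate."""
    if RATE_GBPS[host_rate] < RATE_GBPS[group.rate]:
        # Hypothetical stand-in for: destroy the group, have clients
        # reregister, and recreate the group at the lower rate.
        group.rate = host_rate
        group.rebuilds += 1
    group.members.add(host)


def rate_mismatch(group: Group, local_rate: str) -> bool:
    """Log a warning and return True when local and group rates differ."""
    mismatch = RATE_GBPS[local_rate] != RATE_GBPS[group.rate]
    if mismatch:
        log.warning("group rate %s differs from local rate %s",
                    group.rate, local_rate)
    return mismatch


g = Group(rate="DDR")
for host, rate in [("a", "DDR"), ("b", "SDR"), ("c", "SDR"), ("d", "DDR")]:
    join(g, host, rate)
print(g.rate, g.rebuilds)
```

In this model the group is rebuilt only on the first SDR join; later SDR or
DDR joins leave it alone, which is the reason the rate drop is expected to
happen only once.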
