> Quoting Hal Rosenstock <[EMAIL PROTECTED]>:
> Subject: Re: multicast join failed for...
>
> On Thu, 2007-04-12 at 10:08, Michael S. Tsirkin wrote:
> > > Quoting Hal Rosenstock <[EMAIL PROTECTED]>:
> > > Subject: Re: multicast join failed for...
> > >
> > > On Wed, 2007-04-11 at 23:38, Michael S. Tsirkin wrote:
> > > > > Quoting Hal Rosenstock <[EMAIL PROTECTED]>:
> > > > > Subject: Re: multicast join failed for...
> > > > >
> > > > > On Wed, 2007-04-11 at 15:47, Michael S. Tsirkin wrote:
> > > > > > > Quoting Hal Rosenstock <[EMAIL PROTECTED]>:
> > > > > > > Subject: Re: multicast join failed for...
> > > > > > >
> > > > > > > On Wed, 2007-04-11 at 14:12, Michael S. Tsirkin wrote:
> > > > > > > > > > If yes, I'm actually not too happy with this.
> > > > > > > > > >
> > > > > > > > > > Would something like the following heuristic work better?
> > > > > > > > > > - select the max rate between all participants
> > > > > > > > >
> > > > > > > > > The issue is that one doesn't know all the participants in a
> > > > > > > > > group as they are joined dynamically.
> > > > > > > > >
> > > > > > > > > (I think we've been over this aspect on the list several
> > > > > > > > > times in the past.)
> > > > > > > >
> > > > > > > > That's why I suggest the fix, so that the rate is adapted
> > > > > > > > dynamically.
> > > > > > > >
> > > > > > > > > > - when a host with lower rate joins, destroy the group
> > > > > > > > >
> > > > > > > > > I don't think a group can be destroyed like this "underneath"
> > > > > > > > > its existing members.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Of course it can. That's what happens when SM is restarted.
> > > > > > >
> > > > > > > Client reregistration ? I don't like using that big hammer as a
> > > > > > > solution to this. Seems a little harsh to me.
> > > > > >
> > > > > > I think it's not too bad
> > > > >
> > > > > It requires all subscriptions to reregister.
> > > > > This affects more things
> > > > > than just multicast or even the groups affected which might not be all
> > > > > of the multicast groups. Hence BIG hammer.
> > > >
> > > > Changing an option in opensm config requires restarting
> > > > opensm. Isn't that right?
> > >
> > > Yes but that doesn't have to be the case going forward in terms of
> > > OpenSM reconfig.
> > >
> > > > So it's an even bigger hammer.
> > >
> > > Restarting opensm is a slightly bigger hammer right now (than client
> > > reregistration) in the case the admin wants it "dynamic" but I suspect
> > > this only needs to be done once.
> >
> > I think you forgot that currently one has to edit the config file,
> > just restarting opensm isn't enough :).
> > "Let the user decide for us" is a *HUGE* hammer - it usually solves
> > all problems, but at what cost?
>
> Doesn't the admin "plan" his network ? This is part of the installation
> and bringup IMO.
I agree the admin must plan the network. But I disagree that this should
necessarily involve editing config files.

> There are a couple of ways to avoid having the admin decide but they all
> involve penalizing the more normal use cases (pushing the admin burden
> to them). I'm ambivalent about whether that's a better choice.

I don't think what I propose penalizes normal use. It just turns what
used to be an error into a working configuration.

> > > > > There could be a more
> > > > > graceful way to deal with this. I don't like using client reregister
> > > > > unless absolutely needed.
> > > >
> > > > What are the other options that have the same functionality?
> > >
> > > Perhaps a spec enhancement is possible to make this better.
> >
> > Sure. Meanwhile, opensm will have to support legacy networks
> > too so I think we can start with the reregister solution.
>
> OK; it could be another option. Would you propose this being the default
> option ?

No. I expect that if a node supports reregistering specific mcast groups,
this capability can be advertised somehow, and the SM will use it if
available and fall back to a plain client reregister if not.

> > > > > > - previously we had some client failing join
> > > > > >   which is worse.
> > > > >
> > > > > Maybe not. Maybe that's what the admin wants (to keep the higher rate
> > > > > rather than degrade the group due to some link issue).
> > > >
> > > > Rate could be an option, but I think generally people prefer
> > > > things working even if at a slower rate.
> > >
> > > I think it's a coin flip.
> >
> > I disagree. I think people that want the join to fail basically
> > just want to make debugging easy. We can help them without failing joins.
> >
> > > I've seen it both ways and either way there
> > > are support questions.
> >
> > I think we can solve this relatively easily: compare the bcast group
> > rate with local rate and have IPoIB produce a warning in log if these
> > do not match.
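To make the check quoted just above concrete, here is a minimal sketch.
It is Python rather than driver code, and the function name, the logger
name, and the use of plain Gb/s numbers for rates are all assumptions made
for illustration; nothing here is taken from the actual IPoIB source.

```python
import logging
from typing import Optional

# Hypothetical logger name, chosen only for this sketch.
log = logging.getLogger("ipoib_rate_check")


def check_bcast_rate(local_gbps: float, group_gbps: float) -> Optional[str]:
    """Compare the broadcast group rate with the local port rate.

    Returns None when the rates match; otherwise logs and returns a warning.
    Rates are plain Gb/s numbers here (an assumption of this sketch).
    """
    if group_gbps == local_gbps:
        return None
    relation = "below" if group_gbps < local_gbps else "above"
    msg = (f"broadcast group rate {group_gbps:g} Gb/s is {relation} "
           f"local port rate {local_gbps:g} Gb/s")
    log.warning(msg)
    return msg


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    check_bcast_rate(20, 10)  # e.g. a 4x DDR port in a 4x SDR group
```

The point of the sketch is that a mismatch only produces a log entry; the
function never fails the join itself, so debugging stays easy without
turning a slower group into an error.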
> >
> > This is similar to what we have with USB2.0 device in USB slot,
> > people seem to be happy.
> >
> > > In the current scenario, it is join failures. In
> > > the proposed scenario, it is more subtle: performance implications and
> > > perhaps SA network storms.
> >
> > I don't believe we'll see network storms: rate has to drop from DDR to SDR
> > only once.
>
> Frequency appears low (but I'm sure we'll hit some oscillating case down
> the road) but impacts all multicast groups whether or not this node
> affects them as well as other subscriptions. Client reregister is a
> storm IMO and should only be used when there is absolutely no other
> choice.

I agree it might be useful to give opensm a way to detect that a set of
mcast groups belongs to a specific application, and a way to force
re-registration.

-- 
MST
_______________________________________________
general mailing list
[EMAIL PROTECTED]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
