On Wed, 2007-04-11 at 23:38, Michael S. Tsirkin wrote:
> > Quoting Hal Rosenstock <[EMAIL PROTECTED]>:
> > Subject: Re: multicast join failed for...
> >
> > On Wed, 2007-04-11 at 15:47, Michael S. Tsirkin wrote:
> > > > Quoting Hal Rosenstock <[EMAIL PROTECTED]>:
> > > > Subject: Re: multicast join failed for...
> > > >
> > > > On Wed, 2007-04-11 at 14:12, Michael S. Tsirkin wrote:
> > > > > > > If yes, I'm actually not too happy with this.
> > > > > > >
> > > > > > > Would something like the following heuristic work better?
> > > > > > > - select the max rate between all participants
> > > > > >
> > > > > > The issue is that one doesn't know all the participants in a
> > > > > > group as they are joined dynamically.
> > > > > >
> > > > > > (I think we've been over this aspect on the list several times
> > > > > > in the past.)
> > > > >
> > > > > That's why I suggest the fix, so that the rate is adapted
> > > > > dynamically.
> > > > >
> > > > > > > - when a host with lower rate joins, destroy the group
> > > > > >
> > > > > > I don't think a group can be destroyed like this "underneath" its
> > > > > > existing members.
> > > > > >
> > > > >
> > > > > Of course it can. That's what happens when SM is restarted.
> > > >
> > > > Client reregistration ? I don't like using that big hammer as a solution
> > > > to this. Seems a little harsh to me.
> > >
> > > I think it's not too bad
> >
> > It requires all subscriptions to reregister. This affects more things
> > than just multicast or even the groups affected which might not be all
> > of the multicast groups. Hence BIG hammer.
>
> Changing an option in opensm config requires restarting
> opensm. Isn't that right?

Yes, but that doesn't have to be the case going forward in terms of
OpenSM reconfig.

> So it's an even bigger hammer.

Restarting opensm is a slightly bigger hammer right now (than client
reregistration) in the case the admin wants it "dynamic", but I suspect
this only needs to be done once.

> > There could be a more
> > graceful way to deal with this. I don't like using client reregister
> > unless absolutely needed.
>
> What are the other options that have the same functionality?

Perhaps a spec enhancement is possible to make this better.

> > > - previously we had some client failing join
> > > which is worse.
> >
> > Maybe not. Maybe that's what the admin wants (to keep the higher rate
> > rather than degrade the group due to some link issue).
>
> Rate could be an option, but I think generally people prefer
> things working even if at a slower rate.

I think it's a coin flip. I've seen it both ways and either way there
are support questions. In the current scenario, it is join failures. In
the proposed scenario, it is more subtle: performance implications and
perhaps SA network storms.

> What does opensm do now?
> I think it uses the max possible rate when group is created.
> Is that so?

It uses the rate as configured in the partitions file, or the default
rate of 10 Gb/s if none is configured (for the per-partition IPv4
broadcast group).

> > > And we can still keep an option to limit the rate
> > > manually.
> > >
> > > > I'm not convinced it's even
> > > > required either,
> > >
> > > How do you mean? All end-points must know the rate is now lower.
> >
> > I didn't think we had the complete story yet on what is going on.
>
> You are speaking about a specific instance then?
> OK, but I'm speaking generally, the issue comes up
> quite often.

OK, true, and it has been discussed on the list quite often.

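For readers following the thread, the two join policies being weighed can be sketched roughly as below. This is only an illustrative model, not OpenSM code: the class `McastGroup`, the methods `join_strict` and `join_adaptive`, and the port names are all hypothetical, and rates are plain numbers in Gb/s. The only value taken from the thread is the 10 Gb/s default used when no rate is configured.

```python
# Illustrative model only; none of these names exist in OpenSM.
DEFAULT_RATE_GBPS = 10.0  # default from the thread when the partitions file sets no rate


class McastGroup:
    def __init__(self, configured_rate=None):
        # Group rate is the configured rate, else the default.
        self.rate = DEFAULT_RATE_GBPS if configured_rate is None else configured_rate
        self.members = {}  # hypothetical port name -> port rate in Gb/s

    def join_strict(self, port, port_rate):
        """Current scenario: a port slower than the group rate fails to join."""
        if port_rate < self.rate:
            return False
        self.members[port] = port_rate
        return True

    def join_adaptive(self, port, port_rate):
        """Proposed scenario: a slower joiner lowers the group rate.

        Returns the existing members that would have to re-join so they
        learn the new rate (the cost discussed in the thread).
        """
        self.members[port] = port_rate
        if port_rate < self.rate:
            self.rate = port_rate
            return sorted(m for m in self.members if m != port)
        return []


strict = McastGroup()
strict.join_strict("port-a", 10.0)
slow_join_ok = strict.join_strict("port-b", 2.5)   # rejected, group stays at 10

adaptive = McastGroup()
adaptive.join_adaptive("port-a", 10.0)
must_rejoin = adaptive.join_adaptive("port-b", 2.5)  # group drops to 2.5
```

The strict variant gives the visible join failure; the adaptive variant always admits the joiner but silently degrades the group and forces existing members to re-join, which is the trade-off described above as a coin flip.
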
-- Hal
