On Fri, 2007-04-13 at 00:17, Michael S. Tsirkin wrote:
> > Quoting Ira Weiny <[EMAIL PROTECTED]>:
> > Subject: Re: [ofa-general] Re: multicast join failed for...
> >
> > On Thu, 12 Apr 2007 20:16:32 +0300
> > "Michael S. Tsirkin" <[EMAIL PROTECTED]> wrote:
> >
> > > > Quoting Ira Weiny <[EMAIL PROTECTED]>:
> > > > Subject: Re: [ofa-general] Re: multicast join failed for...
> > > >
> > > > On Thu, 12 Apr 2007 07:21:55 +0300
> > > > "Michael S. Tsirkin" <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > > Quoting Ira Weiny <[EMAIL PROTECTED]>:
> > > > > > Subject: Re: [ofa-general] Re: multicast join failed for...
> > > > > >
> > > > > > On 11 Apr 2007 17:45:54 -0400
> > > > > > Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > On Wed, 2007-04-11 at 15:47, Michael S. Tsirkin wrote:
> > > > > > >
> > > > > > > > - previously we had some client failing join
> > > > > > > >   which is worse.
> > > > > > >
> > > > > > > Maybe not. Maybe that's what the admin wants (to keep the higher
> > > > > > > rate rather than degrade the group due to some link issue).
> > > > > > >
> > > > > >
> > > > > > Indeed, on a big cluster it would be better to have a few nodes
> > > > > > dropped out than to limit the speed of the entire cluster.
> > > > >
> > > > > Why are you joining these nodes then?
> > > > > Anyway, could always be an option.
> > > > >
> > > >
> > > > We have seen a specific example where a nodes 4X link comes up at 1X.
> > >
> > > I think that the way to do it, is to make it possible to force endnode
> > > link to a specific rate. You can already do this with a simple script
> > > from userspace, by testing the link rate once it comes up,
> > > and downing the link if it's lower than what you want.
> > >
> > > If you think it's important, it's also quite trivial to
> > > make it possible to disable 1x support through sysfs interface.
> > > This way, the link will come up as 4x or not come up at all.
> > > Would that be useful?
> >
> > Yes it would be useful.
>
> OK, I'll work on a patch for OFED 1.2.
>
> > Is this something I can do right now with OFED 1.1?
>
> With OFED 1.1 (without patches) you can do what I wrote above:
> write a script that tests link width.
> Disable ipoib, or the device, if it is 1x:
>
> For example
>
> #/usr/bin/bash
> until
> grep ACTIVE /sys/class/infiniband/mthca0/ports/*/state;
> do
> true;
> done
>
>
> if `grep 1x /sys/class/infiniband/mthca0/ports/1/rate`
> then
> rmmod ib_mthca
> fi
>
>
> > > > In this
> > > > case we would want the join to fail. Basically a single hardware error,
> > > > isolated to 1 node, should not affect the other 1150 nodes,
> > >
> > > As far as I know, there are *a lot* of reasons where a problem at
> > > 1 node will affect others on the same subnet. Do I have to give examples?
> > > I don't see why do we have to choose a specific instance (incorrect
> > > link rate at endnode) and handle it differently.
> > >
> > > > which could very well be running a users job.
> > >
> > > The job will continue running though, and when you diagnose the problem
> > > and disconnect the bad node, rate will be back to high.
> > > So what's the problem?
> >
> > Performance impact between the time it happens and diagnosing the problem.
> > Yes, disabling the node is a better solution, however, the current behavior
> > is not bad for us.
>
> Hal, here we have a use case that I think shows that the right thing
> is by default to make joins succeed. Convinced?

Didn't Ira say that "the current behavior is not bad for us" ? The
current behavior is default 4x SDR rate which makes slower joins fail.
Are you saying change the default rate to 1x SDR ? I've been concerned
about masking performance issues when doing this as we've discussed
several times before.

-- Hal

> > > > Certainly if there is a heterogeneous network we would want different
> > > > behavior but we don't operate any of our clusters like that. After
> > > > reading todays posts I think it should be an option.
> > >
> > > Yes. I think the option belongs at the endnodes, as outlined above.
> >
> > Yes that would be a good solution as well.
> >
> > > > If someone has a mixture they can configure
> > > > it. I am not sure what the default should be though. I know we would
> > > > want the join to fail, but I understand the argument to allow it to work.
> > >
> > > This likely means that you have a sideband interconnect infrastructure
> > > beside IPoIB. Otherwise, if the join fails, you don't even have a
> > > way to debug the problem.
> > >
> >
> > Yes we do have this. Like I said I could see where this would be
> > beneficial to some users.
>
_______________________________________________
general mailing list
[EMAIL PROTECTED]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
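
Note on the link-width script quoted above: as written it would probably
not do what is intended. The first line, `#/usr/bin/bash`, is missing the
`!` of a shebang, and the backticks in the `if` line make the shell try to
run grep's output as a command instead of testing grep's exit status. A
minimal sketch of the same idea follows. It is not the OFED 1.2 patch
mentioned in the thread. The helper names, the IB_SYSFS override and the
one-second poll interval are assumptions made for illustration, and it
assumes the sysfs rate file reports the width in a form such as "(1X)".

```shell
#!/bin/bash
# Sketch only: unload the HCA driver if a port trained at 1X width.
# Hypothetical names (not from the thread): IB_SYSFS, port_is_active,
# port_is_1x, enforce_min_width.

# Root of the InfiniBand sysfs tree; overridable so the helpers can be
# exercised against a scratch directory.
IB_SYSFS="${IB_SYSFS:-/sys/class/infiniband}"

# Succeeds when the port's state file contains ACTIVE.
port_is_active() {
    grep -q ACTIVE "$IB_SYSFS/$1/ports/$2/state" 2>/dev/null
}

# Succeeds when the port's rate file reports a 1X link.
# Assumes a format like "2.5 Gb/sec (1X)"; "(12X" does not match.
port_is_1x() {
    grep -qi '(1x' "$IB_SYSFS/$1/ports/$2/rate" 2>/dev/null
}

# Poll until the port is ACTIVE, then unload the driver if width is 1X.
enforce_min_width() {
    local dev="$1" port="$2" driver="$3"
    until port_is_active "$dev" "$port"; do
        sleep 1    # poll interval is an arbitrary choice
    done
    if port_is_1x "$dev" "$port"; then
        echo "port $dev/$port came up at 1X; unloading $driver" >&2
        rmmod "$driver"
    fi
}

# Example invocation, left commented out so that sourcing has no effect:
# enforce_min_width mthca0 1 ib_mthca
```

Testing grep's exit status with `if grep -q ...` is what lets the `rmmod`
branch actually run, and sleeping between polls avoids the busy loop of
the original. Whether to unload the driver or only disable IPoIB is a
site policy choice, as the thread itself notes.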
