Hi Eitan,

On Wed, 2007-07-11 at 06:51, Eitan Zahavi wrote:
> Hi Ira,
>
> > Second, I have run some tests querying the fabric of our
> > large clusters here (~500 nodes) and the results were
> > promising for a single node implementation.
> > I don't recall the numbers as this was a while ago but it was
> > on the order of
> > <2 sec and I think <1 but I don't want to be misquoted.
>
> Does PerfMgr query switch ports ?

Yes (of course it does).

> If it does I am surprised by the short sweep time you got.
>
> Does it have >1 query on the wire at a given time?

Yes. The default appears to be 500 currently (maybe that needs dialing
back a bit), but it is settable via perfmgr_max_outstanding_queries in
the options file.

> If not then I am even more surprised.
>
> Was the cluster running a job at the time of the query ?

Is this question related to VL0 contention ?

-- Hal

> Thanks
>
> Eitan Zahavi
> Senior Engineering Director, Software Architect
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
>
> > -----Original Message-----
> > From: Ira Weiny [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, July 10, 2007 7:47 PM
> > To: Eitan Zahavi
> > Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
> > [email protected]; [EMAIL PROTECTED]
> > Subject: Re: [ofa-general] IB performance stats (revisited)
> >
> > On Thu, 28 Jun 2007 10:24:59 +0300
> > "Eitan Zahavi" <[EMAIL PROTECTED]> wrote:
> >
> > > > On Wed, 2007-06-27 at 14:23, Eitan Zahavi wrote:
> > > > > In the last months it is the second time I hear people
> > > > > complaining the current monitoring solution in OFA is
> > > > > integrated with OpenSM.
> > > >
> > > > I must have missed this both times (didn't see this in Mark's
> > > > post) and the statement itself is somewhat inaccurate as well.
> > > Private talks - I hope they will speak up for themselves now...
> > >
> > > > > These people do not use OpenSM but do use OFED.
> > > >
> > > > I'm not sure I'm following what you mean here.
> > > >
> > > > If you mean that some people want to run PerfMgr without the
> > > > SM/SA aspects (so that they can run a vendor based SM), that is
> > > > the next thing we are adding to the implementation.
> > > Exactly. OK when is that coming?
> >
> > There is very little which ties the current PerfMgr to
> > OpenSM. Basically it just gets the current fabric topology.
> > As Hal has said changes are coming.
> >
> > > > > Another drawback is that
> > > > > no naming is provided and the reporting uses GUIDs.
> > > >
> > > > Naming is provided via NodeDescription.
> > > This might be good for hosts but is not covering switches ...
> >
> > It does include switches. However, since most systems have
> > the same name for multiple switches this becomes ineffective.
> > I have queried Voltaire for a way to change the
> > NodeDescription for switches, but at the time I asked, there
> > was no way to do it. Perhaps there is now? What about other
> > vendors? This is why ibnetdiscover and other diags have
> > "switch map" support. (A GUID->name mapping to override the
> > default NodeDescription.) Nothing would please me more than
> > to be able to remove that for a more "automatic" solution.
> >
> > > > > I also can't hold myself from saying again I think you are
> > > > > going to hit the wall with the concept of doing the PMA from
> > > > > a single node.
> > > >
> > > > If you are referring to the fact the PerfMgr is currently not
> > > > distributed, that will be done as has been stated before.
> > > Good. When is it expected? Will it be OFED 1.3?
> >
> > When Hal first sent out the PerfMgr design I thought we
> > should jump right to the distributed model as well. But now
> > I am glad we have gone the way we did.
> > First off, we have something which "works" and from which we
> > can expand.
> > Second, I have run some tests querying the fabric of our
> > large clusters here (~500 nodes) and the results were
> > promising for a single node implementation.
> > I don't recall the numbers as this was a while ago but it was
> > on the order of
> > <2 sec and I think <1 but I don't want to be misquoted.
> >
> > For sure, a distributed model offers many advantages and we
> > will get there. But for many the current single node
> > approach should work just fine.
> >
> > Thanks,
> > Ira
> >
> > > > Thanks
> > > >
> > > > -- Hal
> > > >
> > > > > Eitan Zahavi
> > > > > Senior Engineering Director, Software Architect
> > > > > Mellanox Technologies LTD
> > > > > Tel:+972-4-9097208
> > > > > Fax:+972-4-9593245
> > > > > P.O. Box 586 Yokneam 20692 ISRAEL
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: [EMAIL PROTECTED]
> > > > > > [mailto:[EMAIL PROTECTED] On Behalf Of Hal Rosenstock
> > > > > > Sent: Wednesday, June 27, 2007 8:12 PM
> > > > > > To: Mark Seger
> > > > > > Cc: Finn, Ed; [email protected]
> > > > > > Subject: Re: [ofa-general] IB performance stats (revisited)
> > > > > >
> > > > > > On Wed, 2007-06-27 at 13:07, Mark Seger wrote:
> > > > > > > >The performance managers deal with the counter stickiness (by
> > > > > > > >resetting them when they think they need to). They typically export
> > > > > > > >their data although this is not specified by IBA so it is in a vendor
> > > > > > > >proprietary manner.
> > > > > > > >
> > > > > > > so I guess these guys are poor citizens as well...
> > > > > >
> > > > > > Not sure what you mean.
> > > > > >
> > > > > > > the real issue as I see it then means nobody can trust the data if
> > > > > > > random tools randomly reset the counters. a real shame...
> > > > > >
> > > > > > I consider this to be a real rather than random app for this.
> > > > > > Guess it depends on what one considers random.
> > > > > >
> > > > > > -- Hal
> > > > > >
> > > > > > > -mark
> > > > > >
> > > > > > _______________________________________________
> > > > > > general mailing list
> > > > > > [email protected]
> > > > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> > > > > >
> > > > > > To unsubscribe, please visit
> > > > > > http://openib.org/mailman/listinfo/openib-general
