Hi Mark,

One more comment, following up on some of the others, assuming
you're using IPMI over LAN.

> In any event, if it turns out that sending/receiving the command to 
> the sensors is where all the time is spent and startup is less of a big 
> deal, this whole discussion is moot.

We monitor, power control, and do various other things w/ IPMI over LAN
on a number of large clusters.  I'm not sure how large the systems are
that collectl is trying to monitor, but I've found that there can be
slowdowns in IPMI over LAN with 1000+ nodes.  Since it runs over UDP,
eventually one or two packets will get lost, so you can't monitor
consistently all the time.

ipmitool seems to indicate a default timeout of 2 seconds for IPMI over
LAN.  So if just one packet gets lost, you're hosed on your < 1 second goal.
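
As a rough back-of-the-envelope sketch of why this bites at scale (not a
measurement): the per-request loss rate below is a made-up number, losses
are assumed independent, and each poll is assumed to send one request per
node.  The function names are just for illustration.  Only the 2 second
timeout and the < 1 second goal come from the discussion above.

```python
def p_any_loss(nodes: int, per_request_loss: float) -> float:
    """Probability that at least one of `nodes` independent requests is lost."""
    return 1.0 - (1.0 - per_request_loss) ** nodes


def poll_meets_goal(lost_requests: int,
                    timeout_s: float = 2.0,
                    goal_s: float = 1.0) -> bool:
    """A lost request is not noticed until the timeout expires, so a poll
    with any loss takes at least `timeout_s` and can only meet the goal
    if the timeout itself fits inside it."""
    if lost_requests == 0:
        return True
    return timeout_s <= goal_s


if __name__ == "__main__":
    # Hypothetical loss rate of 0.1% per request; not a measured value.
    for nodes in (10, 100, 1000):
        print(nodes, round(p_any_loss(nodes, 0.001), 3))
    print(poll_meets_goal(lost_requests=1))
```

Even with a small assumed loss rate, the chance that some node in a large
cluster hits the timeout on a given poll grows quickly with node count.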

Al

On Fri, 2009-01-16 at 07:48 -0500, Mark Seger wrote:
> Carol Hebert wrote:
> > Hi and Happy '09!
> >
> > To celebrate the New Year, I propose we start working on rolling
> > ipmitool to v1.8.11 to include all the great fixes and patches folks
> > have been sending in since last summer!  Do you have a fix you've been
> > using in your local ipmitool version?  Send it in for review!  Do you
> > have a patch or an idea for some new functionality or support you'd like
> > to see get into the tool?  Send it in for review! 
> Well you did ask, so here ya go...
> 
> I'm the author of collectl [see: http://collectl.sourceforge.net/], 
> which is a tool for doing system monitoring.  A couple of the things I 
> think set it apart from others are that it's very light-weight and that 
> it collects a very broad set of data, making it possible to drill down 
> and see what the state of all your system resources is at any point in time.
> 
> About 6 months ago I discovered ipmitool and now use it to monitor 
> temps, fans and power though I could see adding other resources as 
> well.  My own problem with ipmitool is that while it's pretty 
> light-weight, it's not light-weight enough and so I had to give it its 
> own monitoring interval - I monitor most things every 10 seconds when 
> running as a daemon, though there are those who sample every 5 seconds 
> or even 1.  I did some tests with ipmitool running at those frequencies 
> and there was just too much system load compared to collectl proper 
> which uses <0.1% and so I'm only sampling sensors every 2 minutes, which 
> I thought at first was reasonable.
> 
> However since adding power sensor monitoring and watching the power 
> every second, just to see what's happening over short bursts of time, I 
> saw that the sensor changes much more frequently than I thought it would.  
> Given all the interest in green computing I can see where more frequent 
> power monitoring would be a good thing.  I even went as far as to see what 
> it might take to query ipmi directly to save the overhead of running a 
> new instance of ipmitool every sample period but I think I can now 
> appreciate just how complex ipmi is and would prefer to continue to use 
> ipmitool as it already does a great job.
> 
> So, as for an enhancement I'd like to see a way to run it at rates on 
> the order of a few seconds and not incur a lot of overhead.  I'm basing 
> this on my assumption that the bulk of ipmitool's time is spent on 
> starting the image as well as establishing the internal communications 
> connection(s).   If so, my logical conclusion would be there needs to be 
> a way to do the initialization one time and not exit.  While I can think 
> of a few ways to do it, and I think the first is probably the easiest, 
> perhaps someone intimately familiar with ipmitool's innards such as 
> yourself would have better ways:
> - run it as a daemon, allowing it to receive commands over a socket and 
> send the results back
> - build some sort of library that could be called via something like 
> perl, one call to initialize and a second to query, probably others...
> 
> In any event, if it turns out that sending/receiving the command to the 
> sensors is where all the time is spent and startup is less of a big deal, 
> this whole discussion is moot.
> 
> Anyhow, you DID ask...  ;-)
> -mark
> 
> 
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by:
> SourceForge Community
> SourceForge wants to tell your story.
> http://p.sf.net/sfu/sf-spreadtheword
> _______________________________________________
> Ipmitool-devel mailing list
> Ipmitool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ipmitool-devel
> 
-- 
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

