If a node is having trouble when this script runs, it looks like the gmetric
commands fail.

I see:
---
./ganglia_ipmi_sensors.pl -h mp-X[32-34] -r mp- -d -D
IPMI_HOSTS=mp-X[32-34]
IPMI_HOSTS_SUBST=mp-
IPMI_SENSORS_PATH=/usr/sbin/ipmi-sensors
IPMI_SENSORS_ARGS=
GMETRIC_PATH=/usr/bin/gmetric
GMETRIC_ARGS=
ipmi-sensors command: /usr/sbin/ipmi-sensors  -h mp-X[32-34] --quiet-cache --sdr-cache-recreate --always-prefix --no-header-output --output-sensor-state
mp-X33: /usr/sbin/ipmi-sensors: connection timeout
/usr/sbin/ipmi-sensors: failed
---

I see where the exit occurs: the script checks the return status of the
ipmi-sensors command and exits if it is non-zero.  It seems that we would
still want ganglia plotting for the "good" nodes instead of exiting.
Otherwise we have to make sure all the nodes are "good" all the time.  And
of course that happens sometimes, but not all the time. :)

Here's the exit I commented out so we could continue to run.  Are there any
other reasons we'd want to exit?

--- ganglia_ipmi_sensors.pl
$IPMI_SENSORS_OUTPUT = `$cmd`;
if ($? != 0)
{
    print "$IPMI_SENSORS_PATH: failed\n";
#    exit(1);
}
---
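
As a possible middle ground between always exiting and never exiting, here
is a minimal sketch.  The helper name classify_run and its three return
values are hypothetical and not part of ganglia_ipmi_sensors.pl, and the
sensor lines are placeholders rather than real ipmi-sensors output.  It
assumes the per-node errors go to stderr, so the backticks only capture
output from the nodes that answered.  The idea is to keep exiting when the
command could not be started, was killed by a signal, or returned no data
at all, and to carry on otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ganglia_ipmi_sensors.pl): classify the
# result of the backtick call from the wait status ($?) and captured stdout.
sub classify_run {
    my ($status, $output) = @_;
    return "ok"    if $status == 0;
    return "fatal" if $status == -1;       # command could not be started
    return "fatal" if $status & 127;       # command was killed by a signal
    return "fatal" if $output !~ /\S/;     # no node returned any data
    return "partial";                      # some nodes failed, some answered
}

# Example with placeholder data standing in for real ipmi-sensors output.
my $status = 1 << 8;                       # wait status for exit code 1
my $output = "mp-X32: <sensor line>\nmp-X34: <sensor line>\n";
my $result = classify_run($status, $output);
print "result: $result\n";
exit(1) if $result eq "fatal";
```

In the script this would replace the unconditional exit(1): print the
"failed" message as before, but only exit when the result is "fatal".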

Thanks,
-cdm

On Wed, Feb 9, 2011 at 5:22 PM, Albert Chu <[email protected]> wrote:

> Hey Chris,
>
> What's the --debug output say?
>
> Al
>
> On Wed, 2011-02-09 at 16:06 -0800, Christopher Maestas wrote:
> > It looks like the ganglia script runs:
> >
> >
> > /usr/sbin/ipmi-sensors -h mp-N[1-2],mp-C[1-120] --quiet-cache
> > --sdr-cache-recreate --always-prefix --no-header-output
> > --output-sensor-state
> >
> >
> > I tried adding -f and nothing returned.  Then I tried running the
> > command again and I see:
> >
> >
> > ipmi_sdr_cache_create: SDR record length invalid
> >
> >
> > again.
> >
> > On Wed, Feb 9, 2011 at 4:51 PM, Albert Chu <[email protected]> wrote:
> >         Is this independent of the script?  What if you run
> >         ipmimonitoring by itself?  The output strongly suggests that
> >         the SDR cache is corrupted.  You could try flushing the cache
> >         (-f I think) and see if it helps when the cache is recreated.
> >
> >         Al
> >
> >
> >         On Wed, 2011-02-09 at 15:31 -0800, Christopher Maestas wrote:
> >         > FYI:
> >         >
> >         >
> >         > I seem to see this when running this script now:
> >         >
> >         >
> >         > ---
> >         > NODENAME: ipmi_sdr_cache_create: SDR record length invalid
> >         > ...
> >         >
> >         >
> >         > Here's how I'm running it:
> >         >
> >         >
> >         > /path/to/ganglia_ipmi_sensors.pl -h mp-N[1-2],mp-C[1-120] -r mp-
> >         >
> >         >
> >         > I know I've seen this problem before, but the solution escapes me.
> >         >
> >         >
> >         > Thanks,
> >         > -cdm
> >         >
> >         > On Mon, Feb 7, 2011 at 10:44 AM, Albert Chu <[email protected]> wrote:
> >         >         Hey Chris, Yaroslav,
> >         >
> >         >         Ok.  I'll go ahead and commit this under the assumption we
> >         >         want to go with it.
> >         >
> >         >         Al
> >         >
> >         >
> >         >         On Sat, 2011-02-05 at 07:33 -0800, Christopher Maestas wrote:
> >         >         > Sounds good ... I did some initial porting work to the 1.0
> >         >         > beta2 and I agree with you about passing any string
> >         >         > expression to be evaluated. :)  I'll try this out next week.
> >         >         >
> >         >         > On Fri, Feb 4, 2011 at 5:54 PM, Yaroslav Halchenko <[email protected]> wrote:
> >         >         >
> >         >         >         On Fri, 04 Feb 2011, Albert Chu wrote:
> >         >         >         > Yaroslav, will it suit your needs too?
> >         >         >
> >         >         >         > Both patch & script are attached.
> >         >         >
> >         >         >
> >         >         >         thanks!  looks like it should be what was requested... I am
> >         >         >         still using ancient (from last year) pre-1.0 version (0.8.10),
> >         >         >         so have incompatible ipmi-sensors:
> >         >         >
> >         >         >         /usr/sbin/ipmi-sensors: unrecognized option '--output-sensor-state'
> >         >         >
> >         >         >         but otherwise the patch looks like it should work ;)
> >         >         >
> >         >         >         --
> >         >         >         Yaroslav O. Halchenko
> >         >         >         Postdoctoral Fellow,   Department of Psychological and Brain Sciences
> >         >         >         Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
> >         >         >         Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
> >         >         >         WWW:   http://www.linkedin.com/in/yarik
> >         >         >
> >         >
> >         >         --
> >         >
> >         >         Albert Chu
> >         >         [email protected]
> >         >         Computer Scientist
> >         >         High Performance Systems Division
> >         >         Lawrence Livermore National Laboratory
> >         >
> >         >
> >         >
> >         >
> >
> >         --
> >
> >         Albert Chu
> >         [email protected]
> >         Computer Scientist
> >         High Performance Systems Division
> >         Lawrence Livermore National Laboratory
> >
> >
> >
> >
> --
> Albert Chu
> [email protected]
> Computer Scientist
> High Performance Systems Division
> Lawrence Livermore National Laboratory
>
>
_______________________________________________
Freeipmi-devel mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/freeipmi-devel
