Package: munin-node
Version: 1.4.5-3
Severity: normal
Tags: patch
hddtemp_smartctl has numerous bugs and annoymances e.g.:
. Fails to collect anything on systems with large numbers of drives due
to timeouts.
. Fails to collect anything on systems with one or more faulty or slow
drives due to timeouts.
. Fails to collect anything on busy (e.g. I/O) systems due to timeouts
. Arbitrarily sets graph minimum to zero celsius - I have three drives
here which say:
Min/Max recommended Temperature: -4/72 Celsius
Min/Max Temperature Limit: -9/77 Celsius
and five which say:
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -40/70 Celsius
. Doesn't query temperature limits from drives
. On Linux systems only the first 26 drives are checked (e.g. not /dev/sdaa,
/dev/sdab etc.).
. Fails to detect smartctl '--nocheck' option on Squeeze+
I think this patch fixes those bugs sufficiently to get it working on my
system.
Cheers,
Tim.
--- /usr/share/munin/plugins/hddtemp_smartctl 2010-10-05 14:39:07.000000000
+0100
+++ hddtemp_smartctlmt 2011-07-31 14:01:23.415941442 +0100
@@ -23,6 +23,13 @@
not return its temperature when only the -A option is
used.
dev_$dev - monitoring device for one drive, e.g. twe0
+ maxchild - Maximum number of smartctl child processes to run at any
+ one time - defaults to 4
+ minstartdelaysecs - Time in seconds to wait between spawning child
+ processes, defaults to 0.1
+ smartctltimeoutsecs - Maximum time to wait for a child process to run
+ in case of a broken drive etc. Defaults to 5
+
If the "smartctl" enviroment variable is not set the plugin will
search your $PATH, /usr/bin, /usr/sbin, /usr/local/bin and
@@ -89,11 +96,29 @@
smartctl then hdparm will be used. Note that hdparm isn't available
on all platforms.
+=head1 BUGS
+
+ * Uses part of the device name as the munin data source fieldname,
+ which can and does frequently change. Should be derived from the
+ drive model+serial number instead.
+ * All modern drives make their recommended/absolute min/max
+ rated temperatures available - see 'smartctl -l scttemp' this
+ info should be provided by this plugin when run with the "config"
+ argument.
+
=cut
use strict;
+use Errno;
+use POSIX;
+use IO::File;
+sub pipe_from_fork ($);
my $DEBUG = $ENV{'MUNIN_DEBUG'} || 0;
+my $maxchild = $ENV{'maxchild'} || 4;
+my $minstartdelaysecs = $ENV{'minstartdelaysecs'} || 0.1;
+my $smartctltimeoutsecs = $ENV{'timeout'} || 5;
+
my $smartctl = exists $ENV{smartctl} ? $ENV{smartctl} : '';
@@ -116,8 +141,8 @@
# Check version of smartctl to determine --nocheck capabilities
my $use_nocheck = 0;
-if ($smartctl and `$smartctl --version` =~ / version (\d+\.\d+) /i) {
- $use_nocheck = $1 >= 5.37;
+if ($smartctl and `$smartctl --help` =~ /nocheck/) {
+ $use_nocheck = 1;
warn "[DEBUG] Smartctl supports --nocheck\n" if $DEBUG;
}
@@ -141,7 +166,7 @@
my @drivesSCSI;
if (-d '/sys/block/') {
opendir(SCSI, '/sys/block/');
- @drivesSCSI = grep /sd[a-z]/, readdir SCSI;
+ @drivesSCSI = grep /sd[a-z]+/, readdir SCSI;
closedir(SCSI);
}
@@ -185,7 +210,6 @@
}
} elsif ($ARGV[0] eq 'config') {
print "graph_title HDD temperature\n";
- print "graph_args --base 1000 -l 0\n";
print "graph_vlabel temp in °C\n";
print "graph_category sensors\n";
print "graph_info This graph shows the temperature in degrees Celsius of
the hard drives in the machine.\n";
@@ -194,7 +218,14 @@
}
}
-foreach my $drive (@drives) {
+my %smartreaders;
+my @remainingdrives = @drives;
+
+while (@remainingdrives || keys %smartreaders) {
+ # Fire-up another smartctl?
+ if (@remainingdrives && ((keys %smartreaders) < $maxchild)) {
+
+ my $drive = shift @remainingdrives;
warn "[DEBUG] Processing $drive\n" if $DEBUG;
my $fulldev = device_for_drive($drive);
@@ -209,29 +240,88 @@
}
my $cmd = command_for_drive_device($drive, $fulldev, $use_nocheck);
- warn "[DEBUG] Command for $drive is % $cmd %\n" if $DEBUG;
+ $smartreaders{$drive}{'cmd'} = $cmd;
- my $output = `$cmd`;
- if ($? ne 0) {
- print "$drive.value U\n";
- print "$drive.extinfo Command $cmd on drive $drive failed: $?. The
plugin needs to have read permission on all monitored devices.\n";
- warn "[ERROR] Command $cmd on drive $drive failed: $?. The plugin needs
to have read permission on all monitored devices.\n";
- next;
+ # This did use perl's "open |-" syntax, but this didn't work in 5.8.0 :-(
+ $smartreaders{$drive}{'fh'} = IO::File->new;
+ $smartreaders{$drive}{'pid'} = pipe_from_fork($smartreaders{$drive}{'fh'});
+
+ if ($smartreaders{$drive}{'pid'}) { # parent process
+ # switch to non-blocking reading of the pipe
+ use Fcntl;
+ my $flags = 0;
+ fcntl($smartreaders{$drive}{'fh'}, F_GETFL, $flags) or die "Couldn't get
flags for $smartreaders{$drive}{'fh'} : $!\n";
+ $flags |= O_NONBLOCK;
+ fcntl($smartreaders{$drive}{'fh'}, F_SETFL, $flags) or die "Couldn't set
flags for $smartreaders{$drive}{'fh'}: $!\n";
+ $smartreaders{$drive}{'output'} = '';
+ } else { # child process - read with timeout.
+ close STDIN;
+ close STDERR;
+ eval {
+ local $SIG{ALRM} = sub { die "TIMEOUT: $cmd" };
+ alarm $smartctltimeoutsecs;
+ #print "Current Drive Temperature: 42\n";
+ print `$cmd`;
+ };
+ if ($@ and $@ =~ /^TIMEOUT/) { print "TIMEOUT: $cmd"; exit 1 }
+ exit 0;
+ }
+
+ }
+ # Collect any waiting data from the smartctl child processes
+ foreach my $driveworker (keys %smartreaders) {
+ my $rv = sysread($smartreaders{$driveworker}{'fh'}, my $cmdoutput, 1024);
+ if (!defined($rv)) {
+ if ($! != Errno::EAGAIN) {
+ warn "[ERROR] Failed to read from child process for $driveworker: $?";
+ delete $smartreaders{$driveworker};
+ }
+ } else {
+ if ($rv > 0) {
+ $smartreaders{$driveworker}{'output'} .= $cmdoutput;
+ } else { # EOF on pipe...
+ close $smartreaders{$driveworker}{'fh'};
+ POSIX::waitpid($smartreaders{$driveworker}{'pid'}, 0);
+ my $rc = $?;
+ if (WIFSIGNALED($rc)) {
+ warn "Child process existed with signal\n";
}
- if ($output =~ /Current Drive Temperature:\s*(\d+)/) {
- print "$drive.value $1\n";
- } elsif ($output =~ /^(194 Temperature_Celsius.*)/m) {
+ if (WIFEXITED($rc)) {
+ #print "exit status: " . WEXITSTATUS($rc) . "\n";
+ } else {
+ print "wifexited returned false - exit status: " . WEXITSTATUS($rc)
. "\n";
+ }
+ if (WIFEXITED($rc) && (WEXITSTATUS($rc) != 0)) {
+ my $errmsg;
+ print "$driveworker.value U\n";
+ warn $smartreaders{$driveworker}{'output'} if $DEBUG;
+ if ($smartreaders{$driveworker}{'output'} =~ /^TIMEOUT/) {
+ $errmsg = "$driveworker timed out after $smartctltimeoutsecs
seconds: '" . $smartreaders{$driveworker}{'cmd'} . "'\n";
+ } else {
+ $errmsg = "'" . $smartreaders{$driveworker}{'cmd'} . "' on drive
$driveworker failed with exit status: " . WEXITSTATUS($?) . ". The plugin
needs to have read permission on all monitored devices.\n";
+ }
+ print "$driveworker.extinfo $errmsg";
+ warn "[ERROR] $errmsg";
+ } else {
+ if ($smartreaders{$driveworker}{'output'} =~ /Current Drive
Temperature:\s*(\d+)/) {
+ print "$driveworker.value $1\n";
+ } elsif ($smartreaders{$driveworker}{'output'} =~ /^(194
Temperature_Celsius.*)/m) {
my @F = split /\s+/, $1;
- print "$drive.value $F[9]\n";
- } elsif ($output =~ /^(231 Temperature_Celsius.*)/m) {
+ print "$driveworker.value $F[9]\n";
+ } elsif ($smartreaders{$driveworker}{'output'} =~ /^(231
Temperature_Celsius.*)/m) {
my @F = split ' ', $1;
- print "$drive.value $F[9]\n";
+ print "$driveworker.value $F[9]\n";
} else {
- print "$drive.value U\n";
- print "$drive.extinfo Temperature not detected in smartctl output\n";
+ print "$driveworker.value U\n";
+ print "$driveworker.extinfo Temperature not detected in smartctl
output\n";
}
}
-
+ delete $smartreaders{$driveworker};
+ }
+ }
+ }
+ sleep $minstartdelaysecs;
+}
sub device_for_drive {
my ($drive) = @_;
@@ -256,3 +346,19 @@
}
+# simulate open(FOO, "-|"), but in a way that actually gives you access to the
+# exit status of the process on the end of the pipe - from "perldoc perlfork"
+sub pipe_from_fork ($) {
+ my $parent = shift;
+ pipe $parent, my $child or die;
+ my $pid = fork();
+ die "fork() failed: $!" unless defined $pid;
+ if ($pid) {
+ close $child;
+ }
+ else {
+ close $parent;
+ open(STDOUT, ">&=" . fileno($child)) or die;
+ }
+ $pid;
+}
-- System Information:
Debian Release: 6.0.2
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.32-5-openvz-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages munin-node depends on:
ii adduser 3.112+nmu2 add and remove users and groups
ii gawk 1:3.1.7.dfsg-5 GNU awk, a pattern scanning and pr
ii libnet-server-perl 0.97-1 An extensible, general perl server
ii lsb-base 3.2-23.2squeeze1 Linux Standard Base 3.2 init scrip
ii munin-common 1.4.5-3 network-wide graphing framework (c
ii perl 5.10.1-17squeeze2 Larry Wall's Practical Extraction
ii procps 1:3.2.8-9 /proc file system utilities
Versions of packages munin-node recommends:
ii libnet-snmp-perl 5.2.0-4 Script SNMP connections
Versions of packages munin-node suggests:
ii acpi 1.5-2 displays information on ACPI devic
ii ethtool 1:2.6.34-3 display or change Ethernet device
ii hdparm 9.32-1 tune hard disk parameters for high
pn libcache-cache-perl <none> (no description available)
ii libcrypt-ssleay-perl 0.57-2 Support for https protocol in LWP
ii libdbd-mysql-perl 4.016-1 Perl5 database interface to the My
pn libdbd-pg-perl <none> (no description available)
pn liblwp-useragent-determ <none> (no description available)
pn libnet-irc-perl <none> (no description available)
ii libnet-ssleay-perl 1.36-1 Perl module for Secure Sockets Lay
pn libtext-csv-xs-perl <none> (no description available)
ii libwww-perl 5.836-1 Perl HTTP/WWW client/server librar
ii libxml-simple-perl 2.18-3 Perl module for reading and writin
ii lm-sensors 1:3.1.2-6 utilities to read temperature/volt
ii logtail 1.3.13 Print log file lines that have not
ii munin 1.4.5-3 network-wide graphing framework (g
pn munin-java-plugins <none> (no description available)
ii munin-plugins-extra 1.4.5-3 network-wide graphing framework (u
ii mysql-client 5.1.49-3 MySQL database client (metapackage
ii mysql-client-5.1 [mysql 5.1.49-3 MySQL database client binaries
ii net-tools 1.60-23 The NET-3 networking toolkit
ii python 2.6.6-3+squeeze6 interactive high-level object-orie
ii ruby 4.5 An interpreter of object-oriented
ii smartmontools 5.39.1+svn3124-2 control and monitor storage system
-- Configuration Files:
/etc/munin/munin-node.conf changed [not included]
/etc/munin/plugin-conf.d/munin-node [Errno 13] Permission denied:
u'/etc/munin/plugin-conf.d/munin-node'
-- no debconf information
--
To UNSUBSCRIBE, email to [email protected]
with a subject of "unsubscribe". Trouble? Contact [email protected]