I have a slightly modified script that collects SMART data and sends it to
InfluxDB in collectd format. I've started moving metrics from collectd to
Telegraf, though, and would like to port the script as well so that I can
continue collecting SMART info from my servers.
The script, which is executed by collectd's exec plugin, outputs the
following format:
PUTVAL vega.home/smartmon-da0/gauge-current_pending_sector interval= N:0
PUTVAL vega.home/smartmon-da0/gauge-offline_uncorrectable interval= N:0
PUTVAL vega.home/smartmon-da0/gauge-raw_read_error_rate interval= N:9
PUTVAL vega.home/smartmon-da0/gauge-reallocated_sector_count interval= N:0
PUTVAL vega.home/smartmon-da0/gauge-spin_up_time interval= N:6950
PUTVAL vega.home/smartmon-da0/temperature-temperature interval= N:36
PUTVAL vega.home/smartmon-da0/gauge-multi_zone_error_rate interval= N:0
PUTVAL vega.home/smartmon-da0/gauge-start_stop_count interval= N:105
PUTVAL vega.home/smartmon-da0/gauge-power_on_hours interval= N:4668
PUTVAL vega.home/smartmon-da0/gauge-load_cycle_count interval= N:147
PUTVAL vega.home/smartmon-da0/gauge-udma_crc_error_count interval= N:0
PUTVAL vega.home/smartmon-da0/gauge-power_cycle_count interval= N:94
Writing a Telegraf plugin for SMART data is beyond my level of knowledge, but
I'm hoping to port just the script for now. As I understand it, Telegraf
supports a line format. If that's true, would I just need to change the
output to something like:
host.drive metric1=value, metric2=value, metric3=value, ...
For example:
influxdb.da0 current_pending_sector=0, offline_uncorrectable=0, raw_read_error_rate=9
influxdb.da1 current_pending_sector=0, offline_uncorrectable=0, raw_read_error_rate=12
influxdb.da2 current_pending_sector=0, offline_uncorrectable=0, raw_read_error_rate=9
and so on?
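For comparison, here is a minimal sketch of what such output might look like in InfluxDB line protocol, which has the shape "measurement,tag=value field=value,field=value": tags follow the measurement name, and field pairs are joined by commas with no spaces. The function name smart_line, the measurement name "smart", and the tag names "host" and "device" are assumptions made for illustration; none of them come from the script or from Telegraf itself.

```shell
#!/bin/sh
# Sketch only: print one line-protocol record per drive.
# "smart", "host" and "device" are illustrative names, not a required schema.
smart_line() {
    host=$1; device=$2; shift 2
    fields=
    for kv in "$@"; do
        # Join name=value pairs with commas and no spaces;
        # the trailing "i" marks the value as an integer field.
        fields="${fields:+$fields,}${kv}i"
    done
    printf 'smart,host=%s,device=%s %s\n' "$host" "$device" "$fields"
}

smart_line vega.home da0 current_pending_sector=0 offline_uncorrectable=0 raw_read_error_rate=9
```

The final call prints smart,host=vega.home,device=da0 current_pending_sector=0i,offline_uncorrectable=0i,raw_read_error_rate=9i. Telegraf's exec input should be able to read lines like this when its data_format option is set to "influx".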
The script, if anyone is interested, is:
HOST=`hostname -f`
# Each argument is DISK or DISK:TYPE, where TYPE is passed to "smartctl -d".
for disk in "$@"; do
    dsk=${disk%:*}
    drv=${disk#*:}
    id=
    if [ "$disk" != "$drv" ]; then
        drv="-d $drv"
        id=${drv#*,}
    else
        drv=
    fi
    # Turn each attribute row of "smartctl -A" into SMART_<name>=<raw value>
    # and eval the result to set shell variables.
    # Note: these variables are not cleared between disks.
    eval `/usr/local/bin/sudo /usr/local/sbin/smartctl $drv -A "/dev/$dsk" |
        awk '$3 ~ /^0x/ && $2 ~ /^[a-zA-Z0-9_-]+$/ { gsub(/-/, "_"); print "SMART_" $2 "=" $10 }' 2>/dev/null`
    # Emit one PUTVAL line for each attribute that was found.
    [ -n "$SMART_Command_Timeout" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/gauge-command_timeout interval=$INTERVAL N:${SMART_Command_Timeout:-U}"
    [ -n "$SMART_Current_Pending_Sector" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/gauge-current_pending_sector interval=$INTERVAL N:${SMART_Current_Pending_Sector:-U}"
    [ -n "$SMART_End_to_End_Error" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/gauge-end_to_end_error interval=$INTERVAL N:${SMART_End_to_End_Error:-U}"
    [ -n "$SMART_Hardware_ECC_Recovered" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/gauge-hardware_ecc_recovered interval=$INTERVAL N:${SMART_Hardware_ECC_Recovered:-U}"
    [ -n "$SMART_Offline_Uncorrectable" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/gauge-offline_uncorrectable interval=$INTERVAL N:${SMART_Offline_Uncorrectable:-U}"
    [ -n "$SMART_Raw_Read_Error_Rate" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/gauge-raw_read_error_rate interval=$INTERVAL N:${SMART_Raw_Read_Error_Rate:-U}"
    [ -n "$SMART_Reallocated_Sector_Ct" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/gauge-reallocated_sector_count interval=$INTERVAL N:${SMART_Reallocated_Sector_Ct:-U}"
    [ -n "$SMART_Reported_Uncorrect" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/gauge-reported_uncorrect interval=$INTERVAL N:${SMART_Reported_Uncorrect:-U}"
    [ -n "$SMART_Spin_Up_Time" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/gauge-spin_up_time interval=$INTERVAL N:${SMART_Spin_Up_Time:-U}"
    [ -n "$SMART_Airflow_Temperature_Cel" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/temperature-airflow interval=$INTERVAL N:${SMART_Airflow_Temperature_Cel:-U}"
    [ -n "$SMART_Temperature_Celsius" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/temperature-temperature interval=$INTERVAL N:${SMART_Temperature_Celsius:-U}"
    [ -n "$SMART_Media_Wearout_Indicator" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/gauge-media_wearout_indicator interval=$INTERVAL N:${SMART_Media_Wearout_Indicator:-U}"
    [ -n "$SMART_Multi_Zone_Error_Rate" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/gauge-multi_zone_error_rate interval=$INTERVAL N:${SMART_Multi_Zone_Error_Rate:-U}"
    [ -n "$SMART_Start_Stop_Count" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/gauge-start_stop_count interval=$INTERVAL N:${SMART_Start_Stop_Count:-U}"
    [ -n "$SMART_Power_On_Hours" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/gauge-power_on_hours interval=$INTERVAL N:${SMART_Power_On_Hours:-U}"
    [ -n "$SMART_Load_Cycle_Count" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/gauge-load_cycle_count interval=$INTERVAL N:${SMART_Load_Cycle_Count:-U}"
    [ -n "$SMART_UDMA_CRC_Error_Count" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/gauge-udma_crc_error_count interval=$INTERVAL N:${SMART_UDMA_CRC_Error_Count:-U}"
    [ -n "$SMART_Power_Cycle_Count" ] &&
        echo "PUTVAL $HOST/smartmon-$dsk$id/gauge-power_cycle_count interval=$INTERVAL N:${SMART_Power_Cycle_Count:-U}"
done
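One possible way to port this without touching the script itself is to pipe its PUTVAL output through a small filter that rewrites each line as InfluxDB line protocol. The following is only a sketch under assumptions: the function name putval_to_influx, the measurement name "smart", and the tag names "host" and "device" are all made up for illustration. Values are written without a type suffix, so InfluxDB would store them as floats.

```shell
#!/bin/sh
# Sketch only: convert collectd PUTVAL lines on stdin to line protocol.
# "smart", "host" and "device" are illustrative names, not a required schema.
putval_to_influx() {
    awk '$1 == "PUTVAL" {
        split($2, id, "/")                         # host / plugin-instance / type-instance
        device = id[2]; sub(/^smartmon-/, "", device)
        field = id[3];  sub(/^[^-]*-/, "", field)  # drop the "gauge-" or "temperature-" prefix
        value = $NF;    sub(/^N:/, "", value)      # last word is N:<value>
        if (value == "U" || value == "") next      # skip undefined values
        printf "smart,host=%s,device=%s %s=%s\n", id[1], device, field, value
    }'
}
```

With the script saved under a hypothetical name such as smartmon.sh, running "./smartmon.sh da0 | putval_to_influx" would turn the spin_up_time line shown earlier into smart,host=vega.home,device=da0 spin_up_time=6950.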