I set up an InfluxDB instance that receives SMART data for several drives from a machine running collectd and its smart plugin.

I started looking at the InfluxDB database to see which measurements / series (SMART attributes) showed up.

I narrowed it down to the following for each drive on each host:

airflow-temperature-celsius
command-timeout
current-pending-sector
end-to-end-error
hardware-ecc-recovered
head-flying-hours
high-fly-writes
offline-uncorrectable
power-cycle-count
power-on-hours
raw-read-error-rate
reallocated-sector-count
reported-uncorrect
runtime-bad-block-total
seek-error-rate
spin-retry-count
spin-up-time
start-stop-count
temperature-celsius-2
total-lbas-read
total-lbas-written
udma-crc-error-count

I also found a list of critical SMART attributes on Wikipedia <https://en.wikipedia.org/wiki/S.M.A.R.T.> (the table can be sorted to group them), but I'm uncertain how they relate to what I'm receiving from collectd.

I'd like to be able to detect when a drive is about to go bad, and Wikipedia says that I have about a 50% chance of detecting it if I use such attributes to graph changes in something like Grafana.
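
A minimal sketch of the kind of change detection meant here, assuming the samples for each drive have already been pulled out of InfluxDB as (timestamp, value) pairs. The choice of watched counters and the function names are hypothetical, and whether an increase in a given counter actually predicts failure on a particular drive model is an open question, not something this code establishes.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[int, float]  # (unix timestamp, raw attribute value)

# Hypothetical selection of counters to watch; verify against the drive vendor's docs.
WATCHED = (
    "reallocated-sector-count",
    "current-pending-sector",
    "offline-uncorrectable",
)


def increases(samples: Sequence[Sample]) -> List[Sample]:
    """Return (timestamp, delta) for every sample that rose above the previous one."""
    out = []
    for (_, prev), (ts, cur) in zip(samples, samples[1:]):
        if cur > prev:
            out.append((ts, cur - prev))
    return out


def flag_drives(series: Dict[str, Dict[str, Sequence[Sample]]]) -> Dict[str, List[str]]:
    """Map each drive to the watched attributes that increased at least once."""
    flagged = {}
    for drive, attrs in series.items():
        hits = [name for name in WATCHED if increases(attrs.get(name, []))]
        if hits:
            flagged[drive] = hits
    return flagged
```

A drive whose watched counters stay flat is not reported at all, so the result can be fed straight into an alert or a dashboard annotation.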

Unfortunately, I'm also uncertain which attributes might be nonsense (on my drives) and which ones are real, and where to find such information. There also doesn't seem to be much documentation about how the attributes in InfluxDB relate to those listed on Wikipedia (though I would think it has something to do with the source code for the plugin and the identifier listed on Wikipedia).
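
A minimal sketch of such a lookup, assuming the series names correspond one-to-one to standard SMART attribute IDs. The IDs below are assumptions taken from commonly published SMART tables, not from the plugin's source, and should be checked against the output of `smartctl -A` for each drive. The helper name `wikipedia_id` is hypothetical.

```python
# Assumed mapping from series name to decimal SMART attribute ID.
# Verify every entry against `smartctl -A` before relying on it.
SMART_IDS = {
    "raw-read-error-rate": 1,
    "spin-up-time": 3,
    "start-stop-count": 4,
    "reallocated-sector-count": 5,
    "seek-error-rate": 7,
    "power-on-hours": 9,
    "spin-retry-count": 10,
    "power-cycle-count": 12,
    "runtime-bad-block-total": 183,
    "end-to-end-error": 184,
    "reported-uncorrect": 187,
    "command-timeout": 188,
    "high-fly-writes": 189,
    "airflow-temperature-celsius": 190,
    "temperature-celsius-2": 194,
    "hardware-ecc-recovered": 195,
    "current-pending-sector": 197,
    "offline-uncorrectable": 198,
    "udma-crc-error-count": 199,
    "head-flying-hours": 240,
    "total-lbas-written": 241,
    "total-lbas-read": 242,
}


def wikipedia_id(series_name: str) -> str:
    """Format the assumed ID as two-digit hex, the form used in the Wikipedia table."""
    return f"0x{SMART_IDS[series_name]:02X}"
```

For example, `wikipedia_id("current-pending-sector")` yields the hex identifier to look up in the Wikipedia table.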

Please let me know if you know anything about where to start on this.

--
Thank you,

Andrew J. Leer

Git Hub: http://bit.ly/aleer_github

Stack Exchange: http://bit.ly/aleer_stk_exch

Linked-In: http://bit.ly/2d5D1DF
_______________________________________________
collectd mailing list
collectd@verplant.org
https://mailman.verplant.org/listinfo/collectd
