Hi All,
        Most newer devices have useful error statistics available from log pages.
I have a SCSI utility which dumps this information (a snippet is attached).
You can set up various thresholds to be notified automatically, but to date,
I have not seen any drivers that utilize this feature.

        My utility obtains this information by issuing a SCSI LOG SENSE command
through the pass-through (sg) interface.  A utility could be written to
periodically grab these statistics and then take appropriate action (send the
administrator e-mail, etc.).
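
For anyone who wants to experiment, here is a minimal sketch (in Python, not
taken from scu) of the two pieces of that approach that do not depend on the
transport: building the 10-byte LOG SENSE CDB and picking the counters out of
the returned page. The function names are made up for illustration, the layout
assumed is the SCSI-2 log page format, and actually sending the CDB through the
sg driver is left out.

```python
import struct

LOG_SENSE = 0x4D       # LOG SENSE(10) operation code
PC_CUMULATIVE = 0x01   # page control 01b: current cumulative values


def build_log_sense_cdb(page_code, alloc_len=0xFFFF, page_control=PC_CUMULATIVE):
    """Return the 10-byte LOG SENSE CDB for one log page (hypothetical helper)."""
    if not 0 <= page_code <= 0x3F:
        raise ValueError("page code must fit in 6 bits")
    byte2 = ((page_control & 0x03) << 6) | page_code
    # opcode, flags, PC|page code, 2 reserved bytes,
    # parameter pointer (16 bit), allocation length (16 bit), control
    return struct.pack(">BBBBBHHB", LOG_SENSE, 0, byte2, 0, 0, 0, alloc_len, 0)


def parse_log_page(buf):
    """Split a LOG SENSE response into (page_code, {parameter_code: value}).

    Assumes every parameter is a big-endian counter, which holds for the
    error counter pages but not for every log page.
    """
    page_code = buf[0] & 0x3F
    page_len = struct.unpack_from(">H", buf, 2)[0]
    end = min(4 + page_len, len(buf))
    params = {}
    offset = 4
    while offset + 4 <= end:
        # Each parameter: 2-byte code, control byte, 1-byte length, then value.
        code, _control, length = struct.unpack_from(">HBB", buf, offset)
        value = buf[offset + 4 : offset + 4 + length]
        params[code] = int.from_bytes(value, "big")
        offset += 4 + length
    return page_code, params


# Hand-built sample response: page 0x3, two 2-byte counters.
sample = bytes([0x03, 0x00, 0x00, 0x0C,
                0x00, 0x00, 0x40, 0x02, 0x00, 0x28,
                0x00, 0x06, 0x40, 0x02, 0x00, 0x00])
print(build_log_sense_cdb(0x03).hex())
print(parse_log_page(sample))
```

Page code 0x3 with page control 01b requests the current cumulative read error
counters, which is the page type shown in the attached output.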

        Unfortunately, I'm not at liberty to make 'scu' available to other Linux
users at this time.  This is a port of the Tru64 UNIX SCSI utility, and I
haven't been authorized to release this outside of Compaq (yet).

        FWIW:  This is the method I'd choose to create a SCSI health monitor.

        As you can see, this drive is having some problems.
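
As an illustration of the sort of check a monitor could run on these counters,
here is a small Python sketch. The parameter codes are the ones that appear in
the output below; the function name and the threshold of one corrected error
per terabyte are arbitrary assumptions for the example, not recommended values.

```python
# Parameter codes used by the write, read, and verify error counter pages.
TOTAL_CORRECTED = 0x3
TOTAL_BYTES = 0x5
TOTAL_UNCORRECTED = 0x6


def assess_page(name, counters, max_corrected_per_tb=1.0):
    """Return warning strings for one error counter page (hypothetical helper).

    counters maps parameter code -> counter value.
    max_corrected_per_tb is an assumed, illustrative threshold.
    """
    warnings = []

    # Any uncorrected error is reported unconditionally.
    uncorrected = counters.get(TOTAL_UNCORRECTED, 0)
    if uncorrected > 0:
        warnings.append(f"{name}: {uncorrected} uncorrected errors")

    # Corrected errors are judged relative to the amount of data processed.
    corrected = counters.get(TOTAL_CORRECTED, 0)
    terabytes = counters.get(TOTAL_BYTES, 0) / 1e12
    if corrected > 0 and terabytes > 0:
        rate = corrected / terabytes
        if rate > max_corrected_per_tb:
            warnings.append(f"{name}: {rate:.1f} corrected errors per TB")

    return warnings


# Read error counters copied from the scu output in this message.
read_page = {0x0: 40, 0x3: 40, 0x4: 40, 0x5: 15396903635456, 0x6: 0}
for line in assess_page("read", read_page):
    print(line)
```

A real monitor would run such a check on each poll and mail the administrator
when the returned list is not empty.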

Regards,
Robin
=====================================================================

scu> sbtl 1 1 0
Device: RZ29B, Bus: 1, Target: 1, Lun: 0, Type: Direct Access
scu> show log pages

Write Error Counter Parameters (Page 0x2 - Current Cumulative Values):

      Parameter 0x1, Counter Value: 0 (Errors corrected with possible delays)
      Parameter 0x2, Counter Value: 0 (Total rewrites or rereads)
      Parameter 0x3, Counter Value: 0 (Total errors corrected)
      Parameter 0x4, Counter Value: 0 (Total times correction algorithm processed)
      Parameter 0x5, Counter Value: 0 (Total bytes processed)
      Parameter 0x6, Counter Value: 0 (Total uncorrected errors)

Read Error Counter Parameters (Page 0x3 - Current Cumulative Values):

      Parameter 0x0, Counter Value: 40 (Errors corrected without substantial delay)
      Parameter 0x1, Counter Value: 0 (Errors corrected with possible delays)
      Parameter 0x2, Counter Value: 0 (Total rewrites or rereads)
      Parameter 0x3, Counter Value: 40 (Total errors corrected)
      Parameter 0x4, Counter Value: 40 (Total times correction algorithm processed)
      Parameter 0x5, Counter Value: 15396903635456 (Total bytes processed)
      Parameter 0x6, Counter Value: 0 (Total uncorrected errors)

Verify Error Counter Parameters (Page 0x5 - Current Cumulative Values):

      Parameter 0x0, Counter Value: 2 (Errors corrected without substantial delay)
      Parameter 0x1, Counter Value: 0 (Errors corrected with possible delays)
      Parameter 0x2, Counter Value: 0 (Total rewrites or rereads)
      Parameter 0x3, Counter Value: 2 (Total errors corrected)
      Parameter 0x4, Counter Value: 2 (Total times correction algorithm processed)
      Parameter 0x5, Counter Value: 4290600960 (Total bytes processed)
      Parameter 0x6, Counter Value: 0 (Total uncorrected errors)

Non-Medium Error Counter Parameters (Page 0x6 - Current Cumulative Values):

      Parameter 0x0, Counter Value: 0

scu> show log pages full

Write Error Counter Parameters (Page 0x2 - Current Cumulative Values):

                         Page Code: 0x2
                       Page Length: 46
                    Parameter Code: 0x1
   List Parameter Data Format (LP): 0 (Data Counter)
      Threshold Met Criteria (TMC): 0 (Notify of every update of cumulative value)
   Enable Threshold Criteria (ETC): No (Threshold comparison disabled)
         Target Save Disable (TSD): 1 (Does not/shall not use it's save method)
                 Disable Save (DS): 0 (Parameter is Savable)
               Disable Update (DU): No (Data counting is enabled)
                  Parameter Length: 2
                     Counter Value: 0 (Errors corrected with possible delays)
                .
                .
                .
-----Original Message-----
From:   [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Georg P. Israel
Sent:   Wednesday, September 29, 1999 3:07 AM
To:     [EMAIL PROTECTED]
Subject:        health monitor for SCSI devices?

Dear SCSI Readers,

I was wondering if there is an easy way to monitor the health of SCSI devices?
What I'm looking for are early signs of the failure of a storage device.

I guess devices usually don't break down from one second to the next
but probably start to develop anomalies prior to their final breakdown.

Hence, it would probably help if we had e.g. one file for every
storage device that records unusual events, i.e. retries, timeouts, checksum
errors ....


Looking forward to reading some comments on this.
Though, I guess something like this probably already exists ;-)


Georg Israel
<g.israel @ ieee.org>

---

Date: 29-Sep-99
Time: 09:02:14
----------------------------------------------------------
Georg P. Israel      Phone: +41-1-497 1450
CSEM
Badenerstr. 569
8048 Zuerich
SWITZERLAND
-----------------------------------------------------------
Thank goodness modern convenience is a thing of the remote future.
                -- Pogo, by Walt Kelly


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]


