raid status monitoring (was: Re: How to detect hardware RAID1 failure?)

Jeffrey Paul Sat, 03 Mar 2001 11:10:46 -0800
At 02:08 PM 2001-03-03, C. R. Oldham wrote:


>Joe Janitor wrote:
>
> > Dell Poweredge 2450 running RH6.2, PERC2 hardware raid
> > controller - RAID1 configuration.
>
>Someone who has a PERC2 controller can probably answer this better than I
>can.
>
>You're using hardware RAID, which is implemented at the driver and/or
>hardware level.  The md code in the kernel is not involved, thus the lack of
>information from /proc/mdstat.
>
>Check to see if the PERC2 driver adds an entry in the /proc filesystem that
>can give you some clue as to the status of the controller.  We use
>ICP-Vortex cards, and they provide entries in /proc/scsi/gdth that give
>detailed info on the status of the card and the devices attached.
>
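A minimal Python sketch of that check, assuming a 2.2/2.4-era kernel where a SCSI host driver that exports status text does so as files under /proc/scsi/<driver>/ (as gdth does). The helper name scsi_proc_entries is hypothetical, and the sketch cannot tell you whether the PERC2 driver creates such a directory at all.

```python
import os
from typing import Dict


def scsi_proc_entries(root: str = "/proc/scsi") -> Dict[str, Dict[str, str]]:
    """Collect the text of every per-driver entry under root.

    Returns {driver_directory: {entry_name: contents}}, e.g.
    {"gdth": {"0": "..."}} on a machine with one ICP-Vortex card.
    Hypothetical helper; which directories exist depends on the driver.
    """
    entries: Dict[str, Dict[str, str]] = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            continue  # skip flat files such as /proc/scsi/scsi
        entries[name] = {}
        for entry in sorted(os.listdir(path)):
            try:
                with open(os.path.join(path, entry)) as f:
                    entries[name][entry] = f.read()
            except OSError:
                continue  # unreadable entry or nested directory: skip it
    return entries


if __name__ == "__main__" and os.path.isdir("/proc/scsi"):
    # Dump whatever status text the loaded SCSI/RAID drivers expose.
    for driver, files in scsi_proc_entries().items():
        for entry, text in files.items():
            print(f"== /proc/scsi/{driver}/{entry} ==")
            print(text)
```

Skipping flat files keeps the generic /proc/scsi/scsi device list out of the result, so only driver-specific status (where a driver provides it) is returned.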


I think error reporting is one of the major things lacking in the current 
linux-raid "support".  I've had a few ideas that might improve the 
situation and make it easier for people to write programs or scripts that 
can take active measures when a failure occurs (e.g. paging or emailing 
someone).

I've noticed that /proc/mdstat's format has morphed yet again in 2.4 (2.4.2?).

Here are my ideas; some of them might be worth ignoring:

*) A unified failure-status interface (most likely a /proc entry) for 
hardware *and* software raid devices, maybe even just an OK/Not OK status 
with a short message from the underlying raid driver explaining why it's 
in Not OK mode.  This would make monitoring tools much easier to write.  
The output should preferably be easily machine readable.

*) An interface to more detailed and more machine-readable softraid 
information, maybe one line per disk in each softraid set, with 
up/down/hot spare status easily identifiable.  Byte counters and other 
useless but fun-to-graph information might be neat as well.

*) A common prefix on all softraid (maybe hardware raid too) syslog 
entries, e.g. 'RAID: '

*) A user-definable syslog facility for raid messages, so they can be 
routed separately (e.g. local2.*  /var/adm/raidlog)
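None of these interfaces exist, so this is only a sketch of how a monitoring script might consume them. It assumes a hypothetical status file with one "<device> OK" or "<device> NOTOK <reason>" line per array (the first idea) and kernel messages carrying the proposed 'RAID: ' prefix (the third idea). Both formats and both function names are made up for illustration.

```python
from typing import Iterable, List, Tuple


def not_ok_devices(status_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (device, reason) for every device not reporting OK.

    Assumes a HYPOTHETICAL format, one device per line:
        md0 OK
        c0d0 NOTOK mirror degraded, disk 1 missing
    """
    failures = []
    for line in status_lines:
        fields = line.split(None, 2)  # device, state, optional reason
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        device, state = fields[0], fields[1]
        if state != "OK":
            reason = fields[2].strip() if len(fields) == 3 else ""
            failures.append((device, reason))
    return failures


def raid_log_lines(log_lines: Iterable[str], prefix: str = "RAID: ") -> List[str]:
    """Keep only syslog lines whose message carries the proposed prefix.

    syslog puts a timestamp, host and tag before the message, so this
    looks for the prefix anywhere in the line rather than at the start.
    """
    return [line for line in log_lines if prefix in line]
```

A cron job could run not_ok_devices over such a file and page or email someone whenever the list is non-empty; because the check only looks at the OK/NOTOK field, it would not care whether the device is soft or hard raid.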


I'm not a kernel hacker; I don't even hack C.  I'm a small-time sysadmin, 
and some or all of these might be poorly thought out, but I know they'd all 
be helpful.  Several times I've considered writing a script with my modest 
Perl abilities to monitor /proc/mdstat for failures, but searching for 
underscores within brackets (a degraded two-disk mirror shows [U_] instead 
of [UU]) or grepping through my huge syslog (even with offsets) seems 
kludgy.

Having a clean, per-physical-device, machine-readable file with the 
softraid information in it would make status-monitoring my raids really 
easy, and a unified soft/hard raid status file would open the door to all 
kinds of applications that work regardless of what kind of raid you're 
running.

-j



----------------------------------------------
[EMAIL PROTECTED]      -           0x514DB5CB
he who lives these words shall not taste death
becoming nothing yeah yeah
forever liquid cool

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]