Re: [CentOS] Dell PERC H800 commandline RAID monitoring tools

2011-03-10 Thread Dominik Zyla
On Mon, Mar 07, 2011 at 12:43:03PM -0800, Dr. Ed Morbius wrote:
 OMSA conflicts with mega-cli, though we may find that the latter is the
 more useful package.  Both are pretty byzantine, the Dell stuff simply
 doesn't have docs (in particular: docs on how to interpret the omconfig
 log output).

We're using megacli wrapped by perl to provide information about Perc
events. It works quite well so far.
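
For illustration, a minimal sketch of this kind of wrapper, written in Python rather than Perl. It is not the script referred to above. The install path and the field names ("Enclosure Device ID", "Predictive Failure Count", "Firmware state") are assumptions based on typical `MegaCli -PDList -aALL` output and should be checked against what your MegaCli version actually prints. The sample text is made up.

```python
import subprocess

# Assumption: a common MegaCli install location; adjust for your system.
MEGACLI = "/opt/MegaRAID/MegaCli/MegaCli64"

# Hypothetical sample shaped like `MegaCli -PDList -aALL` output.
SAMPLE = """\
Enclosure Device ID: 8
Slot Number: 2
Media Error Count: 0
Predictive Failure Count: 3
Firmware state: Online, Spun Up

Enclosure Device ID: 8
Slot Number: 3
Media Error Count: 0
Predictive Failure Count: 0
Firmware state: Online, Spun Up
"""


def parse_pdlist(text):
    """Split PDList-style 'Key: Value' output into one dict per disk."""
    disks, current = [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and headers without a colon
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":  # assumed to open each disk record
            current = {}
            disks.append(current)
        if current is not None:
            current[key] = value
    return disks


def problem_disks(disks):
    """Return disks that predict a failure or are not reported as online."""
    bad = []
    for disk in disks:
        predicted = int(disk.get("Predictive Failure Count", "0"))
        online = disk.get("Firmware state", "").startswith("Online")
        if predicted > 0 or not online:
            bad.append(disk)
    return bad


def read_pdlist():
    """Run MegaCli itself; only useful on a host that has the binary."""
    result = subprocess.run([MEGACLI, "-PDList", "-aALL"],
                            capture_output=True, text=True, check=True)
    return result.stdout


for disk in problem_disks(parse_pdlist(SAMPLE)):
    print("check slot", disk["Slot Number"])
```

On a real host, `parse_pdlist(read_pdlist())` would replace the sample text.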

-- 
Dominik Zyla



___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Dell PERC H800 commandline RAID monitoring tools

2011-03-10 Thread Kai Schaetzl
Dominik Zyla wrote on Thu, 10 Mar 2011 09:10:37 +0100:

 We're using megacli wrapped by perl to provide information about Perc
 events. It works quite well so far.

Do you have a megacli rpm that works with the CentOS-provided drivers, 
which is MPT 3.something? I googled about this some time ago and there's 
an rpm mentioned here and there that contains only the megacli utility, 
but it's not downloadable anymore from anywhere. I got hold of a package 
that contains version 4, but that doesn't work with the CentOS 
drivers. LSI themselves provide only the complete MegaRAID driver/package 
for download and it's not clear if the single megacli utility is included 
or if installing it may overwrite the built-in driver.

Kai




Re: [CentOS] Dell PERC H800 commandline RAID monitoring tools

2011-03-10 Thread Dominik Zyla
On Thu, Mar 10, 2011 at 06:47:09PM +0100, Kai Schaetzl wrote:
 Dominik Zyla wrote on Thu, 10 Mar 2011 09:10:37 +0100:
 
  We're using megacli wrapped by perl to provide information about Perc
  events. It works quite well so far.
 
 Do you have a megacli rpm that works with the CentOS-provided drivers, 
 which is MPT 3.something? I googled about this some time ago and there's 
 an rpm mentioned here and there that contains only the megacli utility, 
 but it's not downloadable anymore from anywhere. I got hold of a package 
 that contains version 4, but that doesn't work with the CentOS 
 drivers. LSI themselves provide only the complete MegaRAID driver/package 
 for download and it's not clear if the single megacli utility is included 
 or if installing it may overwrite the built-in driver.

It's some single binary version, compiled statically.

-- 
Dominik Zyla





Re: [CentOS] Dell PERC H800 commandline RAID monitoring tools

2011-03-08 Thread Ross Walker
On Mar 7, 2011, at 3:43 PM, Dr. Ed Morbius dredmorb...@gmail.com wrote:

 We're looking for tools to be used in monitoring the PERC H800 arrays on
 a set of database servers running CentOS 5.5.
 
 We've installed most of the OMSA (Dell monitoring) suite.
 
 Our current alerting is happening through SNMP, though it's a bit hit or
 miss (we apparently missed a couple of earlier predictive failure alerts
 on one drive).
 
 OMSA conflicts with mega-cli, though we may find that the latter is the
 more useful package.  Both are pretty byzantine, the Dell stuff simply
 doesn't have docs (in particular: docs on how to interpret the omconfig
 log output).
 
 Ideally we'd like something which could be run as a Nagios plugin or
 cron job providing information on RAID status and/or possible disk
 errors.  Probably both, actually.

I can't speak about Nagios, but I have OMSA set up to send traps and,
for critical errors, to also send emails, and it works well for us.

If you link the shared lib (I forget the paths) and install megacli with --nodeps 
you can have both installed.

-Ross



[CentOS] Dell PERC H800 commandline RAID monitoring tools

2011-03-07 Thread Dr. Ed Morbius
We're looking for tools to be used in monitoring the PERC H800 arrays on
a set of database servers running CentOS 5.5.

We've installed most of the OMSA (Dell monitoring) suite.

Our current alerting is happening through SNMP, though it's a bit hit or
miss (we apparently missed a couple of earlier predictive failure alerts
on one drive).

OMSA conflicts with mega-cli, though we may find that the latter is the
more useful package.  Both are pretty byzantine, the Dell stuff simply
doesn't have docs (in particular: docs on how to interpret the omconfig
log output).

Ideally we'd like something which could be run as a Nagios plugin or
cron job providing information on RAID status and/or possible disk
errors.  Probably both, actually.
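
As a rough sketch of what such a check could look like: Nagios plugins report state through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus one line of output, and the same script can be run from cron with mail sent on a non-zero result. The tuple layout and the disk data below are hypothetical; a real check would collect them from omreport or MegaCli.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(disks):
    """Map (disk_id, state, failure_predicted) tuples to (exit_code, message)."""
    if not disks:
        return UNKNOWN, "no disks reported"
    failed = [d for d, state, _ in disks if state != "Online"]
    predicted = [d for d, state, pred in disks if state == "Online" and pred]
    if failed:
        return CRITICAL, "disks not online: " + ", ".join(failed)
    if predicted:
        return WARNING, "predictive failure: " + ", ".join(predicted)
    return OK, "%d disks online" % len(disks)


def status_line(disks):
    """Return (exit_code, one-line text) in the usual plugin output shape."""
    code, message = evaluate(disks)
    return code, "RAID %s - %s" % (LABELS[code], message)


# Hypothetical input: one healthy disk, one with a predicted failure.
code, line = status_line([("0:0:1", "Online", False), ("0:0:2", "Online", True)])
print(line)
# A real plugin would finish with: sys.exit(code)
```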

Thanks in advance.

-- 
Dr. Ed Morbius, Chief Scientist /|
  Robot Wrangler / Staff Psychologist| When you seek unlimited power
Krell Power Systems Unlimited|  Go to Krell!


Re: [CentOS] Dell PERC H800 commandline RAID monitoring tools

2011-03-07 Thread Eero Volotinen
2011/3/7 Dr. Ed Morbius dredmorb...@gmail.com:
 We're looking for tools to be used in monitoring the PERC H800 arrays on
 a set of database servers running CentOS 5.5.

 We've installed most of the OMSA (Dell monitoring) suite.

 Our current alerting is happening through SNMP, though it's a bit hit or
 miss (we apparently missed a couple of earlier predictive failure alerts
 on one drive).

 OMSA conflicts with mega-cli, though we may find that the latter is the
 more useful package.  Both are pretty byzantine, the Dell stuff simply
 doesn't have docs (in particular: docs on how to interpret the omconfig
 log output).

 Ideally we'd like something which could be run as a Nagios plugin or
 cron job providing information on RAID status and/or possible disk
 errors.  Probably both, actually.

if your system supports omreport (comes with omsa) then this is a good solution:
http://folk.uio.no/trondham/software/check_openmanage.html

--
Eero


Re: [CentOS] Dell PERC H800 commandline RAID monitoring tools

2011-03-07 Thread Dr. Ed Morbius
on 22:57 Mon 07 Mar, Eero Volotinen (eero.voloti...@iki.fi) wrote:
 2011/3/7 Dr. Ed Morbius dredmorb...@gmail.com:
  We're looking for tools to be used in monitoring the PERC H800 arrays on
  a set of database servers running CentOS 5.5.
 
  We've installed most of the OMSA (Dell monitoring) suite.
 
  Our current alerting is happening through SNMP, though it's a bit hit or
  miss (we apparently missed a couple of earlier predictive failure alerts
  on one drive).
 
  OMSA conflicts with mega-cli, though we may find that the latter is the
  more useful package.  Both are pretty byzantine, the Dell stuff simply
  doesn't have docs (in particular: docs on how to interpret the omconfig
  log output).
 
  Ideally we'd like something which could be run as a Nagios plugin or
  cron job providing information on RAID status and/or possible disk
  errors.  Probably both, actually.
 
 if your system supports omreport (comes with omsa) then this is a good solution:
 http://folk.uio.no/trondham/software/check_openmanage.html

So ... this slots on top of OMSA to provide reporting?

-- 
Dr. Ed Morbius, Chief Scientist /|
  Robot Wrangler / Staff Psychologist| When you seek unlimited power
Krell Power Systems Unlimited|  Go to Krell!


Re: [CentOS] Dell PERC H800 commandline RAID monitoring tools

2011-03-07 Thread Eero Volotinen
2011/3/7 Dr. Ed Morbius dredmorb...@gmail.com:
 on 22:57 Mon 07 Mar, Eero Volotinen (eero.voloti...@iki.fi) wrote:
 2011/3/7 Dr. Ed Morbius dredmorb...@gmail.com:
  We're looking for tools to be used in monitoring the PERC H800 arrays on
  a set of database servers running CentOS 5.5.
 
  We've installed most of the OMSA (Dell monitoring) suite.
 
  Our current alerting is happening through SNMP, though it's a bit hit or
  miss (we apparently missed a couple of earlier predictive failure alerts
  on one drive).
 
  OMSA conflicts with mega-cli, though we may find that the latter is the
  more useful package.  Both are pretty byzantine, the Dell stuff simply
  doesn't have docs (in particular: docs on how to interpret the omconfig
  log output).
 
  Ideally we'd like something which could be run as a Nagios plugin or
  cron job providing information on RAID status and/or possible disk
  errors.  Probably both, actually.

 if your system supports omreport (comes with omsa) then this is a good 
 solution:
 http://folk.uio.no/trondham/software/check_openmanage.html

 So ... this slots on top of OMSA to provide reporting?

this plugin parses omreport output and uses it for nagios output.

omsa webserver is not required, but a working omreport cli is. It works
great on my servers.
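
To show the general idea of parsing omreport output (this is not check_openmanage's code, which is a separate Perl program), here is a small Python sketch. The sample is a hypothetical excerpt shaped like `omreport storage pdisk controller=0` output; the exact field names and values are assumptions to verify on your own system.

```python
# Hypothetical excerpt shaped like `omreport storage pdisk controller=0`.
SAMPLE = """\
Controller PERC H800 Adapter (Slot 1)
ID                        : 0:0:1
Status                    : Ok
Name                      : Physical Disk 0:0:1
State                     : Online
Failure Predicted         : No

ID                        : 0:0:2
Status                    : Non-Critical
Name                      : Physical Disk 0:0:2
State                     : Online
Failure Predicted         : Yes
"""


def parse_omreport(text):
    """Parse 'Key : Value' lines; each 'ID' key is assumed to start a record."""
    records, current = [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            continue  # headers and blank lines
        key, value = key.strip(), value.strip()
        if key == "ID":
            current = {}
            records.append(current)
        if current is not None:
            current[key] = value
    return records


def flagged(records):
    """Return IDs of disks whose status is not Ok or that predict a failure."""
    return [r["ID"] for r in records
            if r.get("Status") != "Ok" or r.get("Failure Predicted") == "Yes"]


print(flagged(parse_omreport(SAMPLE)))
```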

--
Eero


Re: [CentOS] Dell PERC H800 commandline RAID monitoring tools

2011-03-07 Thread Blake Hudson
 Original Message  
Subject: [CentOS] Dell PERC H800 commandline RAID monitoring tools
From: Dr. Ed Morbius dredmorb...@gmail.com
To: CentOS User list centos@centos.org
Date: Monday, March 07, 2011 2:43:03 PM
 We're looking for tools to be used in monitoring the PERC H800 arrays on
 a set of database servers running CentOS 5.5.
If you purchased the server with an add-in DRAC, the DRAC can provide
email alerts if an array becomes degraded (or just about any other
hardware fault). This isn't necessarily a replacement for your current
monitoring, but it can be used to supplement or complement it.

--Blake


Re: [CentOS] Dell PERC H800 commandline RAID monitoring tools

2011-03-07 Thread Dr. Ed Morbius
on 16:04 Mon 07 Mar, Blake Hudson (bl...@ispn.net) wrote:
  Original Message  
 Subject: [CentOS] Dell PERC H800 commandline RAID monitoring tools
 From: Dr. Ed Morbius dredmorb...@gmail.com
 To: CentOS User list centos@centos.org
 Date: Monday, March 07, 2011 2:43:03 PM
  We're looking for tools to be used in monitoring the PERC H800 arrays on
  a set of database servers running CentOS 5.5.

 If you purchased the server with an add-in DRAC, the DRAC can provide
 email alerts if an array becomes degraded (or just about any other
 hardware fault). This isn't necessarily a replacement for your current
 monitoring, but it can be used to supplement or complement it.

The iDRAC /doesn't/ report on RAID / storage configuration or status.

iDRAC 6, Dell r610, onboard PERC H700, offboard PERC H800 (MD1200
array).  BIOS version 2.1.15, Firmware 1.54 (Build 15).

We get batteries, fans, intrusion, power, removable flash media, temps,
and volts, but not storage.

The iDRAC is pretty good compared with some past Dell offerings.
Ability to boot virtual media in particular is very slick (I can specify
local removable storage or a drive image and mount it for booting /
diagnostics remotely).

But no RAID / storage management or monitoring.

-- 
Dr. Ed Morbius, Chief Scientist /|
  Robot Wrangler / Staff Psychologist| When you seek unlimited power
Krell Power Systems Unlimited|  Go to Krell!


Re: [CentOS] Dell PERC H800 commandline RAID monitoring tools

2011-03-07 Thread Dr. Ed Morbius
on 12:43 Mon 07 Mar, Dr. Ed Morbius (dredmorb...@gmail.com) wrote:
 We're looking for tools to be used in monitoring the PERC H800 arrays on
 a set of database servers running CentOS 5.5.

Pardoning the self-reply, but one issue we've had is reconciling the
omconfig log report with the Dell Server Manager syslog messages.

omconfig reported a predictive drive failure, but we (and three Dell
storage/support techs) had trouble identifying which actual device was
being reported as bad.


From 'omconfig storage controller action=exportlog controller=0' output:

03/04/11 21:42:42: EVT#02959-03/04/11 21:42:42:  96=Predictive failure: PD 00(e0x08/s2)
03/05/11 14:28:41: EVT#02961-03/05/11 14:28:41: 112=Removed: PD 00(e0x08/s2)

In /var/log/messages (timestamp/hostname trimmed):

Server Administrator: Storage Service EventID: 2243  The Patrol Read has stopped.:  Controller 0 (PERC H800 Adapter)
Server Administrator: Storage Service EventID: 2049  Physical disk removed:  Physical Disk 0:0:2 Controller 0, Connector 0

The Server Administrator reports of a slot 2 failure correspond to the
drive which was physically replaced.
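
A sketch of pulling the event ID and disk out of those Server Administrator syslog lines, in Python. The regular expressions are fitted only to the two lines quoted above. Reading the last number of "Physical Disk 0:0:2" as the slot matches the drive that was replaced here, but treating the three numbers as connector:enclosure:slot in general is an assumption.

```python
import re

# The two syslog lines quoted above (timestamp/hostname already trimmed).
LINES = [
    "Server Administrator: Storage Service EventID: 2243  "
    "The Patrol Read has stopped.:  Controller 0 (PERC H800 Adapter)",
    "Server Administrator: Storage Service EventID: 2049  "
    "Physical disk removed:  Physical Disk 0:0:2 Controller 0, Connector 0",
]

EVENT_RE = re.compile(
    r"Server Administrator: Storage Service EventID: (?P<event_id>\d+)\s+"
    r"(?P<text>.*?):\s+(?P<target>.*)$")
# Assumption: the three numbers are connector:enclosure:slot.
DISK_RE = re.compile(r"Physical Disk (\d+):(\d+):(\d+)")


def parse_storage_event(line):
    """Return a dict for a Storage Service line, or None if it does not match."""
    match = EVENT_RE.search(line.strip())
    if match is None:
        return None
    event = {
        "event_id": int(match.group("event_id")),
        "text": match.group("text"),
        "target": match.group("target"),
    }
    disk = DISK_RE.search(event["target"])
    if disk:
        event["slot"] = int(disk.group(3))
    return event


for line in LINES:
    print(parse_storage_event(line))
```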

The OMSA omconfig report is throwing us a bunch of crud about some
device, but Dell variously identified it as slot 0 and slot 9.  We're
now getting from them that /s2 identifies slot 2.
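
Taking Dell's reading that /s2 means slot 2, the exported controller log lines can be picked apart as follows. This is a sketch fitted to the two log lines quoted above, since the format is undocumented. Treating `e0x08` as a hexadecimal enclosure ID is a guess, not something Dell confirmed.

```python
import re

# The two exportlog lines quoted above.
LOG = [
    "03/04/11 21:42:42: EVT#02959-03/04/11 21:42:42:  96=Predictive failure: PD 00(e0x08/s2)",
    "03/05/11 14:28:41: EVT#02961-03/05/11 14:28:41: 112=Removed: PD 00(e0x08/s2)",
]

EVT_RE = re.compile(
    r"EVT#(?P<seq>\d+)-(?P<when>\d\d/\d\d/\d\d \d\d:\d\d:\d\d):\s+"
    r"(?P<code>\d+)=(?P<text>[^:]+): PD (?P<pd>[0-9a-fA-F]+)"
    r"\(e(?P<encl>0x[0-9a-fA-F]+)/s(?P<slot>\d+)\)")


def parse_evt(line):
    """Return the fields of one physical-disk event line, or None."""
    match = EVT_RE.search(line)
    if match is None:
        return None
    return {
        "seq": int(match.group("seq")),
        "when": match.group("when"),
        "code": int(match.group("code")),
        "text": match.group("text"),
        # Assumption: the value after 'e' is a hexadecimal enclosure ID.
        "enclosure": int(match.group("encl"), 16),
        "slot": int(match.group("slot")),
    }


for line in LOG:
    print(parse_evt(line))
```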


Dell said point blank "you're not going to have any luck with that" as
far as documentation of the OMSA log report format and how to parse it
goes.  Does anyone have a clue as to WTF it's actually trying to
say, or what this tool is based on (I'm suspecting mega-cli, on a
general hunch but not much stronger)?

"Enterprise support" indeed.

-- 
Dr. Ed Morbius, Chief Scientist /|
  Robot Wrangler / Staff Psychologist| When you seek unlimited power
Krell Power Systems Unlimited|  Go to Krell!


Re: [CentOS] Dell PERC H800 commandline RAID monitoring tools

2011-03-07 Thread Dr. Ed Morbius
on 23:15 Mon 07 Mar, Eero Volotinen (eero.voloti...@iki.fi) wrote:
 2011/3/7 Dr. Ed Morbius dredmorb...@gmail.com:
  on 22:57 Mon 07 Mar, Eero Volotinen (eero.voloti...@iki.fi) wrote:
  2011/3/7 Dr. Ed Morbius dredmorb...@gmail.com:
   We're looking for tools to be used in monitoring the PERC H800 arrays on
   a set of database servers running CentOS 5.5.
  
   We've installed most of the OMSA (Dell monitoring) suite.
  
   Our current alerting is happening through SNMP, though it's a bit hit or
   miss (we apparently missed a couple of earlier predictive failure alerts
   on one drive).
  
   OMSA conflicts with mega-cli, though we may find that the latter is the
   more useful package.  Both are pretty byzantine, the Dell stuff simply
   doesn't have docs (in particular: docs on how to interpret the omconfig
   log output).
  
   Ideally we'd like something which could be run as a Nagios plugin or
   cron job providing information on RAID status and/or possible disk
   errors.  Probably both, actually.
 
  if your system supports omreport (comes with omsa) then this is a good 
  solution:
  http://folk.uio.no/trondham/software/check_openmanage.html
 
  So ... this slots on top of OMSA to provide reporting?
 
 this plugin parses omreport output and uses it for nagios output.

Is it running/invoking omreport or relying on periodic runs?  I'll dig
through the docs but if you know this off-hand it'd be helpful.
 
 omsa webserver is not required, but a working omreport cli is. It works
 great on my servers.

Good to know, much appreciated.

-- 
Dr. Ed Morbius, Chief Scientist /|
  Robot Wrangler / Staff Psychologist| When you seek unlimited power
Krell Power Systems Unlimited|  Go to Krell!


Re: [CentOS] Dell PERC H800 commandline RAID monitoring tools

2011-03-07 Thread Eero Volotinen
2011/3/8 Dr. Ed Morbius dredmorb...@gmail.com:
 on 23:15 Mon 07 Mar, Eero Volotinen (eero.voloti...@iki.fi) wrote:
 2011/3/7 Dr. Ed Morbius dredmorb...@gmail.com:
  on 22:57 Mon 07 Mar, Eero Volotinen (eero.voloti...@iki.fi) wrote:
  2011/3/7 Dr. Ed Morbius dredmorb...@gmail.com:
   We're looking for tools to be used in monitoring the PERC H800 arrays on
   a set of database servers running CentOS 5.5.
  
   We've installed most of the OMSA (Dell monitoring) suite.
  
   Our current alerting is happening through SNMP, though it's a bit hit or
   miss (we apparently missed a couple of earlier predictive failure alerts
   on one drive).
  
   OMSA conflicts with mega-cli, though we may find that the latter is the
   more useful package.  Both are pretty byzantine, the Dell stuff simply
   doesn't have docs (in particular: docs on how to interpret the omconfig
   log output).
  
   Ideally we'd like something which could be run as a Nagios plugin or
   cron job providing information on RAID status and/or possible disk
   errors.  Probably both, actually.
 
  if your system supports omreport (comes with omsa) then this is a good 
  solution:
  http://folk.uio.no/trondham/software/check_openmanage.html
 
  So ... this slots on top of OMSA to provide reporting?

 this plugin parses omreport output and uses it for nagios output.

 Is it running/invoking omreport or relying on periodic runs?  I'll dig
 through the docs but if you know this off-hand it'd be helpful.

It runs omreport each time nagios polls it via nrpe or snmp.
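
The run-on-every-poll pattern looks roughly like this in Python: one fresh invocation of the command per check, with a timeout so a hung tool cannot stall the poller. This is a generic sketch and not how check_openmanage is implemented; the 30-second default is an arbitrary choice.

```python
import subprocess
import sys

UNKNOWN = 3  # Nagios exit code for "could not determine state"


def run_check(argv, timeout=30):
    """Run argv once; return (exit_code, stdout), or UNKNOWN if it cannot run."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
    except FileNotFoundError:
        return UNKNOWN, "command not found: %s" % argv[0]
    except subprocess.TimeoutExpired:
        return UNKNOWN, "timed out after %ss" % timeout
    return proc.returncode, proc.stdout


# Stand-in command so the sketch runs anywhere; on an OMSA host this could be
# something like ["omreport", "storage", "pdisk", "controller=0"].
code, output = run_check([sys.executable, "-c", "print('ok')"])
print(code, output.strip())
```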

--
Eero