Your new best friends:  Ganglia and Nagios

Ganglia is great for monitoring cluster-wide resource usage over time.  You'll 
see memory, CPU, disk, and network usage over time for the entire cluster and 
for each node.  It is very easy to set up because it uses UDP multicast, so 
there is no need to list individual nodes in config files.  HBase 0.19 
introduces Ganglia metrics, which will also be available in the Ganglia web 
interface.

http://ganglia.info/
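
For the HBase 0.19 metrics, the hookup to Ganglia is done in 
conf/hadoop-metrics.properties.  Roughly something like the following (a 
sketch from memory; the context names and defaults may differ in your 
release, and 239.2.11.71:8649 is just Ganglia's default multicast channel, 
so point it at whatever your gmond instances use):

  # conf/hadoop-metrics.properties on the master and each region server
  # (restart the daemons after editing)
  hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext
  hbase.period=10
  hbase.servers=239.2.11.71:8649

After that the HBase metrics should show up alongside the host-level graphs 
in the Ganglia web frontend.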

Nagios is good for monitoring services as well as resource utilization.  Rather 
than graph data over time, its aim is really to alert you when something is 
wrong.  For example, when a server is no longer reachable or when available 
disk space drops below a configurable threshold.  It does require a bit more 
work to get up and running because you have to set up your host and service 
configurations.  I have written custom Nagios plugins for Hadoop and HBase; if 
there's interest I will look at cleaning them up and contrib'ing them.

http://www.nagios.org/
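
To give a flavor of the Nagios side: a plugin is just a script that prints a 
one-line status and exits 0 (OK), 1 (WARNING) or 2 (CRITICAL), and you wire it 
up with "define command" / "define service" entries in the object 
configuration.  This is not one of the plugins I mentioned above, just a 
minimal sketch of a local free-disk check (point it at the filesystem holding 
dfs.data.dir); the name, paths and thresholds are made up for the example:

  #!/usr/bin/env python
  # check_disk_free.py -- minimal Nagios-style free-space check.
  # Usage: check_disk_free.py <path> <warn_pct> <crit_pct>
  import os
  import sys

  def main():
      path = sys.argv[1]
      warn = float(sys.argv[2])   # warn when free space drops below this %
      crit = float(sys.argv[3])   # go critical below this %
      st = os.statvfs(path)
      free_pct = 100.0 * st.f_bavail / st.f_blocks
      msg = "%s free: %.1f%%" % (path, free_pct)
      if free_pct < crit:
          print("DISK CRITICAL - " + msg)
          sys.exit(2)
      elif free_pct < warn:
          print("DISK WARNING - " + msg)
          sys.exit(1)
      print("DISK OK - " + msg)
      sys.exit(0)

  if __name__ == "__main__":
      main()

Something like that, scheduled every few minutes per node, is enough to get 
an alert well before a DataNode hits a DiskOutOfSpaceException.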

Both are free and essential tools for properly monitoring your cluster.

JG

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Edward
> J. Yoon
> Sent: Monday, December 01, 2008 7:04 PM
> To: [EMAIL PROTECTED]
> Cc: [email protected]; [EMAIL PROTECTED]
> Subject: Re: Bulk import question.
> 
> I'm considering storing large-scale web-mail data in HBase.
> As you know, there is a lot of mail-bomb traffic (e.g. spam, group
> mail, etc.), so I tested those cases.
> 
> Here's an additional question: do we have a monitoring tool for disk
> space?
> 
> /Edward
> 
> On Tue, Dec 2, 2008 at 11:42 AM, Andrew Purtell <[EMAIL PROTECTED]>
> wrote:
> > Edward,
> >
> > You are running with insufficient resources -- too little CPU
> > for your task and too little disk for your data.
> >
> > If you are running a mapreduce task and DFS runs out of space
> > for the temporary files, then you should indeed expect
> > aberrant job status from the Hadoop job framework, for
> > example completion status running backwards.
> >
> > I do agree that under these circumstances HBase daemons
> > should fail more gracefully, by entering some kind of
> > degraded read-only mode, if DFS is not totally dead. I
> > suspect this is already on a to-do list somewhere, and I
> > vaguely recall a JIRA filed on that topic.
> >
> >   - Andy
> >
> >
> >> From: Edward J. Yoon <[EMAIL PROTECTED]>
> >> Subject: Re: Bulk import question.
> >> To: [email protected], [EMAIL PROTECTED]
> >> Date: Monday, December 1, 2008, 6:26 PM
> >> It was caused by a DataNode DiskOutOfSpaceException. But I
> >> think the daemons should not die.
> >>
> >> On Wed, Nov 26, 2008 at 1:08 PM, Edward J. Yoon
> >> <[EMAIL PROTECTED]> wrote:
> >> > Hmm. This happens to me often. I'll check the logs.
> >> >
> >> > On Fri, Nov 21, 2008 at 9:46 AM, Andrew Purtell
> >> <[EMAIL PROTECTED]> wrote:
> >> > > I think a 2 node cluster is simply too small for
> >> > > the full load of everything.
> >> > >
> >
> >
> >
> >
> >
> 
> 
> 
> --
> Best Regards, Edward J. Yoon @ NHN, corp.
> [EMAIL PROTECTED]
> http://blog.udanax.org
