We use Ganglia to monitor our cluster, together with a Nagios plugin that
queries the gmetad node, so we can set up rules on the number of datanodes,
missing/corrupt blocks, etc.

http://www.cloudera.com/blog/2009/03/hadoop-metrics/

http://exchange.nagios.org/directory/Plugins/Network-and-Systems-Management/Others/check_ganglia/details

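The rules themselves live in Nagios, but the logic is simple enough to sketch. This is a minimal sketch, not the check_ganglia plugin itself. It parses Ganglia-style XML (the format gmond typically serves on TCP port 8649), reads a few per-host metrics, and maps them to the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The host name, the metric names (LiveDataNodes, MissingBlocks, CorruptBlocks) and every threshold are hypothetical placeholders; check what your Hadoop version actually publishes to Ganglia before reusing them.

```python
import xml.etree.ElementTree as ET

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}
# Ranking used to pick the overall status: CRITICAL outranks everything else.
RANK = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}

# Hypothetical gmond-style XML; host and metric names are placeholders,
# not the exact names Hadoop publishes to Ganglia.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="hadoop">
    <HOST NAME="namenode01">
      <METRIC NAME="LiveDataNodes" VAL="10" TYPE="int32"/>
      <METRIC NAME="MissingBlocks" VAL="0" TYPE="int32"/>
      <METRIC NAME="CorruptBlocks" VAL="2" TYPE="int32"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def read_metric(xml_text, host, metric):
    """Return a metric's VAL for one host as a float, or None if absent."""
    root = ET.fromstring(xml_text)
    for h in root.iter("HOST"):
        if h.get("NAME") == host:
            for m in h.iter("METRIC"):
                if m.get("NAME") == metric:
                    return float(m.get("VAL"))
    return None


def check_min(value, warn, crit, label):
    """Alert when a value falls to or below a threshold (e.g. live datanodes)."""
    if value is None:
        return UNKNOWN, f"{label} not found"
    if value <= crit:
        return CRITICAL, f"{label}={value:g} (<= {crit})"
    if value <= warn:
        return WARNING, f"{label}={value:g} (<= {warn})"
    return OK, f"{label}={value:g}"


def check_max(value, warn, crit, label):
    """Alert when a value rises to or above a threshold (e.g. missing blocks)."""
    if value is None:
        return UNKNOWN, f"{label} not found"
    if value >= crit:
        return CRITICAL, f"{label}={value:g} (>= {crit})"
    if value >= warn:
        return WARNING, f"{label}={value:g} (>= {warn})"
    return OK, f"{label}={value:g}"


def summarize(results):
    """Combine (code, text) pairs into a single Nagios status line."""
    worst = max((code for code, _ in results), key=RANK.get)
    return worst, f"HDFS {NAMES[worst]} - " + ", ".join(t for _, t in results)


host = "namenode01"
results = [
    check_min(read_metric(SAMPLE_XML, host, "LiveDataNodes"), warn=8, crit=5, label="live_datanodes"),
    check_max(read_metric(SAMPLE_XML, host, "MissingBlocks"), warn=1, crit=1, label="missing_blocks"),
    check_max(read_metric(SAMPLE_XML, host, "CorruptBlocks"), warn=1, crit=10, label="corrupt_blocks"),
]
code, line = summarize(results)
print(line)  # HDFS WARNING - live_datanodes=10, missing_blocks=0, corrupt_blocks=2 (>= 1)
# A real plugin would finish with sys.exit(code) so Nagios sees the status.
```

Ranking CRITICAL above UNKNOWN is a judgment call: a missing metric is reported, but it never masks a real outage elsewhere in the same check.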

> From: Arthur Caranta <[email protected]>
> Date: Mon, 04 Oct 2010 15:46:19 +0200
> To: <[email protected]>
> Subject: Re: Datanode Registration DataXceiver java.io.EOFException
> 
>  On 04/10/10 15:42, Steve Loughran wrote:
>> On 04/10/10 14:30, Arthur Caranta wrote:
>>>   Damn I found the answer to this problem, thanks to someone on the
>>> #hadoop IRC channel ...
>>> 
>>> It was a network check I added for our supervision ... therefore every 5
>>> minutes the supervision connects to the datanode port to check if it is
>>> alive and then disconnects ...
>>> 
>> 
>> Why not just GET the various local pages and let your HTTP monitoring
>> tools do the work?
>> 
>> 
> True ... however, the TCP method was the fastest to implement and script
> with our current supervision system,
> but I think I might switch monitoring methods.

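Steve's suggestion avoids the log noise because the bare TCP probe opens the datanode's data-transfer port and closes it without sending anything, which the datanode's DataXceiver then reports as the EOFException in the subject line. Below is a minimal sketch of the HTTP alternative, assuming Python 3 and only the standard library: GET a page from the datanode's embedded web server and map the outcome to a Nagios status. The thread does not name a specific page, and `http_check` is a hypothetical helper name.

```python
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def http_check(url, timeout=5.0):
    """GET a status page and map the result to a Nagios (code, message) pair."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            resp.read()  # drain the body so the request completes cleanly
    except urllib.error.HTTPError as e:
        # Server answered, but with a 4xx/5xx status.
        return CRITICAL, f"HTTP {e.code} from {url}"
    except (urllib.error.URLError, OSError) as e:
        # Connection refused, DNS failure, timeout, etc.
        return CRITICAL, f"cannot reach {url}: {e}"
    if status == 200:
        return OK, f"HTTP 200 from {url}"
    # Any other non-error status is unexpected but not fatal.
    return WARNING, f"HTTP {status} from {url}"
```

For example, `http_check("http://datanode01:50075/")` would probe a hypothetical host datanode01, assuming 50075, the usual default for dfs.datanode.http.address in Hadoop releases of that era; adjust the host and port to your configuration. Unlike the raw socket probe, this speaks a protocol the port actually expects.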
