Re: Hadoop Node Monitoring

2009-11-12 Thread John Martyniak
I do already use Nagios, and have been monitoring the availability, etc., of the network.

But I was hoping to get more insight into the load/workings of the Hadoop network, and Ganglia seemed like a good start.


Do you use either Ganglia or Cacti, or something else?

-John

On Nov 12, 2009, at 12:51 AM, Kevin Sweeney wrote:

Nagios is always a good start. This webcast has some good information on this subject:

http://www.cloudera.com/blog/2009/11/09/hadoop-world-monitoring-best-practices-from-ed-capriolo/



On Wed, Nov 11, 2009 at 10:46 PM, John Martyniak 
j...@beforedawnsolutions.com wrote:

Is there a good solution for Hadoop node monitoring? I know that Cacti and Ganglia are probably the two big ones, but are they the best ones to use? Easiest to set up? Most thorough reporting, etc.

I started to play with Ganglia, and the install is crazy; I am installing it on CentOS and having all sorts of trouble. So any ideas there would be very helpful.

Thank you,

-John






Re: Hadoop Node Monitoring

2009-11-12 Thread Edward Capriolo
Definitely check out my presentation on Cloudera's site; the link is above.

Hadoop-specific counters are available. Each component (namenode, datanode, etc.) has counter objects associated with it.

Hadoop allows you to push statistics to Ganglia, so this is one nice option. More or less, once you get the configuration correct, each node will send its performance data to Ganglia.
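For reference, the push Edward describes is set up in conf/hadoop-metrics.properties on each node. A minimal sketch for the 0.20-era metrics framework (the host/port values are examples, and if your gmond is Ganglia 3.1+ you may need GangliaContext31 where your Hadoop build includes it):

```properties
# Send DFS, MapReduce, and JVM metrics to Ganglia every 10 seconds.
# The servers value is whatever address your gmond listens on
# (239.2.11.71:8649 is Ganglia's default multicast channel).
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=239.2.11.71:8649

mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
mapred.period=10
mapred.servers=239.2.11.71:8649

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
jvm.period=10
jvm.servers=239.2.11.71:8649
```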

Cacti works in the other direction by polling nodes and pulling
counter information from them.

The perk of the cacti setup is that I spent some time grouping related
variables into a single graph.


The perk of the Ganglia configuration is that there is no per-node configuration, since each node just pushes counter data.

On the Cacti side, I am looking to add some graphs that pull data from the job client: currently running maps and reduces, etc.

So to answer your question: people use all three, and they all have access to the same data.


On 11/12/09, John Martyniak j...@beforedawnsolutions.com wrote:






Re: Hadoop Node Monitoring

2009-11-12 Thread Kevin Sweeney
We're about in the same boat as you. We use Nagios and have Cacti for other things, so I'll probably use it for Hadoop as well. Ganglia seems interesting, but not too simple to set up. We also tried Cloudera Desktop, which gives you a nice interface to see what's happening, but it requires using Cloudera's Hadoop and seems more focused on real-time status as opposed to background monitoring.

I'd be interested to hear more of what other people have been successful
with. Anyone?

-Kevin

On Thu, Nov 12, 2009 at 6:22 AM, John Martyniak 
j...@beforedawnsolutions.com wrote:







Re: Hadoop Node Monitoring

2009-11-12 Thread John Martyniak

Thanks for the info.

So you are saying to install both Cacti and Ganglia, which is what I was kind of thinking, to see which one I like best and which one gives the best info.

The only thing is that the Ganglia install is not straightforward. Do you have any recommendations for installing it on CentOS 5? I followed the steps at IBM and came up with this error:

gmond: error while loading shared libraries: /usr/lib/libganglia-3.1.2.so.0: cannot restore segment prot after reloc: Permission denied
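In case it helps others hitting the same thing: "cannot restore segment prot after reloc" on CentOS usually means SELinux is blocking text relocations in the shared library. A hedged sketch of the usual workarounds (the library path is taken from the error above; verify on your own system):

```shell
# Check whether SELinux is enforcing
getenforce

# Option 1: allow text relocation for just that library
chcon -t textrel_shlib_t /usr/lib/libganglia-3.1.2.so.0

# Option 2 (blunter, for diagnosis only): switch SELinux to permissive
# and see if gmond starts; if it does, SELinux was the culprit
setenforce 0
```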


-John

On Nov 12, 2009, at 9:36 AM, Edward Capriolo wrote:










Re: Hadoop Node Monitoring

2009-11-12 Thread John Martyniak

Kevin,

What did you think of Cloudera Desktop? Were you able to get it running with a vanilla Hadoop install?


-John

On Nov 12, 2009, at 9:40 AM, Kevin Sweeney wrote:










About Hadoop pseudo distribution

2009-11-12 Thread kvorion

Hi All,

I have been trying to set up a Hadoop cluster on a number of machines, a few of which are multicore machines. I have been wondering whether Hadoop's pseudo-distributed mode is something that can help me take advantage of the multiple cores on my machines. All the tutorials say that pseudo-distributed mode lets you start each daemon in a separate Java process. I have the following configuration settings in hadoop-site.xml:

<property>
  <name>fs.default.name</name>
  <value>hdfs://athena:9000</value>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>athena:9001</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

I am not sure if this is really running in pseudo-distributed mode. Are there any indicators or outputs that confirm which mode you are running in?


-- 
View this message in context: 
http://old.nabble.com/About-Hadoop-pseudo-distribution-tp26322382p26322382.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: About Hadoop pseudo distribution

2009-11-12 Thread Steve Loughran

kvorion wrote:



You should just give every node with more cores more TaskTracker slots; the JobTracker will then give it more work.
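Steve's suggestion maps to per-node settings in hadoop-site.xml. The property names below are the 0.20-era ones and the values are illustrative; a common rule of thumb is roughly one slot per core:

```xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```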


Re: Building Hadoop from Source ?

2009-11-12 Thread Stephen Watt
Hi Sid

Check out the Building section in this link: http://wiki.apache.org/hadoop/HowToRelease . It's pretty straightforward. If you choose not to remove the test targets, expect the build to take upwards of 2 hours as it runs through all the unit tests.
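For reference, the basic flow from that wiki page boils down to something like the following. The repository path and tag are only examples, and Ant target names can differ between versions, so treat this as a sketch rather than exact instructions:

```shell
# Fetch the source for a release tag (example tag; pick the one you want)
svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1/ hadoop-src
cd hadoop-src

# Build a distribution tarball with Ant (does not run the unit tests)
ant tar

# Optionally run the unit tests -- this is the multi-hour part
ant test
```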

Kind regards
Steve Watt



From: Siddu siddu.s...@gmail.com
To: common-user@hadoop.apache.org
Date: 11/12/2009 12:14 PM
Subject: Building Hadoop from Source ?



Hi all,

I want to build Hadoop from source rather than downloading the already-built tarball.

Can someone please give me the steps or a link to any pointers?

Thanks in advance


-- 
Regards,
~Sid~
I have never met a man so ignorant that I couldn't learn something from him




Re: Hadoop Node Monitoring

2009-11-12 Thread Allen Wittenauer
On 11/11/09 9:46 PM, John Martyniak j...@beforedawnsolutions.com wrote:


We're working on getting our stats into Zenoss via the JMX connector and SNMP, because Ganglia seems to have some fundamental issues (like grouping of hosts being a *client*-side config). Note that Zenoss is available in both open source and commercial forms. We're using the commercial version, but the open source version would probably be just as good.

But that aside:

We're taking the approach of grid health: watching and monitoring the dead/live node count by scraping the NN and JT web pages. We also do daily fscks and lsr's, and run a cut-down version of gridmix.
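For anyone who wants to try the scraping approach: the sketch below pulls live/dead node counts out of the NameNode's status page. The URL (dfshealth.jsp on port 50070 is the 0.20-era default) and the exact label text are assumptions that vary across Hadoop versions, so treat the regexes as a starting point, not the real markup.

```python
import re
import urllib.request

def parse_node_counts(html):
    """Extract live/dead node counts from NameNode status HTML.

    Assumes labels like 'Live Nodes' / 'Dead Nodes' are followed by a
    number; adjust the patterns for your version's actual page markup.
    """
    counts = {}
    for label in ("Live Nodes", "Dead Nodes"):
        m = re.search(label + r"\D*(\d+)", html)
        counts[label] = int(m.group(1)) if m else None
    return counts

def fetch_node_counts(namenode="http://namenode:50070"):
    # dfshealth.jsp is the 0.20-era status page; newer versions differ
    with urllib.request.urlopen(namenode + "/dfshealth.jsp") as resp:
        return parse_node_counts(resp.read().decode("utf-8", "replace"))

# Example against a canned snippet (no cluster needed):
sample = "<td>Live Nodes</td><td>10</td> <td>Dead Nodes</td><td>2</td>"
print(parse_node_counts(sample))  # {'Live Nodes': 10, 'Dead Nodes': 2}
```

Once parsed, the counts can feed whatever alerting you already have (Nagios check, Zenoss datasource, etc.).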

While monitoring individual nodes is useful in a proactive sense, the bigger your grid gets, the less important it becomes.



Re: About Hadoop pseudo distribution

2009-11-12 Thread Raymond Jennings III
If I understand you correctly, you can run jps and see the Java JVMs running on each machine - that should tell you whether you are running in pseudo-distributed mode or not.
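Concretely, in pseudo-distributed mode a single machine should show all five Hadoop daemons as separate JVMs. An illustrative jps transcript (PIDs will of course differ):

```shell
$ jps
4821 NameNode
4932 DataNode
5043 SecondaryNameNode
5150 JobTracker
5261 TaskTracker
5370 Jps
```

If some of those daemons are running on other machines instead, you are in (fully) distributed mode rather than pseudo-distributed.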

--- On Thu, 11/12/09, kvorion kveinst...@gmail.com wrote:
