Re: Hadoop Node Monitoring
I do already use Nagios, and have been monitoring the availability etc. of the network. But I was hoping to get more insight into the load/workings of the Hadoop network, and Ganglia seemed like a good start. Do you use either Ganglia or Cacti, or something else?

-John

On Nov 12, 2009, at 12:51 AM, Kevin Sweeney wrote:

Nagios is always a good start. This webcast has some good information on this subject: http://www.cloudera.com/blog/2009/11/09/hadoop-world-monitoring-best-practices-from-ed-capriolo/

On Wed, Nov 11, 2009 at 10:46 PM, John Martyniak <j...@beforedawnsolutions.com> wrote:

Is there a good solution for Hadoop node monitoring? I know that Cacti and Ganglia are probably the two big ones, but are they the best ones to use? Easiest to set up? Most thorough reporting, etc.? I started to play with Ganglia, and the install is crazy; I am installing it on CentOS and having all sorts of trouble. So any ideas there would be very helpful.

Thank you,
-John
Re: Hadoop Node Monitoring
Definitely check out my presentation on Cloudera's site; the link is above. Hadoop-specific counters are available: each component (NameNode, DataNode, etc.) has counter objects associated with it. Hadoop can push statistics to Ganglia, so that is one nice option; more or less, once you get the configuration correct, each node sends its performance data to Ganglia. Cacti works in the other direction, by polling nodes and pulling counter information from them. The perk of the Cacti setup is that I spent some time grouping related variables into a single graph. The perk of the Ganglia configuration is that there is no per-node configuration, since each node just pushes its counter data. On the Cacti side I am looking to add some graphs that pull data from the job client: currently running maps and reduces, etc. So to answer your question: people use all three, and they all have access to the same data.

On 11/12/09, John Martyniak <j...@beforedawnsolutions.com> wrote:
> I do already use Nagios, and have been monitoring the availability etc, of the network. [...]
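The "push statistics to Ganglia" hookup Edward describes is configured in hadoop-metrics.properties on each node. A minimal sketch for the 0.20-era metrics framework, assuming a gmond listening on ganglia-host:8649 (the hostname and port are placeholders to replace with your own):

```properties
# conf/hadoop-metrics.properties -- have each subsystem push its counters
# to Ganglia every 10 seconds. GangliaContext is the 0.20-era class name;
# check your release, as later versions ship GangliaContext31 for Ganglia 3.1.
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=ganglia-host:8649

mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
mapred.period=10
mapred.servers=ganglia-host:8649

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
jvm.period=10
jvm.servers=ganglia-host:8649
```

With this in place there is nothing Ganglia-specific to configure per node beyond pointing at the gmond endpoint, which is the "no per-node configuration" perk mentioned above.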
Re: Hadoop Node Monitoring
We're in about the same boat as you. We use Nagios, and we have Cacti for other things, so I'll probably use it for Hadoop as well. Ganglia seems interesting, but not too simple to set up. We also tried Cloudera Desktop, which gives you a nice interface to see what's happening, but it requires using Cloudera's Hadoop distribution and seems more focused on real-time status than on background monitoring. I'd be interested to hear more about what other people have been successful with. Anyone?

-Kevin

On Thu, Nov 12, 2009 at 6:22 AM, John Martyniak <j...@beforedawnsolutions.com> wrote:
> I do already use Nagios, and have been monitoring the availability etc, of the network. [...]
Re: Hadoop Node Monitoring
Thanks for the info. So you are saying to install both Cacti and Ganglia, which is what I was kind of thinking, to see which one I like best and which one gives the best info.

The only thing is that the Ganglia install is not straightforward. Do you have any recommendations for installing it on CentOS 5? I followed the steps at IBM, and came up with this error:

gmond: error while loading shared libraries: /usr/lib/libganglia-3.1.2.so.0: cannot restore segment prot after reloc: Permission denied

-John

On Nov 12, 2009, at 9:36 AM, Edward Capriolo wrote:
> Definatly check out my presentation above on cloudera's site link is above. [...]
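For what it's worth, that particular gmond failure ("cannot restore segment prot after reloc") is usually SELinux, which is enforcing by default on CentOS 5, refusing a text relocation in the Ganglia shared library, rather than a broken install. A hedged sketch of the usual workaround; the library path is taken from the error above, and the type label should be checked against your local SELinux policy:

```
# Confirm SELinux is enforcing; "Enforcing" makes it the likely culprit
getenforce

# Relabel the library so the loader may make its text segment writable
chcon -t textrel_shlib_t /usr/lib/libganglia-3.1.2.so.0
service gmond start

# Alternatively, to isolate the problem, switch to permissive mode
# temporarily (revert with `setenforce 1` afterwards):
#   setenforce 0
```

If permissive mode makes gmond start, the chcon relabel (or a small local policy module) is the longer-term fix.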
Re: Hadoop Node Monitoring
Kevin, what did you think of Cloudera Desktop? Were you able to get it running with a vanilla Hadoop install?

-John

On Nov 12, 2009, at 9:40 AM, Kevin Sweeney wrote:
> We're about in the same boat as you. We use Nagios and have Cacti for other things so I'll probably use it for hadoop as well. [...]
About Hadoop pseudo distribution
Hi All, I have been trying to set up a Hadoop cluster on a number of machines, a few of which are multicore machines. I have been wondering whether Hadoop's pseudo-distributed mode is something that can help me take advantage of the multiple cores on my machines. All the tutorials say that pseudo-distributed mode lets you start each daemon in a separate Java process. I have the following configuration settings in hadoop-site.xml:

  <property>
    <name>fs.default.name</name>
    <value>hdfs://athena:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>athena:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>

I am not sure if this is really running in pseudo-distributed mode. Are there any indicators or outputs that confirm what mode you are running in?

--
View this message in context: http://old.nabble.com/About-Hadoop-pseudo-distribution-tp26322382p26322382.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: About Hadoop pseudo distribution
kvorion wrote:
> Hi All, I have been trying to set up a hadoop cluster on a number of machines, a few of which are multicore machines. [...]

You should just give every node with more cores more TaskTracker slots; the JobTracker will then send it more work.
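Slot counts are set per node in hadoop-site.xml (mapred-site.xml on later releases). A sketch for a multicore machine; the 6/2 split is only an illustrative guess for an 8-core box, so tune it to your workload:

```xml
<!-- on the multicore node: how many map and reduce tasks may run at once -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>6</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```

Nodes with fewer cores keep smaller values, so the JobTracker naturally schedules more concurrent tasks onto the bigger machines.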
Re: Building Hadoop from Source ?
Hi Sid,

Check out the Building section in this link: http://wiki.apache.org/hadoop/HowToRelease . It's pretty straightforward. If you choose not to remove the test targets, expect the build to take upwards of 2 hours as it runs through all the unit tests.

Kind regards,
Steve Watt

From: Siddu <siddu.s...@gmail.com>
To: common-user@hadoop.apache.org
Date: 11/12/2009 12:14 PM
Subject: Building Hadoop from Source ?

Hi all, I want to build Hadoop from source rather than downloading the already-built tarball. Can someone please give me the steps, or a link to any pointers?

Thanks in advance.

--
Regards,
~Sid~
I have never met a man so ignorant that I couldn't learn something from him
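The 0.20-era source tree built with Ant. A rough sketch of the two common invocations, assuming an unpacked source tree; the target names are from memory of that era's build.xml, so verify them against the HowToRelease page linked above:

```
# from the top of the Hadoop source tree (the directory containing build.xml)
ant tar       # compile and package a release tarball, without running tests
ant test      # run the unit tests separately -- this is the ~2 hour part
```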
Re: Hadoop Node Monitoring
On 11/11/09 9:46 PM, John Martyniak <j...@beforedawnsolutions.com> wrote:
> Is there a good solution for Hadoop node monitoring? I know that Cacti and Ganglia are probably the two big ones, but are they the best ones to use? [...]

We've been working on getting our stats into Zenoss via the JMX connector and SNMP, because Ganglia seems to have some fundamental issues (for example, grouping of hosts is a *client*-side config). Note that Zenoss is available in both open source and commercial forms. We're using the commercial version, but the open source version would probably be just as good.

But that aside: we're taking the approach of measuring grid health by watching and monitoring the dead/live node count, scraping the NN and JT web pages. We also do daily fsck's and lsr's, and run a cut-down version of gridmix. While monitoring individual nodes is useful in a proactive sense, the bigger your grid gets, the less important it becomes.
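A minimal sketch of the "scrape the NN page for dead/live counts" idea above. In practice you would fetch and flatten the page first, e.g. `curl -s http://namenode:50070/dfshealth.jsp | lynx -dump -stdin`; the two sample lines below stand in for that dump, and their exact wording is an assumption modeled on the 0.20-era NameNode UI, so adjust the patterns to what your version actually renders:

```shell
# Stand-in for the text dump of the NameNode status page (hypothetical layout)
page='Live Nodes : 10
Dead Nodes : 2'

# Pull the trailing number off each line of interest
live=$(printf '%s\n' "$page" | awk '/Live Nodes/ {print $NF}')
dead=$(printf '%s\n' "$page" | awk '/Dead Nodes/ {print $NF}')
echo "live=$live dead=$dead"

# Alert (e.g. via a Nagios plugin exit code) when any node is dead
if [ "$dead" -gt 0 ]; then
  echo "WARNING: $dead dead node(s)"
fi
```

The same pattern works against the JobTracker page for tracker counts; only the grep/awk patterns change.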
Re: About Hadoop pseudo distribution
If I understand you correctly, you can run jps and see the Java JVMs running on each machine; that should tell you whether you are running in pseudo-distributed mode or not.

--- On Thu, 11/12/09, kvorion <kveinst...@gmail.com> wrote:

From: kvorion <kveinst...@gmail.com>
Subject: About Hadoop pseudo distribution
To: core-u...@hadoop.apache.org
Date: Thursday, November 12, 2009, 12:02 PM

> Hi All, I have been trying to set up a hadoop cluster on a number of machines, a few of which are multicore machines. [...]
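To make the jps check concrete: on a pseudo-distributed 0.20-style node, jps should list all five Hadoop daemons as separate JVMs. A small sketch that just encodes that expectation (the daemon names are from a stock install; PIDs will vary, and in standalone mode you would see none of them):

```shell
# The five daemon JVMs a pseudo-distributed node should show in `jps`
expected='NameNode
SecondaryNameNode
DataNode
JobTracker
TaskTracker'

# On the live machine you would compare this list against: jps | awk '{print $2}'
count=$(printf '%s\n' "$expected" | grep -c '.')
echo "expecting $count daemon processes"
```

If jps shows fewer than five, check the logs of the missing daemon; if it shows a single RunJar-style process during a job and no daemons, you are in standalone (local) mode, not pseudo-distributed.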