Re: Best CFM Engine for Hadoop

2009-12-10 Thread John Martyniak
Steve, rolling RPMs seems like a pain, especially since you can do a lot of the install stuff using yum (CentOS). After looking at the CFM tools, I think that I am going to just write some scripts that will do the loading, because in the end I just need to push config changes. And will

Re: Best CFM Engine for Hadoop

2009-12-09 Thread John Martyniak
, John Martyniak j...@beforedawnsolutions.com wrote: Does anybody have any recommendations on a CF management app? The ones that I am looking at are Puppet, CFengine, and BCFG2. Thanks for the recommendation in advance. Each has its pros/cons. It is more important to at least use one

Namenode / Data node disk requirements

2009-12-08 Thread John Martyniak
Do the namenode and data node have the same disk requirements? For example, my slave machines have a 1 TB partition called /hdfs and a 200 GB partition called /mapreduce for the obvious tasks. But on the namenode/jobtracker machine I don't have that, I just have a RAIDED pair

Re: Good idea to run NameNode and JobTracker on same machine?

2009-11-26 Thread John Martyniak
I have a cluster of 4 machines plus one machine to run nn and jt. I have heard that 5 or 6 is the magic #. I will see when I add the next batch of machines. And it seems to be running fine. -John On Nov 26, 2009, at 11:38 AM, Yongqiang He heyongqiang...@gmail.com wrote: I think it is

Ganglia

2009-11-23 Thread John Martyniak
Hi, I have been trying to get Hadoop working with Ganglia, and am making some progress. I have upgraded to Hadoop 0.20.1, and that seems to make a big difference: I no longer get any errors related to Ganglia. But when I run gmetad --debug=5, I get the following: [r...@monitor ganglia]#

Re: Hadoop 0.19.2 and Ganglia 3.1.3

2009-11-18 Thread John Martyniak
is quite aged and I haven't driven it to completion yet Brian On Nov 17, 2009, at 8:06 PM, John Martyniak wrote: I get several error messages; here is one: more hadoop-hadoop-tasktracker-cloud1.local.beforedawnsolutions.com.out Exception in thread Timer thread for monitoring mapred

Hadoop 0.19.2 and Ganglia 3.1.3

2009-11-17 Thread John Martyniak
Has anybody else had any trouble running Hadoop 0.19.2 and Ganglia 3.1.x? I was surfing through the Jira/Google and it seems that there were some issues, but they have been resolved. Any thoughts would be helpful. Thank you, -John

Re: Hadoop 0.19.2 and Ganglia 3.1.3

2009-11-17 Thread John Martyniak
Leacock http://www.brainyquote.com/quotes/authors/s/stephen_leacock.html - I detest life-insurance agents: they always argue that I shall some day die, which is not so. 2009/11/18 John Martyniak j...@beforedawnsolutions.com I get several error messages; here is one: more hadoop-hadoop-tasktracker

Re: Hadoop 0.19.2 and Ganglia 3.1.3

2009-11-17 Thread John Martyniak
a stopped clock is right twice a day. 2009/11/18 John Martyniak j...@beforedawnsolutions.com I switched to GangliaContext31 and it threw Class not found exceptions. Any ideas? -John On Nov 17, 2009, at 9:12 PM, Y G wrote: in the metrics conf file you should use gangliaContext31 class

Re: Hadoop Node Monitoring

2009-11-12 Thread John Martyniak
-practices-from-ed-capriolo/ On Wed, Nov 11, 2009 at 10:46 PM, John Martyniak j...@beforedawnsolutions.com wrote: Is there a good solution for Hadoop node monitoring? I know that Cacti and Ganglia are probably the two big ones, but are they the best ones to use? Easiest to set up? Most thorough

Re: Hadoop Node Monitoring

2009-11-12 Thread John Martyniak
. On 11/12/09, John Martyniak j...@beforedawnsolutions.com wrote: I do already use Nagios, and have been monitoring the availability, etc., of the network. But I was hoping to get more insight into the load/workings of the Hadoop network, and Ganglia seemed like a good start. Do you use either Ganglia

Re: Hadoop Node Monitoring

2009-11-12 Thread John Martyniak
to hear more of what other people have been successful with. Anyone? -Kevin On Thu, Nov 12, 2009 at 6:22 AM, John Martyniak j...@beforedawnsolutions.com wrote: I do already use Nagios, and have been monitoring the availability, etc., of the network. But I was hoping to get more insight

Re: NameNode/DataNode JobTracker/TaskTracker

2009-11-11 Thread John Martyniak
, at 6:50 AM, Steve Loughran wrote: John Martyniak wrote: Thanks Todd. I wasn't sure if that is possible. But you pointed out an important point and that is it is just NN and JT that would run remotely. So in order to do this would I just install the complete hadoop instance on each one

Hadoop Node Monitoring

2009-11-11 Thread John Martyniak
Is there a good solution for Hadoop node monitoring? I know that Cacti and Ganglia are probably the two big ones, but are they the best ones to use? Easiest to set up? Most thorough reporting, etc.? I started to play with Ganglia, and the install is crazy. I am installing it on CentOS and

java.io.IOException: Could not obtain block:

2009-11-10 Thread John Martyniak
Hello everyone, I am getting this error, java.io.IOException: Could not obtain block:, when running on my new cluster. When I ran the same job on the single node it worked perfectly; I then added the second node and received this error. I was running the grep sample job. I am running

Re: java.io.IOException: Could not obtain block:

2009-11-10 Thread John Martyniak
rather than depending on 2-3 classes from a project which you otherwise don't use. On 11/10/09 11:32 AM, John Martyniak wrote: Hello everyone, I am getting this error java.io.IOException: Could not obtain block:, when running on my new cluster. When I ran the same job on the single node

NameNode/DataNode JobTracker/TaskTracker

2009-11-09 Thread John Martyniak
Can the NameNode/DataNode JobTracker/TaskTracker run on a server that isn't part of the cluster? Meaning, I would like to run it on a machine that wouldn't participate in the processing of data, and wouldn't participate in the HDFS data sharing, and would solely focus on the

Re: NameNode/DataNode JobTracker/TaskTracker

2009-11-09 Thread John Martyniak
the first time :) -John On Nov 9, 2009, at 1:11 PM, Todd Lipcon wrote: On Mon, Nov 9, 2009 at 7:20 AM, John Martyniak j...@beforedawnsolutions.com wrote: Can the NameNode/DataNode JobTracker/TaskTracker run on a server that isn't part of the cluster? Meaning, I would like to run

Re: Cluster Machines

2009-11-04 Thread John Martyniak
, 2009, at 5:49 PM, Allen Wittenauer wrote: On 11/3/09 2:29 PM, John Martyniak j...@beforedawnsolutions.com wrote: Would you mind telling me the kinds of configured servers that you are running? Our 'real' grid is comprised of shiny Sun 4275s. But our 'non-real' grid is composed of two

Re: Cluster Machines

2009-11-03 Thread John Martyniak
and dedicate one for HDFS and one for MR. -John On Nov 3, 2009, at 1:20 PM, Allen Wittenauer wrote: On 11/3/09 5:25 AM, John Martyniak j...@beforedawnsolutions.com wrote: 1) Should each node have RAID 1, or is it sufficient to have HDFS take care of that? Because for each node I could put an 80

Server types

2009-11-02 Thread John Martyniak
I am getting ready to set up a Hadoop cluster, starting small but going to add machines pretty quickly. I am planning on running the following on the cluster: Hadoop, HDFS, HBase, Nutch, and Mahout. So far I have two Dell SC1425 dual processor (2.8 GHz), 4 GB RAM, two 1.5 TB SATA drives, on a gigabit