Re: Distributed Agent

2009-04-15 Thread Rasit OZDAS
Take a look at this topic: http://dsonline.computer.org/portal/site/dsonline/menuitem.244c5fa74f801883f1a516106bbe36ec/index.jsp?pName=dso_level1_about&path=dsonline/topics/agents&file=about.xml&xsl=generic.xsl 2009/4/14 Burak ISIKLI burak.isi...@yahoo.com: Hello everyone; I want to write a

Re: Modeling WordCount in a different way

2009-04-15 Thread Pankil Doshi
On Wed, Apr 15, 2009 at 1:26 AM, Sharad Agarwal shara...@yahoo-inc.com wrote: I am trying complex queries on Hadoop, and I require more than one job to get the final result. The results of job one capture a few joins of the query, and I want to pass those results as input to the 2nd job
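A minimal sketch of that kind of chaining with the old org.apache.hadoop.mapred API: run the first job to completion, then point the second job's input at the first job's output directory. The identity mapper/reducer stand in for the real join and aggregation classes, and the three paths are taken from the command line.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class ChainedJobs {
  public static void main(String[] args) throws Exception {
    Path input = new Path(args[0]);
    Path intermediate = new Path(args[1]);  // job 1 writes here, job 2 reads from here
    Path output = new Path(args[2]);

    JobConf job1 = new JobConf(ChainedJobs.class);
    job1.setJobName("query-joins");
    job1.setMapperClass(IdentityMapper.class);    // substitute the real join mapper
    job1.setReducerClass(IdentityReducer.class);  // substitute the real join reducer
    job1.setOutputKeyClass(LongWritable.class);
    job1.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(job1, input);
    FileOutputFormat.setOutputPath(job1, intermediate);
    JobClient.runJob(job1);                       // blocks until job 1 completes

    JobConf job2 = new JobConf(ChainedJobs.class);
    job2.setJobName("final-result");
    job2.setMapperClass(IdentityMapper.class);    // substitute the real second-stage mapper
    job2.setReducerClass(IdentityReducer.class);  // substitute the real second-stage reducer
    job2.setOutputKeyClass(LongWritable.class);
    job2.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(job2, intermediate);  // job 1's output is job 2's input
    FileOutputFormat.setOutputPath(job2, output);
    JobClient.runJob(job2);
  }
}

Because JobClient.runJob submits and waits, the second job is only submitted once the first job's output directory is complete.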

Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-15 Thread Andy Liu
Not sure comparing Hadoop to databases is an apples-to-apples comparison. Hadoop is a complete job execution framework, which collocates the data with the computation. I suppose DBMS-X and Vertica do that to a certain extent, by way of SQL, but you're restricted to that. If you want to

Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-15 Thread Jonathan Gray
I agree with you, Andy. This seems to be a great look into what Hadoop MapReduce is not good at. Over in the HBase world, we constantly deal with comparisons like this to RDBMSs, trying to determine if one is better than the other. It's a false choice and completely depends on the use case.

Re: hadoop-a small doubt

2009-04-15 Thread Pankil Doshi
Hey, you can do that. That system should have the same username as the cluster nodes, and of course it should be able to ssh to the name node. It should also have Hadoop installed, with a hadoop-site.xml similar to the cluster's. Then you can access the namenode, HDFS, etc. If you are willing to see the web interface, that can be
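For what it's worth, a minimal hadoop-site.xml for such a client machine might look like the sketch below; the hostname namenode-host and the two port numbers are placeholders for whatever the cluster's own configuration actually uses.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- HDFS entry point: must match the cluster's fs.default.name -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode-host:54310</value>
  </property>
  <!-- JobTracker address used when submitting jobs -->
  <property>
    <name>mapred.job.tracker</name>
    <value>namenode-host:54311</value>
  </property>
</configuration>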

How to submit a project to Hadoop/Apache

2009-04-15 Thread Tarandeep Singh
Hi, Can anyone point me to documentation that explains how to submit a project to Hadoop as a subproject? Also, I would appreciate it if someone points me to documentation on how to submit a project as an Apache project. We have a project that is built on Hadoop. It is released to the open

Re: Map-Reduce Slow Down

2009-04-15 Thread Mithila Nagendra
The log file : hadoop-mithila-datanode-node19.log.2009-04-14 has the following in it: 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG: / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = node19/127.0.0.1

Re: Map-Reduce Slow Down

2009-04-15 Thread Mithila Nagendra
The log file runs into thousands of lines with the same message being displayed every time. On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra mnage...@asu.edu wrote: The log file : hadoop-mithila-datanode-node19.log.2009-04-14 has the following in it: 2009-04-14 10:08:11,499 INFO

Re: Map-Reduce Slow Down

2009-04-15 Thread Ravi Phulari
Looks like your NameNode is down. Verify that the Hadoop processes are running (jps should show you all running Java processes). If they are running, try restarting them. I guess this problem is due to your fsimage not being correct. You might have to format your
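For reference, a rough check-and-restart sequence from $HADOOP_HOME on the master (formatting the namenode wipes the HDFS metadata, so it really is a last resort):

jps                          # should list NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker
bin/stop-all.sh              # stop all Hadoop daemons
bin/start-all.sh             # start them again and re-check the logs
bin/hadoop namenode -format  # LAST RESORT: only if the fsimage is truly corrupt; erases HDFS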

Re: How to submit a project to Hadoop/Apache

2009-04-15 Thread Otis Gospodnetic
This is how things get into Apache Incubator: http://incubator.apache.org/ But the rules are, I believe, that you can skip the incubator and go straight under a project's wing (e.g. Hadoop) if the project PMC approves. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch -

Re: Using 3rd party Api in Map class

2009-04-15 Thread Aaron Kimball
That certainly works, though if you plan to upgrade the underlying library, you'll find that copying files with the correct versions into $HADOOP_HOME/lib rapidly gets tedious, and subtle mistakes (e.g., forgetting one machine) can lead to frustration. When you consider the fact that you're using
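One alternative alluded to here, sketched below: keep the third-party jar in HDFS and add it to the task classpath through the DistributedCache, so nothing has to be copied into $HADOOP_HOME/lib on each node. The HDFS path is hypothetical.

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class ThirdPartyJarSetup {
  public static JobConf withThirdPartyJar() throws java.io.IOException {
    JobConf conf = new JobConf(ThirdPartyJarSetup.class);
    // The jar was previously copied into HDFS, e.g. with: hadoop fs -put thirdparty-1.0.jar /libs/
    // The path below is only an example.
    DistributedCache.addFileToClassPath(new Path("/libs/thirdparty-1.0.jar"), conf);
    return conf;
  }
}

Packaging the jar inside the job jar's lib/ directory is another common route, since that directory is unpacked onto the task classpath automatically.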

Re: Map-Reduce Slow Down

2009-04-15 Thread Aaron Kimball
Hi, I wrote a blog post a while back about connecting nodes via a gateway. See http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/ This assumes that the client is outside the gateway and all datanodes/namenode are inside, but the same principles apply. You'll

Re: How to submit a project to Hadoop/Apache

2009-04-15 Thread Aaron Kimball
Tarandeep, You might want to start by releasing your project as a contrib module for Hadoop. The overhead there is much easier -- just get it compiling in the contrib/ directory, file a JIRA ticket on Hadoop Core, and attach your patch :) - Aaron On Wed, Apr 15, 2009 at 10:29 AM, Otis

Re: Map-Reduce Slow Down

2009-04-15 Thread Mithila Nagendra
Hi Aaron, I will look into that, thanks! I spoke to the admin who overlooks the cluster. He said that the gateway comes into the picture only when one of the nodes communicates with a node outside of the cluster. But in my case the communication is carried out between nodes which all belong to

Re: How to submit a project to Hadoop/Apache

2009-04-15 Thread Tarandeep Singh
Thanks Aaron... yeah, it sounds like a much easier approach :) On Wed, Apr 15, 2009 at 11:00 AM, Aaron Kimball aa...@cloudera.com wrote: Tarandeep, You might want to start by releasing your project as a contrib module for Hadoop. The overhead there is much easier -- just get it compiling in

Datanode Setup

2009-04-15 Thread jpe30
I'm setting up a Hadoop cluster and I have the name node and job tracker up and running. However, I cannot get any of my datanodes or tasktrackers to start. Here is my hadoop-site.xml file... <?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Put site-specific

Re: Datanode Setup

2009-04-15 Thread Mithila Nagendra
Hi, The replication factor has to be set to 1. Also, for your dfs and job tracker configuration you should insert the name of the node rather than the IP address. For instance: <value>192.168.1.10:54310</value> can be: <value>master:54310</value> The nodes can be renamed by renaming them in the hosts
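As a concrete sketch of that advice (the ports and the host name master are illustrative, and master must resolve on every node, e.g. via /etc/hosts), the hadoop-site.xml on each machine might contain:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>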

Re: Datanode Setup

2009-04-15 Thread jpe30
That helps a lot actually. I will try setting up my hosts file tomorrow and make the other changes you suggested. Thanks! Mithila Nagendra wrote: Hi, The replication factor has to be set to 1. Also, for your dfs and job tracker configuration you should insert the name of the node rather

Re: Extending ClusterMapReduceTestCase

2009-04-15 Thread czero
I got it all up and working, thanks for your help - it was an issue with me not actually setting the log.dir system property before the cluster startup. Can't believe I missed that one :) As a side note (which you might already be aware of), the example class you're using in Chapter 7
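For anyone hitting the same wall, a sketch of that fix, assuming the standard hadoop.log.dir system property and the JUnit-3-style ClusterMapReduceTestCase of this era:

import org.apache.hadoop.mapred.ClusterMapReduceTestCase;

public class MyMiniClusterTest extends ClusterMapReduceTestCase {
  @Override
  protected void setUp() throws Exception {
    // Must be set before the mini cluster starts, otherwise startup fails
    // while looking for a log directory.
    System.setProperty("hadoop.log.dir", System.getProperty("java.io.tmpdir"));
    super.setUp();  // brings up the in-process DFS and MapReduce clusters
  }
}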

RE: reduce task specific jvm arg

2009-04-15 Thread Koji Noguchi
This sounds like a reasonable request. Created https://issues.apache.org/jira/browse/HADOOP-5684 On our clusters, sometimes users want thin mappers and large reducers. Koji -Original Message- From: Jun Rao [mailto:jun...@almaden.ibm.com] Sent: Thursday, April 09, 2009 10:30 AM To:
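For context, the existing knob applies to map and reduce tasks alike, which is exactly what HADOOP-5684 asks to split; a typical hadoop-site.xml entry looks like the following, with the heap size purely illustrative:

<property>
  <name>mapred.child.java.opts</name>
  <!-- today this single setting covers both map and reduce task JVMs -->
  <value>-Xmx512m</value>
</property>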

Re: Directory /tmp/hadoop-hadoop/dfs/name is in an inconsistent state: storage directory does not exist

2009-04-15 Thread Alex Loddengaard
Data stored to /tmp has no consistency / reliability guarantees. Your OS can delete that data at any time. Configure hadoop-site.xml to store data elsewhere. Grep for /tmp in hadoop-default.xml to see all the configuration options you'll have to change. Here's the list I came up with:
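The actual list is cut off above, but these are the /tmp-rooted properties one would typically repoint (the local paths here are only examples):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop/tmp</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/data/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/data/hadoop/dfs/data</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/data/hadoop/mapred/local</value>
</property>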

Re: getting DiskErrorException during map

2009-04-15 Thread Jim Twensky
Alex, Yes, I bounced the Hadoop daemons after I changed the configuration files. I also tried setting $HADOOP_CONF_DIR to the directory where my hadoop-site.xml file resides, but it didn't work. However, I'm sure that HADOOP_CONF_DIR is not the issue because other properties that I changed in

Re: Directory /tmp/hadoop-hadoop/dfs/name is in an inconsistent state: storage directory does not exist

2009-04-15 Thread Pankil Doshi
Thanks, Pankil On Wed, Apr 15, 2009 at 5:09 PM, Alex Loddengaard a...@cloudera.com wrote: Data stored to /tmp has no consistency / reliability guarantees. Your OS can delete that data at any time. Configure hadoop-site.xml to store data elsewhere. Grep for /tmp in hadoop-default.xml to

Error reading task output

2009-04-15 Thread Cam Macdonell
Hi, I'm getting the following warning when running the simple wordcount and grep examples. 09/04/15 16:54:16 INFO mapred.JobClient: Task Id : attempt_200904151649_0001_m_19_0, Status : FAILED Too many fetch-failures 09/04/15 16:54:16 WARN mapred.JobClient: Error reading task

Re: Generating many small PNGs to Amazon S3 with MapReduce

2009-04-15 Thread Kevin Peterson
On Tue, Apr 14, 2009 at 2:35 AM, tim robertson timrobertson...@gmail.com wrote: I am considering (for better throughput, as the maps generate huge request volumes) pregenerating all my tiles (PNG) and storing them in S3 with CloudFront. There will be billions of PNGs produced, each 1-3KB.

Re: Map-Reduce Slow Down

2009-04-15 Thread jason hadoop
Double check that there is no firewall in place. At one point a bunch of new machines were kickstarted and placed in a cluster and they all failed with something similar. It turned out the kickstart script enabled the firewall with a rule that blocked ports in the 50k range. It took us a

RE: More Replication on dfs

2009-04-15 Thread Puri, Aseem
Hi, My problem is not that my data is under-replicated. I have 3 data nodes. In my hadoop-site.xml I also set the configuration as: <property> <name>dfs.replication</name> <value>2</value> </property> But even after this, data is replicated on 3 nodes instead of two. Now, please tell
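Worth noting: dfs.replication only applies to files written after the change; files that already exist keep the replication factor they were created with. A sketch of re-setting it for an existing file from Java (the path is made up; hadoop fs -setrep does the same from the shell):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LowerReplication {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up hadoop-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);
    // Hypothetical file that was written before dfs.replication was lowered to 2
    fs.setReplication(new Path("/user/aseem/data/part-00000"), (short) 2);
    fs.close();
  }
}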
