Take a look at this topic:
http://dsonline.computer.org/portal/site/dsonline/menuitem.244c5fa74f801883f1a516106bbe36ec/index.jsp?pName=dso_level1_about&path=dsonline/topics/agents&file=about.xml&xsl=generic.xsl
2009/4/14 Burak ISIKLI burak.isi...@yahoo.com:
Hello everyone;
I want to write a
On Wed, Apr 15, 2009 at 1:26 AM, Sharad Agarwal shara...@yahoo-inc.com wrote:
I am trying complex queries on Hadoop, which require more than one job to
run to get the final result. The results of job one capture a few joins of
the query, and I want to pass those results as input to the second job.
Not sure if comparing Hadoop to databases is an apples-to-apples
comparison. Hadoop is a complete job execution framework, which collocates
the data with the computation. I suppose DBMS-X and Vertica do that to a
certain extent, by way of SQL, but you're restricted to that. If you want
to
I agree with you, Andy.
This seems to be a great look into what Hadoop MapReduce is not good at.
Over in the HBase world, we constantly deal with comparisons like this to
RDBMSs, trying to determine if one is better than the other. It's a false
choice and completely depends on the use case.
Hey,
You can do that. That system should have the same username as the cluster
nodes, and of course it should be able to ssh to the name node. It should
also have Hadoop installed, and its hadoop-site.xml should be similar. Then
you can access the namenode, HDFS, etc.
If you are willing to see the web interface, that can be
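The client setup described above amounts to pointing the machine's own hadoop-site.xml at the cluster. A minimal sketch, assuming the 0.x property names and hypothetical host names:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- HDFS namenode of the existing cluster (host name is an example) -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode-host:54310</value>
  </property>
  <!-- JobTracker of the existing cluster (host name is an example) -->
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker-host:54311</value>
  </property>
</configuration>
```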
Hi,
Can anyone point me to documentation that explains how to submit a
project to Hadoop as a subproject? I would also appreciate it if someone
points me to documentation on how to submit a project as an Apache project.
We have a project that is built on Hadoop. It is released to the open
The log file hadoop-mithila-datanode-node19.log.2009-04-14 has the
following in it:
2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = node19/127.0.0.1
The log file runs into thousands of lines, with the same message
displayed every time.
On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra mnage...@asu.edu wrote:
The log file : hadoop-mithila-datanode-node19.log.2009-04-14 has the
following in it:
2009-04-14 10:08:11,499 INFO
Looks like your NameNode is down.
Verify that the Hadoop processes are running (jps should show you all
running Java processes).
If your Hadoop processes are running, try restarting them.
I guess this problem is due to your fsimage not being correct.
You might have to format your
This is how things get into Apache Incubator: http://incubator.apache.org/
But the rules are, I believe, that you can skip the incubator and go straight
under a project's wing (e.g. Hadoop) if the project PMC approves.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
-
That certainly works, though if you plan to upgrade the underlying library,
you'll find that copying files with the correct versions into
$HADOOP_HOME/lib rapidly gets tedious, and subtle mistakes (e.g., forgetting
one machine) can lead to frustration.
When you consider the fact that you're using
Hi,
I wrote a blog post a while back about connecting nodes via a gateway. See
http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
This assumes that the client is outside the gateway and all
datanodes/namenode are inside, but the same principles apply. You'll
Tarandeep,
You might want to start by releasing your project as a contrib module for
Hadoop. The overhead there is much easier -- just get it compiling in the
contrib/ directory, file a JIRA ticket on Hadoop Core, and attach your patch
:)
- Aaron
On Wed, Apr 15, 2009 at 10:29 AM, Otis
Hi Aaron
I will look into that thanks!
I spoke to the admin who oversees the cluster. He said that the gateway
comes into the picture only when one of the nodes communicates with a node
outside of the cluster. But in my case the communication is carried out
between the nodes, which all belong to
Thanks Aaron... yeah it sounds like a much easier approach :)
On Wed, Apr 15, 2009 at 11:00 AM, Aaron Kimball aa...@cloudera.com wrote:
Tarandeep,
You might want to start by releasing your project as a contrib module for
Hadoop. The overhead there is much easier -- just get it compiling in
I'm setting up a Hadoop cluster and I have the name node and job tracker up
and running. However, I cannot get any of my datanodes or tasktrackers to
start. Here is my hadoop-site.xml file...
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific
Hi,
The replication factor has to be set to 1. Also, for your DFS and job
tracker configuration you should insert the name of the node rather than
the IP address.
For instance:
<value>192.168.1.10:54310</value>
can be:
<value>master:54310</value>
The nodes can be renamed by renaming them in the hosts
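For example, the hosts file mapping on each node might look like this (the first address is from the message above; the slave entries are hypothetical):

```
192.168.1.10   master
192.168.1.11   slave1
192.168.1.12   slave2
```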
That helps a lot actually. I will try setting up my hosts file tomorrow and
make the other changes you suggested.
Thanks!
Mithila Nagendra wrote:
Hi,
The replication factor has to be set to 1. Also, for your DFS and job
tracker configuration you should insert the name of the node rather
I got it all up and working, thanks for your help - it was an issue with me
not actually setting the log.dir system property before the cluster startup.
Can't believe I missed that one :)
As a side note (which you might already be aware of), the example class
you're using in Chapter 7
This sounds like a reasonable request.
Created
https://issues.apache.org/jira/browse/HADOOP-5684
On our clusters, sometimes users want thin mappers and large reducers.
Koji
-Original Message-
From: Jun Rao [mailto:jun...@almaden.ibm.com]
Sent: Thursday, April 09, 2009 10:30 AM
To:
Data stored to /tmp has no consistency / reliability guarantees. Your OS
can delete that data at any time.
Configure hadoop-site.xml to store data elsewhere. Grep for /tmp in
hadoop-default.xml to see all the configuration options you'll have to
change. Here's the list I came up with:
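Alex's actual list is cut off in the archive. As a hedged reconstruction, the properties most commonly cited in 0.18-era setups are those rooted under hadoop.tmp.dir; the paths below are examples, not the original list:

```xml
<!-- Move storage out from under /tmp; paths are illustrative. -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/lib/hadoop/tmp</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/var/lib/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/var/lib/hadoop/dfs/data</value>
</property>
```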
Alex,
Yes, I bounced the Hadoop daemons after I changed the configuration files.
I also tried setting $HADOOP_CONF_DIR to the directory where my
hadoop-site.xml file resides, but it didn't work.
However, I'm sure that HADOOP_CONF_DIR is not the issue, because other
properties that I changed in
Thanks
Pankil
On Wed, Apr 15, 2009 at 5:09 PM, Alex Loddengaard a...@cloudera.com wrote:
Data stored to /tmp has no consistency / reliability guarantees. Your OS
can delete that data at any time.
Configure hadoop-site.xml to store data elsewhere. Grep for /tmp in
hadoop-default.xml to
Hi,
I'm getting the following warning when running the simple wordcount and
grep examples.
09/04/15 16:54:16 INFO mapred.JobClient: Task Id :
attempt_200904151649_0001_m_19_0, Status : FAILED
Too many fetch-failures
09/04/15 16:54:16 WARN mapred.JobClient: Error reading task
On Tue, Apr 14, 2009 at 2:35 AM, tim robertson timrobertson...@gmail.comwrote:
I am considering (for better throughput as maps generate huge request
volumes) pregenerating all my tiles (PNG) and storing them in S3 with
cloudfront. There will be billions of PNGs produced each at 1-3KB
each.
Double check that there is no firewall in place.
At one point a bunch of new machines were kickstarted and placed in a
cluster, and they all failed with something similar.
It turned out the kickstart script enabled the firewall with a rule
that blocked ports in the 50k range.
It took us a
Hi
My problem is not that my data is under-replicated. I have 3
data nodes. In my hadoop-site.xml I also set the configuration as:
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
But even after this, data is replicated on 3 nodes instead of two.
Now, please tell