Hadoop-0.14 introduced job priorities (https://issues.apache.org/jira/
browse/HADOOP-1433); you might be able to get somewhere with this.
Another possibility is to create two mapreduce clusters on top of the
same dfs cluster.
The mapred.tasktracker.tasks.maximum doesn't do what you think --
In your hadoop-site.xml, you can set
<property>
  <name>hadoop.tmp.dir</name>
  <value>/hadoop</value>
</property>
This will put all the hadoop stuff in /hadoop. By default, this directory is
/tmp/hadoop-$USER, which many systems clean out periodically -- probably worth a bug report.
-Michael
On 12/6/07 10:31 AM, Michael Harris
You also might want to look at HADOOP-2300
On 12/2/07 7:33 PM, Jason Venner [EMAIL PROTECTED] wrote:
We have jobs that require different resources and as such saturate our
machines at different levels of parallelization.
What we want to do in the driver is set the number of simultaneous jobs
Hi Eugeny,
I do something like this in a jetty server, which I start with java -jar
server.jar.
To monitor hadoop jobs, I simply use the JobClient class and manually set
the fs.default.name/mapred.job.tracker properties on the JobConf object
used in the JobClient constructor. Since I don't have
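The two properties being set programmatically above correspond to hadoop-site.xml entries like the following (host names and ports here are illustrative, not from the original message):

```xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.example.com:9000</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker.example.com:9001</value>
</property>
```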
In order to use hadoop dfs, your client must be able to talk to all your
datanodes and the namenode.
So you should:
1. Make sure you can talk to datanodes
2. Make sure your datanode reports its public IP/DNS name to the namenode, not
its internal Amazon IP/DNS name. You can check this on the
You can tune number of map tasks/node with the config variable
mapred.tasktracker.tasks.maximum on the jobtracker (there is a patch to make
it configurable on the tasktracker: see
https://issues.apache.org/jira/browse/HADOOP-1245).
-Michael
On 10/22/07 5:53 PM, Lance Amundsen [EMAIL
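The config variable mentioned above goes in hadoop-site.xml on the jobtracker; a sketch, with an illustrative value:

```xml
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>4</value>
  <description>Maximum number of tasks run simultaneously per
  tasktracker (the value of 4 is illustrative).</description>
</property>
```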
Does anybody know if there is a jdk6 available for Mac? I checked the Apple
developer site, and there doesn't seem to be one available, despite blogs from
last year claiming Apple was distributing it.
Since I do my development work on a Mac, switching to jdk6 would be very
difficult for me if
MySQL and hbase are optimized for different operations. What are you trying to
do?
-Michael
On 10/11/07 3:35 PM, Rafael Turk [EMAIL PROTECTED] wrote:
Hi All,
Does any one have comments about how Hbase will perform in a 4 node cluster
compared to an equivalent MySQL configuration?
Thanks,
It looks like you are treating a jobtracker as a namenode. Make sure
fs.default.name is set to a namenode's address; namenodes and jobtrackers
listen on different ports, so a DFS client pointed at the jobtracker will fail.
-Michael
On 10/8/07 5:47 PM, Jim the Standing Bear [EMAIL PROTECTED] wrote:
Hi Khalil,
Yes, SSH
While you can proxy puts/gets to HDFS, this can dramatically decrease your
bandwidth. The hadoop dfs client is pretty good about writing to/reading
from multiple HDFS nodes simultaneously; a proxy makes this impossible.
Of course, depending on your cluster size, network connection, and data
Well, there is an (undocumented?) way to get rack-awareness in the Datanode,
so you could co-opt this to represent datacenter-awareness. I don't think
there is such a rack-awareness ability for the DFSClient or TaskTracker
though.
-Michael
On 9/6/07 3:10 PM, Torsten Curdt [EMAIL PROTECTED]
--
Torsten
On 07.09.2007, at 00:26, Michael Bieniosek wrote:
Well, there is an (undocumented?) way to get rack-awareness in the Datanode,
so you could co-opt this to represent datacenter-awareness. I don't think
there is such a rack-awareness ability for the DFSClient or TaskTracker
The hadoop way of submitting patches is to create a JIRA issue for each
patch so they can be tested and discussed separately. It looks like you
have several unrelated changes in there. You'll also need to regenerate
your patches against HEAD.
It's always nice to have more contributors. I'm
Neeraj
-Original Message-
From: Michael Bieniosek [mailto:[EMAIL PROTECTED]
Sent: Friday, August 17, 2007 11:55 AM
To: hadoop-user@lucene.apache.org; Mahajan, Neeraj
Subject: Re: Query about number of task trackers specific to a site
https://issues.apache.org/jira/browse/HADOOP
than 500
tasks. Each task tracker executed many tasks, but at all times I could
see that 4 child processes were running on each machine.
~ Neeraj
-Original Message-
From: Michael Bieniosek [mailto:[EMAIL PROTECTED]
Sent: Friday, August 17, 2007 1:01 PM
To: Mahajan, Neeraj; hadoop
The wiki page http://wiki.apache.org/lucene-hadoop/HowToConfigure implies
that mapred-default.xml is read for the dfs configuration, as well as for
mapreduce jobs. But this doesn't appear to be true based on the code, as
the string mapred-default.xml only appears in the mapred package.
So in
On 8/2/07 5:20 AM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
I've found the
getMapTaskReports method in the JobClient class, but can't work out
how to access it other than by creating a new instance of JobClient -
but then that JobClient would be a different one to the one that was
bin/hadoop job -kill job_0001
On 6/27/07 11:11 AM, patrik [EMAIL PROTECTED] wrote:
Is there a way to kill a job that's currently running?
pb
Hi,
I just upgraded my cluster to hadoop-0.13. I now notice that the task logs
in userlogs/ and viewable through the gui often get cut off in the middle of
a task. I checked the file system, and it appears there's only a part-0
on the system. The tasktracker log doesn't seem to indicate
In hadoop-default.xml you should find:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>${hadoop.tmp.dir}/dfs/name</value>
  <description>Determines where on the
the
/tmp/hadoop-user-name directory.
Thanks
A
-Original Message-
From: Michael Bieniosek [mailto:[EMAIL PROTECTED]
Sent: Friday, June 15, 2007 11:31 AM
To: hadoop-user@lucene.apache.org; Phantom
Subject: Re: Formatting the namenode
In hadoop-default.xml you should find
Hi,
I noticed that HADOOP-975 and HADOOP-1000 made the log4j from child vms go
to a different place than the stdout for the task. My tasks send some of
their debugging information to stdout, and some of it to log4j. I'd like
all this information to go to the same place, so that I can see the
The slaves connect to the master, not the other way around.
I don't use a slaves file at all; I just point new tasktrackers at the
jobtracker and everything just works (without restarting).
My understanding is that the slaves file, if present, merely functions as an
allow list of slaves that can
On 5/30/07 7:31 PM, Peter W. [EMAIL PROTECTED] wrote:
Unsetting JAVA_PLATFORM gives an error message:
% bin/hadoop jar hadoop-0.12.3-examples.jar pi 10 20
Exception in thread "main" java.lang.NoClassDefFoundError: OS
This was fixed in https://issues.apache.org/jira/browse/HADOOP-1081
a serverSocket on 9000, started with the same user on the same machine.
And I am able to connect to it from all other machines.
So are there some settings that will cause the namenode to bind to
port 9000 only on the local interface?
Cedric
On 5/12/07, Michael Bieniosek [EMAIL PROTECTED
I'm not sure exactly what you're trying to do, but you can specify command
line parameters to hadoop jar, which you can interpret in your code. Your
code can then write arbitrary config parameters before starting the
mapreduce. Based on these configs, you can load specific jars in your
mapreduce
What are you trying to do? Hadoop dfs has different goals than a network
file system such as Samba.
-Michael
On 4/16/07 10:32 AM, jafarim [EMAIL PROTECTED] wrote:
On Linux and jvm6 with normal IDE disks and a gigabit Ethernet switch with
corresponding NIC and with hadoop 0.9.11's HDFS. We wrote
Hi,
When I try to scale Hadoop up to about 100 nodes on EC2 (single-cpu Xen), I
notice things start to fall apart. For example, the jobtracker starts
dropping requests with the message "Call queue overflow discarding oldest
call". I've also seen problems with the namenode where dfs requests fail
?
-Michael
On 3/29/07 1:37 PM, Doug Cutting [EMAIL PROTECTED] wrote:
Michael Bieniosek wrote:
When I try to scale Hadoop up to about 100 nodes on EC2 (single-cpu Xen), I
notice things start to fall apart. For example, the jobtracker starts
dropping requests with the message Call queue overflow