I think the problem I am remembering was due to poor recovery from
this condition. The underlying fault is likely poor connectivity
between your machines. Test that all members of your cluster can reach all
others on all ports used by Hadoop.
See here for hints:
On Thu, Aug 20, 2009 at 6:49 AM, Harish Mallipeddi
harish.mallipe...@gmail.com wrote:
On Thu, Aug 20, 2009 at 7:25 AM, roman kolcun roman.w...@gmail.com
wrote:
Hello everyone,
could anyone please tell me in which class and which method Hadoop
downloads the file chunk from HDFS and
On Thu, Aug 20, 2009 at 2:39 PM, roman kolcun roman.w...@gmail.com wrote:
Hello Harish,
I know that the TaskTracker creates separate threads (up to
mapred.tasktracker.map.tasks.maximum) which execute the map() function.
However, I haven't found the piece of code which associates a FileSplit with
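For what it's worth, the wiring being asked about here lives in
org.apache.hadoop.mapred.MapTask (0.20-era source, paraphrased from memory
rather than verbatim):

  // sketch of how MapTask ties the FileSplit to the user's map():
  InputSplit split = ...;   // the (File)Split shipped to this task
  RecordReader<K1, V1> in =
      job.getInputFormat().getRecordReader(split, job, reporter);
  MapRunnable<K1, V1, K2, V2> runner =
      ReflectionUtils.newInstance(job.getMapRunnerClass(), job);
  runner.run(in, collector, reporter);  // MapRunner loops in.next(), calls map()

The TaskTracker itself never reads the data; it launches a child JVM that
runs MapTask, and the RecordReader pulls the bytes from HDFS.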
Hi folks,
Sorry to cut across this discussion but I'm experiencing some similar
confusion about where to change some parameters.
In particular, I'm not entirely clear on how the following should be
used - clarification welcome (I'm happy to pull some of this together on
a blog once I get
On Thu, Aug 20, 2009 at 10:30 AM, Harish Mallipeddi
harish.mallipe...@gmail.com wrote:
On Thu, Aug 20, 2009 at 2:39 PM, roman kolcun roman.w...@gmail.com
wrote:
Hello Harish,
I know that TaskTracker creates separate threads (up to
mapred.tasktracker.map.tasks.maximum) which execute
Hi Roman,
Have a look at CombineFileInputFormat - it might be related to what
you are trying to do.
Cheers,
Tom
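A minimal sketch of the usual CombineFileInputFormat pattern (0.20-era mapred
API, written from memory; MyRecordReader is a hypothetical per-file reader you
would supply):

  public class MyCombineFormat
      extends CombineFileInputFormat<LongWritable, Text> {
    public RecordReader<LongWritable, Text> getRecordReader(
        InputSplit split, JobConf job, Reporter reporter) throws IOException {
      // one delegate reader per file packed into the combined split
      Class rrClass = MyRecordReader.class;  // raw type keeps the sketch short
      return new CombineFileRecordReader(
          job, (CombineFileSplit) split, reporter, rrClass);
    }
  }

Splits then pack many small files (or several consecutive blocks) together,
respecting locality where possible.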
On Thu, Aug 20, 2009 at 10:59 AM, roman kolcun roman.w...@gmail.com wrote:
On Thu, Aug 20, 2009 at 10:30 AM, Harish Mallipeddi
harish.mallipe...@gmail.com wrote:
On Thu, Aug 20,
Thanks Tom,
I will have a look at it.
Cheers,
Roman
On Thu, Aug 20, 2009 at 3:02 PM, Tom White t...@cloudera.com wrote:
Hi Roman,
Have a look at CombineFileInputFormat - it might be related to what
you are trying to do.
Cheers,
Tom
On Thu, Aug 20, 2009 at 10:59 AM, roman
On Wed, Aug 19, 2009 at 11:50 PM, Brian Bockelman bbock...@cse.unl.edu wrote:
Hey Mike,
Yup. We find the stock log4j needs two things:
1) Set the rootLogger manually. The way 0.19.x has the root logger set up
breaks when adding new appenders. I.e., do:
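A hedged reconstruction of that kind of manual rootLogger setup, with a
syslog appender for log collation (appender name and host are illustrative,
not from the original message):

  # set the root logger explicitly instead of via ${hadoop.root.logger}
  log4j.rootLogger=INFO, console, SYSLOG
  log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
  log4j.appender.SYSLOG.SyslogHost=loghost.example.com
  log4j.appender.SYSLOG.Facility=LOCAL0
  log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
  log4j.appender.SYSLOG.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n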
Yeah, that is interesting Edward. I don't need syslog-ng for any particular
reason, other than that I'm familiar with it. If there were another way to
get all my logs collated into one log file that would be great.
mike
On Thu, Aug 20, 2009 at 10:44 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
On Thu, Aug 20, 2009 at 10:49 AM, mike anderson saidthero...@gmail.com wrote:
Yeah, that is interesting Edward. I don't need syslog-ng for any particular
reason, other than that I'm familiar with it. If there were another way to
get all my logs collated into one log file that would be great.
Hi all,
Can anyone tell me how the MR scheduler schedules MR jobs?
How does it decide where to create map tasks and how many to create?
Once the map tasks are over, how does it decide how to move the keys to the
reducers efficiently (minimizing data movement across the network)?
Is there any doc
Hi
When I try to execute *hadoop-ec2 launch-cluster test-cluster 2*, it
executes but keeps waiting at "Waiting for instance to start". Find below
the exact output as it appears on my screen:
$ bin/hadoop-ec2 launch-cluster test-cluster 2
Testing for existing master in group: test-cluster
Creating
If it always takes a very long time to start transferring data, get a few
stack dumps (jstack or kill -QUIT) during this period to see what it is
doing.
Most likely, the client is doing nothing but waiting on the remote side.
On 8/20/09 8:02 AM, Ananth T. Sarathy
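A hedged example of collecting such dumps (finding the client JVM's pid is
left to you):

  # three thread dumps, ten seconds apart
  for i in 1 2 3; do jstack <client-pid> > stack.$i.txt; sleep 10; done

kill -QUIT <pid> writes the same dump to the JVM's stdout instead of a file.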
On Wednesday, August 19, 2009 11:21
Jakob Homan wrote:
George-
You can certainly submit jobs asynchronously via the
JobClient.submitJob() method
(http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/JobClient.html).
This will return a handle (a RunningJob) that you can poll for completion.
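A hedged sketch of the asynchronous pattern (job configuration elided):

  JobClient client = new JobClient(conf);      // conf is your configured JobConf
  RunningJob handle = client.submitJob(conf);  // returns immediately
  while (!handle.isComplete()) {
    Thread.sleep(5000);                        // poll, or go do other work
  }
  boolean ok = handle.isSuccessful();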
Probably unrelated to your problem, but one extreme case I've seen:
a user's job with large gzip inputs (non-splittable),
20 mappers and 800 reducers. Each map output around 20G.
Too many reducers were hitting a single node as soon as a mapper finished.
I think we tried something like
On 8/20/09 9:48 AM, Ananth T. Sarathy ananth.t.sara...@gmail.com wrote:
ok.. it seems that's the case. that seems kind of self-defeating though.
Ananth T Sarathy
Then something is wrong with S3. It may be misconfigured, or just performing
poorly. I have no experience with S3, but 20 seconds
Hi Stefan,
I am sorry for the late reply. Somehow the response email slipped past me.
Could you explain a bit on how to use Hadoop streaming with binary data
formats.
I can see, explanations on using it with text data formats, but not for
binary files.
Thank you,
Jaliya
Stefan
Uhh, Hadoop already goes to considerable lengths to make sure that
computation is local. In my experience it is common for 90% of the map
invocations to be working from local data. Hadoop doesn't know about record
boundaries, so a little bit of slop into a non-local block is possible to
finish
Suresh had made a spreadsheet for memory consumption.. will check.
A large portion of NN memory is taken by references. I would expect the
memory savings to be very substantial (same as going from 64-bit to
32-bit), could be on the order of 40%.
The last I heard from Sun was that compressed
If you go to
http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/AllocationConfigurationException.java?view=log
it
shows many revisions for the source
file AllocationConfigurationException.java, so I was wondering which can be
used to
Hello Ted,
I know that Hadoop tries to exploit data locality, and the locality rate is
pretty high. However, data locality cannot be exploited when
'mapred.min.split.size' is set much higher than the DFS block size, because
consecutive blocks are not necessarily stored on a single machine.
I have found out that
On Aug 20, 2009, at 9:00 AM, bharath vissapragada wrote:
Hi all,
Can anyone tell me how the MR scheduler schedules MR jobs?
How does it decide where to create map tasks and how many to create?
Once the map tasks are over, how does it decide how to move the keys to the
reducer
Mithila,
It depends on which version of Hadoop you want to work on.
If you want to work on Hadoop 0.20 then you should check out the Hadoop 0.20
source code.
If you want to work on trunk then check out the Hadoop mapreduce source.
svn checkout
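(Assuming the 2009 repository layout implied by the viewvc link earlier in
this digest, the checkout would be something like:

  svn checkout http://svn.apache.org/repos/asf/hadoop/mapreduce/trunk hadoop-mapreduce

for the MapReduce trunk.)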
I got it working! Fantastic. One thing that hung me up for a while was how
picky the log4j.properties files are about syntax. For future reference,
I used this in log4j.properties:
# Define the root logger to the system property hadoop.root.logger.
log4j.rootLogger=${hadoop.root.logger},
Look into typed bytes:
http://dumbotics.com/2009/02/24/hadoop-1722-and-typed-bytes/
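A hedged example of what a streaming invocation looks like with typed bytes
enabled (assumes the HADOOP-1722 patch described in that post; jar path and
script names are placeholders):

  hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
    -io typedbytes \
    -input in -output out \
    -mapper my_binary_mapper -reducer my_binary_reducer

Records then cross the pipe as typed binary values rather than
newline-delimited text.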
On Thu, Aug 20, 2009 at 10:29 AM, Jaliya Ekanayake jnekanay...@gmail.com wrote:
Hi Stefan,
I am sorry for the late reply. Somehow the response email slipped past me.
Could you explain a bit on how to
Compressed OOPs are available now in 1.6.0u14:
https://jdk6.dev.java.net/6uNea.html
- Aaron
On Thu, Aug 20, 2009 at 10:51 AM, Raghu Angadi rang...@yahoo-inc.com wrote:
Suresh had made a spreadsheet for memory consumption.. will check.
A large portion of NN memory is taken by references. I
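To actually try it on a NameNode (assuming a 1.6.0_14+ JVM on that host), one
hedged approach is to add the flag in conf/hadoop-env.sh:

  export HADOOP_NAMENODE_OPTS="-XX:+UseCompressedOops $HADOOP_NAMENODE_OPTS"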
On 8/20/09 3:40 AM, Steve Loughran ste...@apache.org wrote:
does anyone have any up to date data on the memory consumption per
block/file on the NN on a 64-bit JVM with compressed pointers?
The best documentation on consumption is
http://issues.apache.org/jira/browse/HADOOP-1687 - I'm
Is there a way to find out how much disk space - overall or on a per-datanode
basis - is available before creating a file?
I am trying to address an issue where the disk got full (config error) and the
client was not able to create a file on HDFS.
I want to be able to check whether there is space
Using hadoop-0.19.2
From: Arvind Sharma arvind...@yahoo.com
To: common-user@hadoop.apache.org
Sent: Thursday, August 20, 2009 3:56:53 PM
Subject: Cluster Disk Usage
Is there a way to find out how much disk space - overall or on a per-datanode
basis - is available
Sorry, I also sent a direct e-mail in reply to one response.
There I asked one question: what is the cost of these APIs? Are they
expensive calls? Does the API only go to the NN, which stores this data?
Thanks!
Arvind
From: Arvind Sharma
Hi,
I am trying to run a simple map reduce job that writes the result from the
reducer to a MySQL db.
I keep getting:
09/08/20 15:44:59 INFO mapred.JobClient: Task Id :
attempt_200908201210_0013_r_00_0, Status : FAILED
java.io.IOException: com.mysql.jdbc.Driver
at
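That IOException from DBOutputFormat usually just means the MySQL driver jar
is missing from the task classpath. A hedged sketch of one common fix (paths,
host, and credentials are placeholders):

  # ship the connector jar with the job (needs the Tool/GenericOptionsParser setup)
  hadoop jar myjob.jar MyJob -libjars /path/to/mysql-connector-java.jar ...

and in the job setup:

  DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
      "jdbc:mysql://dbhost:3306/mydb", "user", "password");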
You can use the jobtracker Web UI to view the disk usage.
-Original Message-
From: Arvind Sharma [mailto:arvind...@yahoo.com]
Sent: August 20, 2009 15:57
To: common-user@hadoop.apache.org
Subject: Cluster Disk Usage
Is there a way to find out how much disk space - overall or per Datanode
Add some details:
1. #maps is determined by the block size and the InputFormat (whether you
want to split or not).
2. The default scheduler for Hadoop is FIFO; the Fair Scheduler and
Capacity Scheduler are the other two options as far as I know. The JobTracker
hosts the scheduler.
3. Once the map task
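On point 1, the split-size computation in FileInputFormat is, paraphrased
from the 0.20-era source (not verbatim):

  long goalSize = totalSize / (numSplits == 0 ? 1 : numSplits);
  long splitSize = Math.max(minSize, Math.min(goalSize, blockSize));

One map task is created per split, and the scheduler prefers placing it on a
node that holds the block.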
Arvind,
You can use this API to get the amount of file system space used:
FileSystem.getUsed();
But I do not find an API for calculating the remaining space. You can write
some code to compute it yourself:
remaining disk space = total disk space - operating system space -
FileSystem.getUsed()
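A sketch of that computation (the total and OS-reserved figures have to come
from outside HDFS; variable names are illustrative):

  FileSystem fs = FileSystem.get(conf);
  long used = fs.getUsed();  // bytes consumed by the DFS
  long remaining = totalDiskBytes - osReservedBytes - used;
  if (remaining < bytesNeeded) { /* refuse to create the file */ }

(On 0.19/0.20, casting to DistributedFileSystem and calling getDiskStatus()
may report capacity and remaining space directly, though I have not verified
that against 0.19.2.)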
OK, I'll be a bit more specific.
Suppose the map outputs 100 different keys.
Consider a key K whose corresponding values may be on N different datanodes.
Consider a datanode D which has the maximum number of values. So instead of
moving the values on D to other systems, it is useful to bring in the values
Thanks for the quick reply.
I looked at it, but still could not figure out how to use HDFS to store
input data (binary) and call an executable.
Please note that I cannot modify the executable.
Maybe I am asking a dumb question, but could you please explain a bit of
how to handle this scenario
Hello,
I got these exceptions when I started the cluster; any suggestions?
I am using Hadoop 0.15.2.
2009-08-21 12:12:53,463 ERROR org.apache.hadoop.dfs.NameNode:
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at
Hi,
GenericOptionsParser is customized only for Hadoop-specific params; from its
javadoc: GenericOptionsParser recognizes several standard command-line
arguments, enabling applications to easily specify a namenode, a jobtracker,
additional configuration resources, etc.
Ideally, all params
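A minimal sketch of the usual Tool/ToolRunner pattern that makes those
generic options work (class name is illustrative):

  public class MyTool extends Configured implements Tool {
    public int run(String[] args) throws Exception {
      // by the time run() is called, -D/-fs/-jt/-libjars etc. have been consumed
      JobConf job = new JobConf(getConf(), MyTool.class);
      // ... configure and submit the job ...
      return 0;
    }
    public static void main(String[] args) throws Exception {
      System.exit(ToolRunner.run(new Configuration(), new MyTool(), args));
    }
  }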