I copied a 230GB file into my Hadoop cluster. After my MR job kept failing, I
tracked the error down to one line of formatted text.
I copied the file back out of HDFS, and when I compare it to the original file
there are about 20 bytes on one line (out of 230GB) that are different.
Is there no
Is 1.6.0_17 or 1.6.0_20 preferred as the JRE for Hadoop? Thank you.
Does anyone know what might be causing this error? I am using Hadoop version
0.20.2 and it happens when I run bin/hadoop dfs -copyFromLocal ...
10/07/09 15:51:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 128.238.55.43:50010
See the description of xcievers at:
http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements
You can confirm that you have an xcievers problem by grepping the
datanode logs for the error message pasted in the last bullet point.
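For reference, this is a datanode-side setting in hdfs-site.xml (the property
really is spelled that way in Hadoop), and the datanodes need a restart to
pick it up; 4096 is the value commonly suggested for heavy workloads:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

The grep mentioned above would look something like:

  grep -i xciever logs/*-datanode-*.log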
On Fri, Jul 9, 2010 at 1:10 PM, Raymond Jennings III
Are there instructions on how to enable compression (and which type?) on HDFS?
Does this have to be done during installation, or can it be added to a running
cluster?
Thanks,
Ray
On Thu, Jun 24, 2010 at 11:26 AM, Raymond Jennings III
raymondj...@yahoo.com wrote:
Are there instructions on how to enable compression (and which type?) on HDFS?
Does this have to be done during installation, or can it be added to a
running cluster?
Thanks,
Ray
--
Eric Sammer
twitter: esammer
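For what it's worth, per-job output compression needs nothing at install
time; it can be turned on job by job. A minimal sketch of the 0.20-era
properties (HDFS itself does not compress transparently):

  Configuration conf = new Configuration();
  // compress the final job output
  conf.setBoolean("mapred.output.compress", true);
  conf.set("mapred.output.compression.codec",
           "org.apache.hadoop.io.compress.GzipCodec");
  // compress the intermediate map output as well
  conf.setBoolean("mapred.compress.map.output", true);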
I recall reading some time ago on this mailing list that certain JRE versions
were recommended and others were not. Was 1.6.0_17 the preferred one?
Thank you.
I am trying to create my partitioner, but I am getting an exception. Is
anything required other than providing the public int getPartition method and
extending the Partitioner class?
java.lang.RuntimeException: java.lang.NoSuchMethodException:
TSPmrV6$TSPPartitioner.<init>()
at
...@gmail.com
Subject: Re: Custom partitioner question
To: common-user@hadoop.apache.org
Date: Thursday, June 3, 2010, 2:10 PM
An empty ctor is needed for your Partitioner class.
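A minimal sketch (Text/IntWritable types are assumed); note that a nested
partitioner must also be declared static, otherwise its implicit constructor
takes the outer class and reflection still fails with the <init> error above:

  // nested inside the job class (TSPmrV6 in the trace) -- note the static
  public static class TSPPartitioner extends Partitioner<Text, IntWritable> {
    public TSPPartitioner() {}  // the empty ctor Hadoop instantiates reflectively

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
      return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
  }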
On Thu, Jun 3, 2010 at 10:13 AM, Raymond Jennings III
raymondj...@yahoo.com
wrote:
I am trying to create my
I have a cluster of 12 slave nodes. I see that for some jobs half of the
part-r-* output files are zero in size after the job completes. Does this
mean the hash function that splits the data across the reducer nodes is not
working all that well? On other jobs it's pretty much even across
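For reference, the default HashPartitioner comes down to the following, so
any skew in key.hashCode() over your key space shows up directly as skew in
the part-r-* sizes; with fewer distinct keys than reducers, some outputs will
always be empty:

  public int getPartition(K key, V value, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }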
I want to write to a common HDFS file from within my map method. Given that
each task runs in a separate JVM (on separate machines), making a method
synchronized will not work, I assume. Are there any file locking or other
methods to guarantee mutual exclusion on HDFS?
(I want to append to this
I've got a dead machine on my cluster. I want to safely update HDFS so that
nothing references this machine then I want to rebuild it and put it back in
service in the cluster.
Does anyone have any pointers on how to do this (the first part - updating
HDFS so that it's no longer referenced)?
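One route is the standard exclude-file decommission; a sketch, with a
hypothetical file path (the property and command are in 0.20):

  <!-- in the namenode's conf -->
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/etc/hadoop/excludes</value>
  </property>

Add the dead machine's hostname to /etc/hadoop/excludes, then:

  bin/hadoop dfsadmin -refreshNodes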
Isn't the number of mappers specified only a suggestion?
--- On Thu, 4/22/10, He Chen airb...@gmail.com wrote:
From: He Chen airb...@gmail.com
Subject: Hadoop does not follow my setting
To: common-user@hadoop.apache.org
Date: Thursday, April 22, 2010, 12:50 PM
Hi everyone
I am doing a
I am running an application that has many iterations, and I find that the
JobTracker's website drops many of the earlier runs. Is there any way to
increase the number of completed jobs retained, so that they are still
available on the JobTracker's website? Thank you.
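If memory serves, the number of completed jobs the JobTracker keeps is
configurable; a hedged sketch of the 0.20-era property (JobTracker restart
required):

  <property>
    <name>mapred.jobtracker.completeuserjobs.maximum</name>
    <value>500</value>
  </property>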
After running hadoop for some period of time, the command 'jps' fails to
report any hadoop process on any node in the cluster. The processes are still
running, as can be seen with 'ps -ef | grep java'.
In addition, scripts like stop-dfs.sh and stop-mapred.sh no longer find the
processes to stop.
The pid files go under /tmp by default. In conf/hadoop-env.sh:
# export HADOOP_PID_DIR=/var/hadoop/pids
The hadoop shell scripts look in the directory that is defined.
Bill
-----Original Message-----
From: Raymond Jennings III [mailto:raymondj...@yahoo.com]
Sent: Monday, March 29, 2010 11:37 AM
To: common-user@hadoop.apache.org
Are you running 'jps' as the same user that the hadoop processes are
running under?
Bill
On Mon, Mar 29, 2010 at 11:37 AM, Raymond Jennings III
raymondj...@yahoo.com
wrote:
After running hadoop for some period of time, the command 'jps' fails to
report any hadoop process on any node in the cluster. The processes are
still
I would like to try to use a ChainMapper/ChainReducer, but I see that the
last parameter is a JobConf, which I am not creating since I am using the
latest API version. Has anyone tried to do this with the later API? Can I
extract a JobConf object from somewhere?
Thanks
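One hedged possibility: JobConf is a Configuration subclass with a copy
constructor, so a new-API Configuration can be wrapped (whether ChainMapper
is then happy with the rest of a new-API job is a separate question):

  Configuration conf = getConf();       // e.g. from Tool
  JobConf jobConf = new JobConf(conf);  // carries the existing settings over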
for the input to a mapper or as the output of either mapper or reducer?
Any pointers on what might be causing this? Thanks!
java.io.IOException: Spill failed
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1006)
at java.io.DataOutputStream.write(Unknown Source)
at org.apache.hadoop.io.Text.write(Text.java:282)
I'd like to be able to clear completed jobs from the JobTracker web page. Is
there an easy way to do this without restarting the cluster?
I need to pass a counter value to my reducer from the main program. Can this
be done through the context parameter somehow?
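The Context can't be set from the driver, but job Configuration values are
visible in setup(); a minimal sketch, with a hypothetical property name:

  // in the driver, before submission:
  job.getConfiguration().setLong("myjob.counter.value", counterValue);

  // in the Reducer:
  @Override
  protected void setup(Context context) {
    long v = context.getConfiguration().getLong("myjob.counter.value", 0L);
  }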
Is it possible to override a method in the reducer so that similar keys will
be grouped together? For example, I want all keys with value KEY1 and KEY2 to
be merged together. (My reducer has a KEY of type Text.) Thanks.
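Not in the reducer itself, but a grouping comparator set on the job has this
effect; a sketch assuming Text keys, and assuming KEY1 and KEY2 sort
adjacently (the grouping comparator can only merge keys the sort order puts
next to each other):

  public static class MergeKeysComparator extends WritableComparator {
    public MergeKeysComparator() { super(Text.class, true); }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
      return normalize(a.toString()).compareTo(normalize(b.toString()));
    }

    // hypothetical rule: treat KEY2 as KEY1 so both land in one reduce call
    private static String normalize(String k) {
      return k.equals("KEY2") ? "KEY1" : k;
    }
  }

  // in the driver:
  job.setGroupingComparatorClass(MergeKeysComparator.class);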
I thought there was a util that does the upgrade for you - you run it from one
node and it copies to every other node?
Are there any examples that show how to create a SEQ file in HDFS?
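A minimal sketch against the 0.20 API; the path and key/value types are
placeholders:

  Configuration conf = new Configuration();
  FileSystem fs = FileSystem.get(conf);
  Path path = new Path("/user/ray/example.seq");  // hypothetical path
  SequenceFile.Writer writer =
      SequenceFile.createWriter(fs, conf, path, Text.class, IntWritable.class);
  try {
    writer.append(new Text("key"), new IntWritable(1));
  } finally {
    writer.close();
  }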
I am interested in seeing how mapreduce could be used to approximate the
traveling salesman problem. Anyone have a pointer?
Thanks.
I am using the non-deprecated Mapper. Can I obtain it from the Context
somehow? Anyone have an example of this? Thanks.
In other words: I have a situation where I want to feed the output from the
first iteration of my mapreduce job to a second iteration, and so on. I have a
for loop in my main method to set up the job parameters and run it through
all iterations, but on about the third run the Hadoop processes
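For comparison, a minimal sketch of the chaining pattern described, with
hypothetical paths, class names and iteration count:

  Path in = new Path("/data/input");
  for (int i = 0; i < numIterations; i++) {
    Job job = new Job(getConf(), "iteration-" + i);
    job.setJarByClass(MyDriver.class);  // hypothetical driver class
    // ...mapper, reducer and output types set here...
    Path out = new Path("/data/iter-" + i);
    FileInputFormat.addInputPath(job, in);
    FileOutputFormat.setOutputPath(job, out);
    if (!job.waitForCompletion(true)) break;
    in = out;  // this pass's output feeds the next pass
  }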
Is there a typo in the Join.java example that comes with hadoop? It has the
line:
JobConf jobConf = new JobConf(getConf(), Sort.class);
Shouldn't that be Join.class ? Is there an equivalent example that uses the
later API instead of the deprecated calls?
I would try running the rebalance utility. I would be curious to see what that
will do and if that will fix it.
--- On Wed, 1/27/10, Ananth T. Sarathy ananth.t.sara...@gmail.com wrote:
From: Ananth T. Sarathy ananth.t.sara...@gmail.com
Subject: Need to re replicate
To:
Not sure if this solves your problem, but I had a similar case where there was
unique data at the beginning of the file, and if that file was split between
maps I would lose it for the 2nd and subsequent maps. I was able to pull the
file name from the conf and read the first two lines for
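Roughly the approach described, as an old-API sketch; it assumes
map.input.file is populated by the input format in use:

  // inside an old-API Mapper implementation
  public void configure(JobConf job) {
    try {
      Path file = new Path(job.get("map.input.file"));
      FileSystem fs = file.getFileSystem(job);
      BufferedReader in =
          new BufferedReader(new InputStreamReader(fs.open(file)));
      String header = in.readLine();  // the unique data at the top of the file
      in.close();
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }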
I am not a patent attorney either, but for what it's worth - many times a
patent is sought solely to protect a company from being sued by another. So
even though Hadoop is out there, it could be the case that Google has no
intent of suing anyone who uses it - they just wanted to protect
I am trying to determine the name of the file that is being used for the map
task. I am trying to use the setup() method to read the input file with:
public void setup(Context context) {
Configuration conf = context.getConfiguration();
String inputfile =
file. All of this is done in the configure method, that is, before any
map method is called.
-Gang
----- Original Message -----
From: Raymond Jennings III raymondj...@yahoo.com
To: common-user@hadoop.apache.org
Sent: 2010/1/8 (Fri) 7:54:30 PM
Subject: Re: Is it possible to share a key across maps
is processing. Use this path to read the first line of the
corresponding file. All of this is done in the configure method, that is,
before any map method is called.
-Gang
----- Original Message -----
From: Raymond Jennings III raymondj...@yahoo.com
To: common-user@hadoop.apache.org
Sent: 2010/1/8 (Fri
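The same idea with the newer API, as a sketch; the cast assumes
FileInputFormat, so the split really is a FileSplit
(org.apache.hadoop.mapreduce.lib.input.FileSplit):

  @Override
  protected void setup(Context context) {
    FileSplit split = (FileSplit) context.getInputSplit();
    String inputfile = split.getPath().getName();
    // open the file here and read its first line before any map() call
  }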
I tried writing to stderr but I guess that is not valid. Can someone tell me
how I can output some text during either the map or reduce methods?
I have large files where the userid is on the first line of each file. I want
to use that value as the map output key for each subsequent line of the file.
If each map task gets a chunk of this file, only one map task will read the
key value from the first line. Is there any way I can
I am trying to develop some hadoop programs and I see that most of the
examples included in the distribution use deprecated classes and methods. Are
there any other sources for learning the API besides the javadocs, which, for
beginners trying to write hadoop programs, are not the
I have been recently seeing a problem where jobs stop at map 0% that previously
worked fine (with no code changes.) Restarting hadoop on the cluster solves
this problem but there is nothing in the log files to indicate what the problem
is. Has anyone seen something similar?
Does anyone have any idea what might be causing the following three errors
that I am seeing? I am not able to determine what job or what was happening at
the times listed, but I am hoping that with a little more information I can
track down what is happening:
Does the combiner run once per data node or once per map task? (That is, can
it run multiple times on the same data node, after each map task?) Thanks.
Do people normally combine these two processes onto one machine? Currently I
have them on separate machines, but I am wondering whether they use that much
CPU time and whether I should combine them and create another DataNode.
I have been pulling my hair out on this one. I tried building it within
eclipse - no errors, but when I put the jar file in and restart eclipse I can
see the Map/Reduce perspective, but once I try to do anything it bombs with
random cryptic errors. I looked at Stephen's notes on JIRA but still
The plugin that is included in the hadoop distribution under
src/contrib/eclipse-plugin - how does that get installed, as it does not
appear to be in a standard plugin format? Do I have to build it first, and if
so can you tell me how? Thanks. Ray
it in the eclipse installation plugins folder and restart eclipse.
-dp
On Nov 20, 2009, at 2:08 PM, Raymond Jennings III raymondj...@yahoo.com
wrote:
The plugin that is included in the hadoop distribution under
src/contrib/eclipse-plugin - how does that get installed as it does
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Date: Friday, November 20, 2009, 9:53 PM
Yes, if it's not built you can do ant eclipse. It will generate the plugin
jar and you can paste it in the plugin directory.
-dp
On Nov 20, 2009, at 6:49 PM, Raymond Jennings III raymondj
Can I just change the block size in the config and restart, or do I have to
reformat? It's okay if what is currently in the file system stays at the old
block size, if that's possible.
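No reformat should be needed: the block size is a per-file attribute fixed at
write time, so existing files keep the old size and newly written files pick
up the new one. The setting, for reference (the value shown is 128 MB):

  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>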
If I understand you correctly, you can run jps and see the java JVMs running
on each machine - that should tell you whether you are running in pseudo mode
or not.
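For example, on a 0.20 pseudo-distributed setup, jps run as the hadoop user
would list all five daemons on the one machine (pids will differ):

  $ jps
  12345 NameNode
  12346 SecondaryNameNode
  12347 DataNode
  12348 JobTracker
  12349 TaskTracker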
--- On Thu, 11/12/09, kvorion kveinst...@gmail.com wrote:
From: kvorion kveinst...@gmail.com
Subject: About Hadoop pseudo distribution
Is there a way that I can set up directories in dfs for individual users and
set the permissions such that only that user can read and write, so that if I
do a hadoop dfs -ls I would get /user/user1, /user/user2, etc., each directory
readable and writable only by the respective user? I
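Assuming dfs.permissions is left at its default of true, a sketch of the
usual setup, run as the HDFS superuser once per user:

  bin/hadoop fs -mkdir /user/user1
  bin/hadoop fs -chown user1:user1 /user/user1
  bin/hadoop fs -chmod 700 /user/user1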
?
To: common-user@hadoop.apache.org
Date: Wednesday, November 11, 2009, 1:59 PM
On 11/11/09 8:50 AM, Raymond Jennings III raymondj...@yahoo.com
wrote:
Is there a way that I can set up directories in dfs for individual users and
set the permissions such that only that user can read
you have at least one datanode running.
Look at the datanode log file (logs/*-datanode-*.log).
Boris.
On 11/9/09 7:15 AM, Raymond Jennings III raymondj...@yahoo.com
wrote:
I am trying to resolve an IOException error. I have a basic setup and
shortly after running start-dfs.sh I get
need to make sure these are cleaned up before reformatting. You can do it
just by deleting the datanode directory, although there's probably a more
official way to do it.
On 11/10/09 11:01 AM, Raymond Jennings III wrote:
On the actual datanodes I see the following
exception: I am
I am trying to resolve an IOException error. I have a basic setup and shortly
after running start-dfs.sh I get a:
error: java.io.IOException: File /tmp/hadoop-root/mapred/system/jobtracker.info
could only be replicated to 0 nodes, instead of 1
java.io.IOException: File