Is hdfs reliable? Very odd error

2010-08-13 Thread Raymond Jennings III
I copied a 230GB file into my hadoop cluster. After my MR job kept failing I tracked down the error to one line of formatted text. I copied the file back out of hdfs and when I compare it to the original file there are about 20 bytes on one line (out of 230GB) that are different. Is there no

Preferred Java version

2010-07-16 Thread Raymond Jennings III
Is 1.6.0_17 or 1.6.0_20 preferred as the JRE for hadoop? Thank you.

Help with Hadoop runtime error

2010-07-09 Thread Raymond Jennings III
Does anyone know what might be causing this error? I am using version Hadoop 0.20.2 and it happens when I run bin/hadoop dfs -copyFromLocal ... 10/07/09 15:51:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 128.238.55.43:50010

Re: Help with Hadoop runtime error

2010-07-09 Thread Raymond Jennings III
the description about xcievers at: http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements You can confirm that you have a xcievers problem by grepping the datanode logs with the error message pasted in the last bullet point. On Fri, Jul 9, 2010 at 1:10 PM, Raymond Jennings III

Newbie to HDFS compression

2010-06-24 Thread Raymond Jennings III
Are there instructions on how to enable (which type?) of compression on hdfs? Does this have to be done during installation or can it be added to a running cluster? Thanks, Ray
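In this era of Hadoop, compression is a configuration setting rather than an install-time choice, so it can be enabled on a running cluster. A minimal sketch, assuming the 0.20-era property names (mapred.compress.map.output for intermediate map output, mapred.output.compress for final job output):

```xml
<!-- mapred-site.xml: sketch using 0.20-era property names -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```

Note this compresses map output and job output; it does not transparently compress arbitrary files already stored in HDFS.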

Re: Newbie to HDFS compression

2010-06-24 Thread Raymond Jennings III
, Jun 24, 2010 at 11:26 AM, Raymond Jennings III raymondj...@yahoo.com wrote: Are there instructions on how to enable (which type?) of compression on hdfs? Does this have to be done during installation or can it be added to a running cluster? Thanks, Ray -- Eric Sammer twitter: esammer

Which version of java is the preferred version?

2010-06-18 Thread Raymond Jennings III
I recall reading sometime ago on this mailing list that certain JRE versions were recommended and others were not. Was it 1.6.0_17 the preferred? Thank you.

Custom partitioner question

2010-06-03 Thread Raymond Jennings III
I am trying to create my partitioner but I am getting an exception. Is anything required other than providing the method public int getPartition and extending the Partitioner class? java.lang.RuntimeException: java.lang.NoSuchMethodException: TSPmrV6$TSPPartitioner.init() at

Re: Custom partitioner question

2010-06-03 Thread Raymond Jennings III
...@gmail.com Subject: Re: Custom partitioner question To: common-user@hadoop.apache.org Date: Thursday, June 3, 2010, 2:10 PM An empty ctor is needed for your Partitioner class. On Thu, Jun 3, 2010 at 10:13 AM, Raymond Jennings III raymondj...@yahoo.com wrote: I am trying to create my
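The NoSuchMethodException: TSPmrV6$TSPPartitioner.init() above is the framework failing to find a no-argument constructor: partitioners are instantiated reflectively, so the class must be public (static, if nested) and carry a public no-arg constructor. A minimal plain-Java sketch of the failure mode (no Hadoop classes needed; the class names here are illustrative):

```java
// Sketch of why a reflective newInstance() fails without a public no-arg ctor.
public class CtorDemo {
    // Like a partitioner declared as a public static nested class
    // with a no-arg constructor: reflection can instantiate it.
    public static class GoodPartitioner {
        public GoodPartitioner() {}
    }

    // Only a constructor taking arguments: reflection cannot find <init>().
    public static class BadPartitioner {
        public BadPartitioner(int unused) {}
    }

    public static boolean instantiable(Class<?> c) {
        try {
            // Roughly what the framework does when creating the partitioner.
            c.getConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(instantiable(GoodPartitioner.class));  // true
        System.out.println(instantiable(BadPartitioner.class));   // false
    }
}
```

A non-static inner class never satisfies this, because its constructor implicitly takes the enclosing instance.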

Getting zero length files on the reduce output.

2010-06-02 Thread Raymond Jennings III
I have a cluster of 12 slave nodes. I see that for some jobs, half of the part-r-0 type files are zero in size after the job completes. Does this mean the hash function that splits the data to each reducer node is not working all that well? On other jobs it's pretty much even across
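Empty reduce outputs are expected whenever fewer distinct key hash buckets exist than reducers: the default HashPartitioner sends a key to reducer (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so with only a handful of distinct keys some of the 12 reducers receive nothing. A plain-Java sketch of that formula (assuming it mirrors HashPartitioner; the key strings are illustrative):

```java
import java.util.HashSet;
import java.util.Set;

public class PartitionSkew {
    // The default HashPartitioner rule: map a key to one of numReduceTasks buckets.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 12;
        Set<Integer> hit = new HashSet<>();
        // Five distinct keys can touch at most five partitions, so at
        // least seven reducers would emit empty part-r-* files.
        for (String key : new String[] {"a", "b", "c", "d", "e"}) {
            hit.add(partition(key, reducers));
        }
        System.out.println(hit.size() <= 5);  // true
    }
}
```

If the keys are numerous and varied but output is still skewed, the keys' hashCode distribution itself is the next thing to inspect.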

How can I synchronize writing to an hdfs file

2010-05-07 Thread Raymond Jennings III
I want to write to a common hdfs file from within my map method. Given that each task runs in a separate jvm (on separate machines), making a method synchronized will not work I assume. Are there any file locking or other methods to guarantee mutual exclusion on hdfs? (I want to append to this

Decommissioning a node

2010-04-23 Thread Raymond Jennings III
I've got a dead machine on my cluster. I want to safely update HDFS so that nothing references this machine then I want to rebuild it and put it back in service in the cluster. Does anyone have any pointers how to do this (the first part - updating HDFS so that it's no longer referenced.)
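Assuming a 0.20-era cluster, the usual route is decommissioning via an exclude file rather than editing HDFS metadata directly. A sketch (the file path is illustrative):

```xml
<!-- hdfs-site.xml on the NameNode: point at an exclude file -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
```

Add the dead machine's hostname to that file, then run hadoop dfsadmin -refreshNodes; the NameNode stops referencing the node and re-replicates its blocks elsewhere. The exclude entry also keeps the rebuilt machine from rejoining until you remove it and refresh again.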

Re: Hadoop does not follow my setting

2010-04-22 Thread Raymond Jennings III
Isn't the number of mappers specified only a suggestion? --- On Thu, 4/22/10, He Chen airb...@gmail.com wrote: From: He Chen airb...@gmail.com Subject: Hadoop does not follow my setting To: common-user@hadoop.apache.org Date: Thursday, April 22, 2010, 12:50 PM Hi everyone I am doing a

JobTracker website data - can it be increased?

2010-04-02 Thread Raymond Jennings III
I am running an application that has many iterations and I find that the JobTracker's website cuts off many of the initial runs. Is there any way to increase the number of completed jobs retained so that earlier runs are still available on the JobTracker's website? Thank you.

why does 'jps' lose track of hadoop processes ?

2010-03-29 Thread Raymond Jennings III
After running hadoop for some period of time, the command 'jps' fails to report any hadoop process on any node in the cluster. The processes are still running as can be seen with 'ps -ef|grep java' In addition, scripts like stop-dfs.sh and stop-mapred.sh no longer find the processes to stop.

RE: why does 'jps' lose track of hadoop processes ?

2010-03-29 Thread Raymond Jennings III
by default. # export HADOOP_PID_DIR=/var/hadoop/pids The hadoop shell scripts look in the directory that is defined. Bill -Original Message- From: Raymond Jennings III [mailto:raymondj...@yahoo.com] Sent: Monday, March 29, 2010 11:37 AM To: common-user@hadoop.apache.org
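As the reply above indicates, the PID files live under /tmp by default, and tmp cleaners (e.g. tmpwatch) commonly delete both them and the /tmp/hsperfdata_<user> directories that jps reads, which would explain both symptoms. A sketch of the fix from the reply (the directory path is illustrative):

```shell
# conf/hadoop-env.sh: keep PID files out of /tmp so the stop scripts
# (stop-dfs.sh, stop-mapred.sh) can still find the processes to stop
export HADOOP_PID_DIR=/var/hadoop/pids
```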

Re: why does 'jps' lose track of hadoop processes ?

2010-03-29 Thread Raymond Jennings III
that the hadoop processes are running under? Bill On Mon, Mar 29, 2010 at 11:37 AM, Raymond Jennings III raymondj...@yahoo.com wrote: After running hadoop for some period of time, the command 'jps' fails to report any hadoop process on any node in the cluster.  The processes are still

Question about ChainMapper

2010-03-29 Thread Raymond Jennings III
I would like to try to use a ChainMapper/ChainReducer but I see that the last parameter is a JobConf which I am not creating as I am using the latest API version. Has anyone tried to do this with the later version API? Can I extract a JobConf object somewhere? Thanks

Is there a size limit on a line for a text file?

2010-03-25 Thread Raymond Jennings III
for the input to a mapper or as the output of either mapper or reducer?

java.io.IOException: Spill failed

2010-03-25 Thread Raymond Jennings III
Any pointers on what might be causing this? Thanks! java.io.IOException: Spill failed at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1006) at java.io.DataOutputStream.write(Unknown Source) at org.apache.hadoop.io.Text.write(Text.java:282)

Is there an easy way to clear old jobs from the jobtracker webpage?

2010-03-17 Thread Raymond Jennings III
I'd like to be able to clear the contents of the jobs that have completed running on the jobtracker webpage. Is there an easy way to do this without restarting the cluster?

Can I pass a user value to my reducer?

2010-03-15 Thread Raymond Jennings III
I need to pass a counter value to my reducer from the main program. Can this be done through the context parameter somehow?

I want to group similar keys in the reducer.

2010-03-15 Thread Raymond Jennings III
Is it possible to override a method in the reducer so that similar keys will be grouped together? For example I want all keys of value KEY1 and KEY2 to be merged together. (My reducer has a KEY of type TEXT.) Thanks.
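Assuming the new (0.20 mapreduce) API, the hook for this is a grouping comparator set with Job.setGroupingComparatorClass: any keys it reports as equal are fed to a single reduce() call. Hadoop ultimately wants a RawComparator over the serialized Text keys; the plain-Java sketch below only shows a grouping rule, treating keys as equal when their non-digit prefixes match, e.g. KEY1 and KEY2 (the rule itself is illustrative):

```java
import java.util.Comparator;

public class GroupDemo {
    // Group keys by the prefix that remains after stripping trailing digits,
    // so "KEY1" and "KEY2" compare as equal and land in one reduce() call.
    static final Comparator<String> GROUPER =
        Comparator.comparing((String k) -> k.replaceAll("\\d+$", ""));

    public static void main(String[] args) {
        System.out.println(GROUPER.compare("KEY1", "KEY2") == 0);   // true
        System.out.println(GROUPER.compare("KEY1", "OTHER3") == 0); // false
    }
}
```

The sort comparator still orders the full keys; only the grouping step collapses them, so the reducer sees one key from the group with all the grouped values.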

How do I upgrade my hadoop cluster using hadoop?

2010-03-11 Thread Raymond Jennings III
I thought there was a util to do the upgrade for you that you run from one node and it would do a copy to every other node?

SEQ

2010-03-09 Thread Raymond Jennings III
Are there any examples that show how to create a SEQ file in HDFS?

Anyone use MapReduce for TSP approximations?

2010-03-02 Thread Raymond Jennings III
I am interested in seeing how mapreduce could be used to approximate the traveling salesman problem. Anyone have a pointer? Thanks.

How do I get access to the Reporter within Mapper?

2010-02-24 Thread Raymond Jennings III
I am using the non-deprecated Mapper. Can I obtain it from the Context somehow? Anyone have an example of this? Thanks.

Is it possible to run multiple mapreduce jobs from within the same application

2010-02-23 Thread Raymond Jennings III
In other words: I have a situation where I want to feed the output from the first iteration of my mapreduce job to a second iteration and so on. I have a for loop in my main method to setup the job parameters and to run it through all iterations but on about the third run the Hadoop processes

Question about Join.java example

2010-02-17 Thread Raymond Jennings III
Is there a typo in the Join.java example that comes with hadoop? It has the line: JobConf jobConf = new JobConf(getConf(), Sort.class); Shouldn't that be Join.class ? Is there an equivalent example that uses the later API instead of the deprecated calls?

Re: Need to re replicate

2010-01-27 Thread Raymond Jennings III
I would try running the rebalance utility. I would be curious to see what that will do and if that will fix it. --- On Wed, 1/27/10, Ananth T. Sarathy ananth.t.sara...@gmail.com wrote: From: Ananth T. Sarathy ananth.t.sara...@gmail.com Subject: Need to re replicate To:

Re: Passing whole text file to a single map

2010-01-23 Thread Raymond Jennings III
Not sure if this solves your problem but I had a similar case where there was unique data at the beginning of the file and if that file was split between maps I would lose that for the 2nd and subsequent maps. I was able to pull the file name from the conf and read the first two lines for

Re: Google has obtained the patent over mapreduce

2010-01-20 Thread Raymond Jennings III
I am not a patent attorney either but for what it's worth - many times a patent is sought solely to protect a company from being sued from another. So even though Hadoop is out there it could be the case that Google has no intent of suing anyone who uses it - they just wanted to protect

Obtaining name of file in map task

2010-01-12 Thread Raymond Jennings III
I am trying to determine the name of the file that is being used for the map task. I am trying to use the setup() method to read the input file with: public void setup(Context context) { Configuration conf = context.getConfiguration(); String inputfile =

Re: Is it possible to share a key across maps?

2010-01-12 Thread Raymond Jennings III
file. All these are done in the configure method, that means, before any map method is called. -Gang - Original Message - From: Raymond Jennings III raymondj...@yahoo.com To: common-user@hadoop.apache.org Sent: 2010/1/8 (Fri) 7:54:30 PM Subject: Re: Is it possible to share a key across maps

Re: Is it possible to share a key across maps?

2010-01-11 Thread Raymond Jennings III
is processing. Use this path to read the first line of the corresponding file. All these are done in the configure method, that means, before any map method is called. -Gang - Original Message - From: Raymond Jennings III raymondj...@yahoo.com To: common-user@hadoop.apache.org Sent: 2010/1/8 (Fri

Can map reduce methods print to console in eclipse?

2010-01-11 Thread Raymond Jennings III
I tried writing to stderr but I guess that is not valid. Can someone tell me how I can output some text during either the map or reduce methods?

Is it possible to share a key across maps?

2010-01-08 Thread Raymond Jennings III
I have large files where the userid is the first line of each file. I want to use that value as the output of the map phase for each subsequent line of the file. If each map task gets a chunk of this file only one map task will read the key value from the first line. Is there anyway I can

Other sources for hadoop api help

2010-01-07 Thread Raymond Jennings III
I am trying to develop some hadoop programs and I see that most of the examples included in the distribution are using deprecated classes and methods. Are there any other sources to learn about the api other than the javadocs, which for beginners trying to write hadoop programs, is not the

Jobs stop at 0%

2009-12-24 Thread Raymond Jennings III
I have been recently seeing a problem where jobs stop at map 0% that previously worked fine (with no code changes.) Restarting hadoop on the cluster solves this problem but there is nothing in the log files to indicate what the problem is. Has anyone seen something similar?

Errors seen on the jobtracker node

2009-12-18 Thread Raymond Jennings III
Does anyone have any idea what might be causing the following three errors that I am seeing. I am not able to determine what job or what was happening at the times listed but I am hoping that if I have a little more information I can track down what is happening:

Combiner phase question

2009-12-04 Thread Raymond Jennings III
Does the combiner run once per data node or one per map task? (That it can run multiple times on the same data node after each map task.) Thanks.

Good idea to run NameNode and JobTracker on same machine?

2009-11-26 Thread Raymond Jennings III
Do people normally combine these two processes onto one machine? Currently I have them on separate machines, but I am wondering whether they use that much CPU processing time; maybe I should combine them and create another DataNode.

Has anyone gotten the Hadoop eclipse plugin to work on Windows?

2009-11-21 Thread Raymond Jennings III
I have been pulling my hair out on this one. I tried building it within eclipse - no errors, but when I put the jar file in and restart eclipse I can see the Map/Reduce perspective but once I try to do anything it bombs with random cryptic errors. I looked at Stephen's notes on jira but still

build / install hadoop plugin question

2009-11-20 Thread Raymond Jennings III
The plugin that is included in the hadoop distribution under src/contrib/eclipse-plugin - how does that get installed as it does not appear to be in a standard plugin format. Do I have to build it first and if so can you tell me how. Thanks. Ray

Re: build / install hadoop plugin question

2009-11-20 Thread Raymond Jennings III
it in eclipse installation plugins folder and restart eclipse -dp On Nov 20, 2009, at 2:08 PM, Raymond Jennings III raymondj...@yahoo.com wrote: The plugin that is included in the hadoop distribution under src/contrib/eclipse-plugin - how does that get installed as it does

Re: build / install hadoop plugin question

2009-11-20 Thread Raymond Jennings III
To: common-user@hadoop.apache.org common-user@hadoop.apache.org Date: Friday, November 20, 2009, 9:53 PM Yes if it's not built you can do ant eclipse. It will generate the plugin jar and you can paste it in the plugin directory. -dp On Nov 20, 2009, at 6:49 PM, Raymond Jennings III raymondj

Can I change the block size and then restart?

2009-11-19 Thread Raymond Jennings III
Can I just change the block size in the config and restart, or do I have to reformat? It's okay if what is currently in the file system stays at the old block size, if that's possible?
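Assuming 0.20-era configuration names: yes, a restart is enough. The block size only affects files written after the change; existing files keep the block size they were written with, and no reformat is needed. A sketch:

```xml
<!-- hdfs-site.xml: new default block size, in bytes (here 128 MB) -->
<property>
  <name>dfs.block.size</name>
  <value>134217728</value>
</property>
```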

Re: About Hadoop pseudo distribution

2009-11-12 Thread Raymond Jennings III
If I understand you correctly you can run jps and see the java jvm's running on each machine - that should tell you if you are running in pseudo mode or not. --- On Thu, 11/12/09, kvorion kveinst...@gmail.com wrote: From: kvorion kveinst...@gmail.com Subject: About Hadoop pseudo distribution

User permissions on dfs ?

2009-11-11 Thread Raymond Jennings III
Is there a way that I can set up directories in dfs for individual users and set the permissions such that only that user can read and write, so that if I do a hadoop dfs -ls I would get /user/user1 /user/user2 etc., each directory readable and writable only by the respective user? I
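Assuming permissions are enabled (dfs.permissions defaults to true in 0.20), per-user home directories are usually created by the HDFS superuser. A sketch, run on a live cluster (user names and mode are illustrative):

```shell
# run as the HDFS superuser; repeat per user
hadoop fs -mkdir /user/user1
hadoop fs -chown user1 /user/user1
hadoop fs -chmod 700 /user/user1
```

Mode 700 makes each directory readable and writable only by its owner.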

Re: User permissions on dfs ?

2009-11-11 Thread Raymond Jennings III
? To: common-user@hadoop.apache.org Date: Wednesday, November 11, 2009, 1:59 PM On 11/11/09 8:50 AM, Raymond Jennings III raymondj...@yahoo.com wrote: Is there a way that I can setup directories in dfs for individual users and set the permissions such that only that user can read

Error with replication and namespaceID

2009-11-10 Thread Raymond Jennings III
you have at least one datanode running. Look at the data node log file. (logs/*-datanode-*.log) Boris. On 11/9/09 7:15 AM, Raymond Jennings III raymondj...@yahoo.com wrote: I am trying to resolve an IOException error.  I have a basic setup and shortly after running start-dfs.sh I get

Re: Error with replication and namespaceID

2009-11-10 Thread Raymond Jennings III
need to make sure these are cleaned up before reformatting. You can do it just by deleting the data node directory, although there's probably a more official way to do it. On 11/10/09 11:01 AM, Raymond Jennings III wrote: On the actual datanodes I see the following exception:  I am

newbie question - error with replication

2009-11-09 Thread Raymond Jennings III
I am trying to resolve an IOException error. I have a basic setup and shortly after running start-dfs.sh I get a: error: java.io.IOException: File /tmp/hadoop-root/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1 java.io.IOException: File