It seems like you are hitting the problem discussed in HADOOP-1374.
We still don't know what causes it. It would be very helpful if you could share your experience on this issue. Do you use the hadoop scripts to run your tests? Which platform are you running them on?

In my experience it sometimes helps to change the order in which the slaves start. If that does not help, try killing the task-tracker that is stuck and restarting it later. The reduce task will be rescheduled to the other task-tracker and will succeed.

Thanks,
Konstantin

Samuel LEMOINE wrote:

Hi everyone !

I'm still trying to understand how hadoop works and what it offers for parallelizing java applications (especially lucene-based ones). For the moment, I've focused my efforts on the examples given (Grep and WordCount). I've managed to make both of them work with hadoop running with a "file:///" namenode. When I try them in dfs mode (namenode configured to my local ip), Grep still works, but WordCount no longer does. I can also make Grep work with 1 computer as namenode and another as a single datanode (WordCount still doesn't work in that setup either).

But as soon as I try to build some kind of mini "real" cluster, with a namenode and n datanodes, with n=2, the map/reduce blocks. It doesn't produce any error, and the log files don't show anything strange; the execution just freezes shortly after the beginning of the reduce task. I've noticed that the reduce task begins before the end of the map task when 2 slaves are available, which is not the case with only 1 slave. I know that it's the expected behaviour, but I suspect it to be the cause of the freeze. The execution doesn't go exactly the same way each time: sometimes it blocks at 91% map/31% reduce, other times at 89% map/15% reduce, but it's always at the beginning of the reduce task. I've tried letting it run for a whole night in case it was just very slow, but it didn't go any further. The 2 slaves are nearly identical, and each of them works when it is the only slave.

I've been stuck at this point for about 5 days, with no leads left to explore. Any help or clue will be greatly appreciated. If someone knows a good way to monitor the remote java tasks launched on the remote jobtrackers/tasktrackers, that's also welcome :)

Oh, one small detail that shouldn't change much: the 2 slaves are virtual machines, with 256MB of RAM each. All access to them is through ssh (which is quite tedious when you change the config files many times to try the different possibilities :D )


Samuel


PS: I promise to share my hadoop experience as soon as it takes a consistent shape. I plan to write a tutorial for the company where I'm doing my internship, and then translate it into English for the wiki.
PS2: here are the console messages for a few of my attempts:

console messages while running Grep on a 1master-2slaves dfs architecture:

/opt/java/bin/java -Didea.launcher.port=7540 -Didea.launcher.bin.path=/opt/idea-6180/bin -Dfile.encoding=UTF-8 -classpath /opt/jdk1.5.0_12/jre/lib/charsets.jar:/opt/jdk1.5.0_12/jre/lib/jce.jar:/opt/jdk1.5.0_12/jre/lib/jsse.jar:/opt/jdk1.5.0_12/jre/lib/plugin.jar:/opt/jdk1.5.0_12/jre/lib/deploy.jar:/opt/jdk1.5.0_12/jre/lib/javaws.jar:/opt/jdk1.5.0_12/jre/lib/rt.jar:/opt/jdk1.5.0_12/jre/lib/ext/localedata.jar:/opt/jdk1.5.0_12/jre/lib/ext/dnsns.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunpkcs11.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunjce_provider.jar:/home/samuel/IdeaProjects/hadoopTest/classes/test/hadoopTest:/home/samuel/IdeaProjects/hadoopTest/classes/production/hadoopTest:/home/samuel/commons-logging-1.1/commons-logging-1.1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/log4j/log4j-1.2.14.jar:/home/samuel/IdeaProjects/hadoopTest/lib/commons-cli-1.1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/commons-codec-1.3.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/commons-logging-1.0.4.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/httpclient-4.0-alpha1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/httpcore-4.0-alpha5.jar:/home/samuel/IdeaProjects/hadoopTest/lib/commons-httpclient-3.0.1/commons-httpclient-3.0.1.jar:/home/samuel/IdeaProjects/hadoopTest/hadoop/conf:/home/samuel/hadoop-0.13.1/hadoop-0.13.1-core.jar:/opt/idea-6180/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain com.lingway.hadoopScratchPad.Grep /user/hadoop/documents /user/hadoop/results blabla
07/08/01 11:14:01 INFO mapred.FileInputFormat: Total input paths to process : 19
07/08/01 11:14:03 INFO mapred.JobClient: Running job: job_0001
07/08/01 11:14:04 INFO mapred.JobClient:  map 0% reduce 0%
07/08/01 11:14:14 INFO mapred.JobClient:  map 10% reduce 0%
07/08/01 11:14:15 INFO mapred.JobClient:  map 21% reduce 0%
07/08/01 11:14:16 INFO mapred.JobClient:  map 26% reduce 0%
07/08/01 11:14:17 INFO mapred.JobClient:  map 31% reduce 0%
07/08/01 11:14:18 INFO mapred.JobClient:  map 42% reduce 0%
07/08/01 11:14:19 INFO mapred.JobClient:  map 47% reduce 0%
07/08/01 11:14:21 INFO mapred.JobClient:  map 57% reduce 0%
07/08/01 11:14:22 INFO mapred.JobClient:  map 63% reduce 0%
07/08/01 11:14:23 INFO mapred.JobClient:  map 73% reduce 0%
07/08/01 11:14:24 INFO mapred.JobClient:  map 78% reduce 0%
07/08/01 11:14:25 INFO mapred.JobClient:  map 84% reduce 0%
07/08/01 11:14:26 INFO mapred.JobClient:  map 89% reduce 0%
07/08/01 11:14:34 INFO mapred.JobClient:  map 89% reduce 15%

//and then doesn't go any further
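
A freeze like the one above can be spotted without watching the console by parsing the JobClient progress lines and comparing the last timestamp with the current time. The following is only a minimal sketch under assumptions: the class name JobProgressWatcher, its methods, and the 10-minute limit are hypothetical and not part of Hadoop, the line format is taken from the output shown above, and it uses java.time, so it needs a newer JDK than the 1.5 used here.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: reads JobClient console lines
// and reports whether the job looks stalled.
public class JobProgressWatcher {
    // Matches lines such as
    // "07/08/01 11:14:34 INFO mapred.JobClient:  map 89% reduce 15%"
    private static final Pattern PROGRESS = Pattern.compile(
        "^(\\d{2}/\\d{2}/\\d{2} \\d{2}:\\d{2}:\\d{2}) INFO mapred\\.JobClient:"
        + "\\s+map (\\d+)% reduce (\\d+)%\\s*$");
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yy/MM/dd HH:mm:ss");

    public static final class Progress {
        public final LocalDateTime time;
        public final int map;
        public final int reduce;

        Progress(LocalDateTime time, int map, int reduce) {
            this.time = time;
            this.map = map;
            this.reduce = reduce;
        }
    }

    // Returns the parsed progress, or null if the line is not a progress line.
    public static Progress parse(String line) {
        Matcher m = PROGRESS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new Progress(LocalDateTime.parse(m.group(1), STAMP),
                Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3)));
    }

    // Returns the last progress line found in the log, or null if there is none.
    public static Progress last(List<String> lines) {
        Progress result = null;
        for (String line : lines) {
            Progress p = parse(line);
            if (p != null) {
                result = p;
            }
        }
        return result;
    }

    // A job looks stalled when it is unfinished and its last progress
    // line is older than the given limit.
    public static boolean looksStalled(Progress p, LocalDateTime now, Duration limit) {
        boolean unfinished = p.map < 100 || p.reduce < 100;
        return unfinished && Duration.between(p.time, now).compareTo(limit) > 0;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "07/08/01 11:14:03 INFO mapred.JobClient: Running job: job_0001",
            "07/08/01 11:14:26 INFO mapred.JobClient:  map 89% reduce 0%",
            "07/08/01 11:14:34 INFO mapred.JobClient:  map 89% reduce 15%");
        Progress p = last(log);
        // Fixed "current time" for the example; a real watcher would use LocalDateTime.now().
        LocalDateTime now = LocalDateTime.of(2007, 8, 1, 11, 30, 0);
        System.out.println("map=" + p.map + " reduce=" + p.reduce
                + " stalled=" + looksStalled(p, now, Duration.ofMinutes(10)));
    }
}
```

This only reports that progress has stopped; it says nothing about which task-tracker is stuck, so the task-tracker logs on each slave still have to be checked.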




********************************************************************************************

console messages while running WordCount on a single-node dfs architecture:

/opt/java/bin/java -Didea.launcher.port=7537 -Didea.launcher.bin.path=/opt/idea-6180/bin -Dfile.encoding=UTF-8 -classpath /opt/jdk1.5.0_12/jre/lib/charsets.jar:/opt/jdk1.5.0_12/jre/lib/jce.jar:/opt/jdk1.5.0_12/jre/lib/jsse.jar:/opt/jdk1.5.0_12/jre/lib/plugin.jar:/opt/jdk1.5.0_12/jre/lib/deploy.jar:/opt/jdk1.5.0_12/jre/lib/javaws.jar:/opt/jdk1.5.0_12/jre/lib/rt.jar:/opt/jdk1.5.0_12/jre/lib/ext/localedata.jar:/opt/jdk1.5.0_12/jre/lib/ext/dnsns.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunpkcs11.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunjce_provider.jar:/home/samuel/IdeaProjects/hadoopTest/classes/test/hadoopTest:/home/samuel/IdeaProjects/hadoopTest/classes/production/hadoopTest:/home/samuel/commons-logging-1.1/commons-logging-1.1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/log4j/log4j-1.2.14.jar:/home/samuel/IdeaProjects/hadoopTest/lib/commons-cli-1.1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/commons-codec-1.3.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/commons-logging-1.0.4.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/httpclient-4.0-alpha1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/httpcore-4.0-alpha5.jar:/home/samuel/IdeaProjects/hadoopTest/lib/commons-httpclient-3.0.1/commons-httpclient-3.0.1.jar:/home/samuel/IdeaProjects/hadoopTest/hadoop/conf:/home/samuel/hadoop-0.13.1/hadoop-0.13.1-core.jar:/opt/idea-6180/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain com.lingway.hadoopScratchPad.WordCount /user/hadoop/documents /user/hadoop/resultats
07/08/01 10:47:41 INFO mapred.FileInputFormat: Total input paths to process : 19
07/08/01 10:47:42 INFO mapred.JobClient: Running job: job_0004
07/08/01 10:47:43 INFO mapred.JobClient:  map 0% reduce 0%
07/08/01 10:47:48 INFO mapred.JobClient: Task Id : task_0004_m_000000_0, Status : FAILED
07/08/01 10:47:52 INFO mapred.JobClient: Task Id : task_0004_m_000001_0, Status : FAILED
07/08/01 10:47:57 INFO mapred.JobClient: Task Id : task_0004_m_000002_0, Status : FAILED
07/08/01 10:48:01 INFO mapred.JobClient: Task Id : task_0004_m_000003_0, Status : FAILED
07/08/01 10:48:05 INFO mapred.JobClient: Task Id : task_0004_m_000004_0, Status : FAILED
07/08/01 10:48:10 INFO mapred.JobClient: Task Id : task_0004_m_000005_0, Status : FAILED
07/08/01 10:48:14 INFO mapred.JobClient: Task Id : task_0004_m_000006_0, Status : FAILED
07/08/01 10:48:18 INFO mapred.JobClient: Task Id : task_0004_m_000007_0, Status : FAILED
07/08/01 10:48:24 INFO mapred.JobClient: Task Id : task_0004_m_000008_0, Status : FAILED
07/08/01 10:48:28 INFO mapred.JobClient: Task Id : task_0004_m_000009_0, Status : FAILED
07/08/01 10:48:32 INFO mapred.JobClient: Task Id : task_0004_m_000010_0, Status : FAILED
07/08/01 10:48:36 INFO mapred.JobClient: Task Id : task_0004_m_000011_0, Status : FAILED
07/08/01 10:48:41 INFO mapred.JobClient: Task Id : task_0004_m_000012_0, Status : FAILED
07/08/01 10:48:45 INFO mapred.JobClient: Task Id : task_0004_m_000013_0, Status : FAILED
07/08/01 10:48:49 INFO mapred.JobClient: Task Id : task_0004_m_000014_0, Status : FAILED
07/08/01 10:48:54 INFO mapred.JobClient: Task Id : task_0004_m_000015_0, Status : FAILED
07/08/01 10:48:58 INFO mapred.JobClient: Task Id : task_0004_m_000016_0, Status : FAILED
07/08/01 10:49:02 INFO mapred.JobClient: Task Id : task_0004_m_000017_0, Status : FAILED
07/08/01 10:49:06 INFO mapred.JobClient: Task Id : task_0004_m_000018_0, Status : FAILED
07/08/01 10:49:11 INFO mapred.JobClient: Task Id : task_0004_m_000000_1, Status : FAILED
07/08/01 10:49:15 INFO mapred.JobClient: Task Id : task_0004_m_000000_2, Status : FAILED
07/08/01 10:49:19 INFO mapred.JobClient:  map 100% reduce 100%
Exception in thread "main" java.io.IOException: Job failed!
   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
   at com.lingway.hadoopScratchPad.WordCount.main(WordCount.java:145)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:585)
   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)

Process finished with exit code 1








********************************************************************************************


console messages while running WordCount on a 1master-1slave dfs architecture:

/opt/java/bin/java -Didea.launcher.port=7539 -Didea.launcher.bin.path=/opt/idea-6180/bin -Dfile.encoding=UTF-8 -classpath /opt/jdk1.5.0_12/jre/lib/charsets.jar:/opt/jdk1.5.0_12/jre/lib/jce.jar:/opt/jdk1.5.0_12/jre/lib/jsse.jar:/opt/jdk1.5.0_12/jre/lib/plugin.jar:/opt/jdk1.5.0_12/jre/lib/deploy.jar:/opt/jdk1.5.0_12/jre/lib/javaws.jar:/opt/jdk1.5.0_12/jre/lib/rt.jar:/opt/jdk1.5.0_12/jre/lib/ext/localedata.jar:/opt/jdk1.5.0_12/jre/lib/ext/dnsns.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunpkcs11.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunjce_provider.jar:/home/samuel/IdeaProjects/hadoopTest/classes/test/hadoopTest:/home/samuel/IdeaProjects/hadoopTest/classes/production/hadoopTest:/home/samuel/commons-logging-1.1/commons-logging-1.1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/log4j/log4j-1.2.14.jar:/home/samuel/IdeaProjects/hadoopTest/lib/commons-cli-1.1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/commons-codec-1.3.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/commons-logging-1.0.4.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/httpclient-4.0-alpha1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/httpcore-4.0-alpha5.jar:/home/samuel/IdeaProjects/hadoopTest/lib/commons-httpclient-3.0.1/commons-httpclient-3.0.1.jar:/home/samuel/IdeaProjects/hadoopTest/hadoop/conf:/home/samuel/hadoop-0.13.1/hadoop-0.13.1-core.jar:/opt/idea-6180/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain com.lingway.hadoopScratchPad.WordCount /user/hadoop/documents /user/hadoop/resultats
07/08/01 11:08:24 INFO mapred.FileInputFormat: Total input paths to process : 19
07/08/01 11:08:26 INFO mapred.JobClient: Running job: job_0001
07/08/01 11:08:27 INFO mapred.JobClient:  map 0% reduce 0%
07/08/01 11:08:37 INFO mapred.JobClient: Task Id : task_0001_m_000002_0, Status : FAILED
07/08/01 11:08:37 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:37 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:37 INFO mapred.JobClient: Task Id : task_0001_m_000000_0, Status : FAILED
07/08/01 11:08:37 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:37 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:41 INFO mapred.JobClient: Task Id : task_0001_m_000005_0, Status : FAILED
07/08/01 11:08:41 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:41 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:42 INFO mapred.JobClient: Task Id : task_0001_m_000004_0, Status : FAILED
07/08/01 11:08:42 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:42 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:46 INFO mapred.JobClient: Task Id : task_0001_m_000006_0, Status : FAILED
07/08/01 11:08:46 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:46 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:47 INFO mapred.JobClient: Task Id : task_0001_m_000001_0, Status : FAILED
07/08/01 11:08:47 INFO mapred.JobClient: Task Id : task_0001_m_000007_0, Status : FAILED
07/08/01 11:08:47 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:48 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:48 INFO mapred.JobClient: Task Id : task_0001_m_000003_0, Status : FAILED
07/08/01 11:08:52 INFO mapred.JobClient: Task Id : task_0001_m_000008_0, Status : FAILED
07/08/01 11:08:52 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:52 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:52 INFO mapred.JobClient: Task Id : task_0001_m_000009_0, Status : FAILED
07/08/01 11:08:52 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:52 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:57 INFO mapred.JobClient: Task Id : task_0001_m_000010_0, Status : FAILED
07/08/01 11:08:57 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:57 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:57 INFO mapred.JobClient: Task Id : task_0001_r_000000_0, Status : FAILED
07/08/01 11:08:57 INFO mapred.JobClient: Task Id : task_0001_m_000011_0, Status : FAILED
07/08/01 11:08:57 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:08:57 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:02 INFO mapred.JobClient: Task Id : task_0001_m_000012_0, Status : FAILED
07/08/01 11:09:02 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:02 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:02 INFO mapred.JobClient: Task Id : task_0001_m_000013_0, Status : FAILED
07/08/01 11:09:02 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:02 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:02 INFO mapred.JobClient: Task Id : task_0001_m_000002_1, Status : FAILED
07/08/01 11:09:02 INFO mapred.JobClient: Task Id : task_0001_m_000000_1, Status : FAILED
07/08/01 11:09:07 INFO mapred.JobClient: Task Id : task_0001_m_000014_0, Status : FAILED
07/08/01 11:09:07 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:07 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:07 INFO mapred.JobClient: Task Id : task_0001_m_000015_0, Status : FAILED
07/08/01 11:09:07 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:07 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:11 INFO mapred.JobClient: Task Id : task_0001_m_000016_0, Status : FAILED
07/08/01 11:09:11 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:11 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:12 INFO mapred.JobClient: Task Id : task_0001_m_000017_0, Status : FAILED
07/08/01 11:09:12 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:12 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:16 INFO mapred.JobClient: Task Id : task_0001_m_000018_0, Status : FAILED
07/08/01 11:09:16 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:16 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:16 INFO mapred.JobClient: Task Id : task_0001_m_000001_1, Status : FAILED
07/08/01 11:09:16 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:16 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:17 INFO mapred.JobClient: Task Id : task_0001_m_000004_1, Status : FAILED
07/08/01 11:09:18 INFO mapred.JobClient: Task Id : task_0001_m_000005_1, Status : FAILED
07/08/01 11:09:21 INFO mapred.JobClient: Task Id : task_0001_m_000000_2, Status : FAILED
07/08/01 11:09:21 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:21 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:21 INFO mapred.JobClient: Task Id : task_0001_m_000003_1, Status : FAILED
07/08/01 11:09:21 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:21 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:27 INFO mapred.JobClient:  map 100% reduce 100%
07/08/01 11:09:27 INFO mapred.JobClient: Task Id : task_0001_m_000001_2, Status : FAILED
07/08/01 11:09:27 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
07/08/01 11:09:27 WARN mapred.JobClient: Error reading task outputubuntu704.e-manation.com
Exception in thread "main" java.io.IOException: Job failed!
   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
   at com.lingway.hadoopScratchPad.WordCount.main(WordCount.java:145)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:585)
   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)

Process finished with exit code 1




