In my cluster, node04 is the NameNode and the other nodes are DataNodes.
- If successful, the output looks like this:
d...@node06:~/v-0.18.0$ hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write
-fileSize 10000 -nrFiles 1
TestFDSIO.0.0.4
09/01/25 21:37:49 INFO mapred.FileInputFormat: nrFiles = 1
09/01/25 21:37:49 INFO mapred.FileInputFormat: fileSize (MB) = 10000
09/01/25 21:37:49 INFO mapred.FileInputFormat: bufferSize = 1000000
09/01/25 21:37:50 INFO mapred.FileInputFormat: creating control file: 10000
mega bytes, 1 files
09/01/25 21:37:50 INFO mapred.FileInputFormat: created control files for: 1
files
09/01/25 21:37:50 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
09/01/25 21:37:50 INFO mapred.FileInputFormat: Total input paths to process
: 1
09/01/25 21:37:50 INFO mapred.FileInputFormat: Total input paths to process
: 1
09/01/25 21:37:51 INFO mapred.JobClient: Running job: job_200901252026_0003
09/01/25 21:37:52 INFO mapred.JobClient: map 0% reduce 0%
09/01/25 21:43:04 INFO mapred.JobClient: map 100% reduce 0%
09/01/25 21:43:16 INFO mapred.JobClient: Job complete: job_200901252026_0003
09/01/25 21:43:16 INFO mapred.JobClient: Counters: 16
09/01/25 21:43:16 INFO mapred.JobClient: File Systems
09/01/25 21:43:16 INFO mapred.JobClient: HDFS bytes read=113
09/01/25 21:43:16 INFO mapred.JobClient: HDFS bytes written=10485760079
09/01/25 21:43:16 INFO mapred.JobClient: Local bytes read=113
09/01/25 21:43:16 INFO mapred.JobClient: Local bytes written=262
09/01/25 21:43:16 INFO mapred.JobClient: Job Counters
09/01/25 21:43:16 INFO mapred.JobClient: Launched reduce tasks=1
09/01/25 21:43:16 INFO mapred.JobClient: Rack-local map tasks=1
09/01/25 21:43:16 INFO mapred.JobClient: Launched map tasks=1
09/01/25 21:43:16 INFO mapred.JobClient: Map-Reduce Framework
09/01/25 21:43:16 INFO mapred.JobClient: Reduce input groups=5
09/01/25 21:43:16 INFO mapred.JobClient: Combine output records=10
09/01/25 21:43:16 INFO mapred.JobClient: Map input records=1
09/01/25 21:43:16 INFO mapred.JobClient: Reduce output records=5
09/01/25 21:43:16 INFO mapred.JobClient: Map output bytes=89
09/01/25 21:43:16 INFO mapred.JobClient: Map input bytes=27
09/01/25 21:43:16 INFO mapred.JobClient: Combine input records=10
09/01/25 21:43:16 INFO mapred.JobClient: Map output records=5
09/01/25 21:43:16 INFO mapred.JobClient: Reduce input records=5
09/01/25 21:43:16 INFO mapred.FileInputFormat: ----- TestDFSIO ----- : write
09/01/25 21:43:16 INFO mapred.FileInputFormat: Date & time: Sun
Jan 25 21:43:16 CET 2009
09/01/25 21:43:16 INFO mapred.FileInputFormat: Number of files: 1
09/01/25 21:43:16 INFO mapred.FileInputFormat: Total MBytes processed: 10000
09/01/25 21:43:16 INFO mapred.FileInputFormat: Throughput mb/sec:
32.34801286156991
09/01/25 21:43:16 INFO mapred.FileInputFormat: Average IO rate mb/sec:
32.3480110168457
09/01/25 21:43:16 INFO mapred.FileInputFormat: IO rate std deviation:
0.004232947670390486
09/01/25 21:43:16 INFO mapred.FileInputFormat: Test exec time sec:
326.033
09/01/25 21:43:16 INFO mapred.FileInputFormat:
+ The chunks (standard size 64 MB) are spread across the nodes as follows:
node04 0
node05 4
node06 4
node07 8
node08 3
node09 137
-> So for 10,000 MB we get 156 full-size chunks, and those chunks are spread
across all data nodes! The script that produces this output is:
# For each node, count the full-size (64 MB) block files stored under the
# local DFS data directory.
for i in `seq 4 9`; do
  ssh node0$i "echo -n node0$i ' '"
  ssh node0$i 'ls -lah `find /tmp/hadoop-dinh -type f` | grep 64M | wc -l'
done
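The chunk count can be checked with a quick back-of-the-envelope calculation: a 10,000 MB file at a 64 MB block size yields 156 full-size blocks plus one smaller final block, which the `grep 64M` above does not count:

```shell
# 10,000 MB at a 64 MB block size: 156 full blocks plus a 16 MB remainder.
full_blocks=$(( 10000 / 64 ))
remainder_mb=$(( 10000 % 64 ))
echo "full 64 MB blocks: $full_blocks"      # 156
echo "final block size:  $remainder_mb MB"  # 16
```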
- If unsuccessful, it looks like this:
d...@node06:~/v-0.18.0$ hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write
-fileSize 10000 -nrFiles 1
TestFDSIO.0.0.4
09/01/25 22:28:21 INFO mapred.FileInputFormat: nrFiles = 1
09/01/25 22:28:21 INFO mapred.FileInputFormat: fileSize (MB) = 10000
09/01/25 22:28:21 INFO mapred.FileInputFormat: bufferSize = 1000000
09/01/25 22:28:22 INFO mapred.FileInputFormat: creating control file: 10000
mega bytes, 1 files
09/01/25 22:28:22 INFO mapred.FileInputFormat: created control files for: 1
files
09/01/25 22:28:22 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
09/01/25 22:28:22 INFO mapred.FileInputFormat: Total input paths to process
: 1
09/01/25 22:28:22 INFO mapred.FileInputFormat: Total input paths to process
: 1
09/01/25 22:28:23 INFO mapred.JobClient: Running job: job_200901252228_0001
09/01/25 22:28:24 INFO mapred.JobClient: map 0% reduce 0%
###################### TaskAttemptID ######################
java.io.IOException: All datanodes 10.0.0.9:50010 are bad. Aborting...
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
attempt_200901252228_0001_m_000000_0: ###################### TaskAttemptID
######################
...
attempt_200901252228_0001_m_000000_0: ###################### TaskAttemptID
######################
attempt_200901252228_0001_m_000000_0: Exception closing file
/benchmarks/TestDFSIO/io_data/test_io_0
attempt_200901252228_0001_m_000000_0: java.io.IOException: All datanodes
10.0.0.9:50010 are bad. Aborting...
attempt_200901252228_0001_m_000000_0: at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
attempt_200901252228_0001_m_000000_0: at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
attempt_200901252228_0001_m_000000_0: at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
09/01/25 22:39:17 INFO mapred.JobClient: Task Id :
attempt_200901252228_0001_m_000000_1, Status : FAILED
###################### TaskAttemptID ######################
java.io.IOException: All datanodes 10.0.0.6:50010 are bad. Aborting...
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
attempt_200901252228_0001_m_000000_1: ###################### TaskAttemptID
######################
...
attempt_200901252228_0001_m_000000_1: ###################### TaskAttemptID
######################
attempt_200901252228_0001_m_000000_1: Exception closing file
/benchmarks/TestDFSIO/io_data/test_io_0
attempt_200901252228_0001_m_000000_1: java.io.IOException: All datanodes
10.0.0.6:50010 are bad. Aborting...
attempt_200901252228_0001_m_000000_1: at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
attempt_200901252228_0001_m_000000_1: at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
attempt_200901252228_0001_m_000000_1: at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
09/01/25 22:46:13 INFO mapred.JobClient: Task Id :
attempt_200901252228_0001_m_000000_2, Status : FAILED
###################### TaskAttemptID ######################
java.io.IOException: All datanodes 10.0.0.8:50010 are bad. Aborting...
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
attempt_200901252228_0001_m_000000_2: ###################### TaskAttemptID
######################
...
attempt_200901252228_0001_m_000000_2: ###################### TaskAttemptID
######################
attempt_200901252228_0001_m_000000_2: Exception closing file
/benchmarks/TestDFSIO/io_data/test_io_0
attempt_200901252228_0001_m_000000_2: java.io.IOException: All datanodes
10.0.0.8:50010 are bad. Aborting...
attempt_200901252228_0001_m_000000_2: at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
attempt_200901252228_0001_m_000000_2: at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
attempt_200901252228_0001_m_000000_2: at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1118)
at org.apache.hadoop.fs.TestDFSIO.runIOTest(TestDFSIO.java:247)
at org.apache.hadoop.fs.TestDFSIO.writeTest(TestDFSIO.java:219)
at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:450)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
-> First Hadoop tried to write the file via node09, then node06, then node08,
and the job failed!
The chunk counts on the nodes after the failure are:
node04 0
node05 97
node06 0
node07 0
node08 0
node09 0
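These counts also give a rough estimate of how far the write got before the pipeline aborted (assuming, as above, that only full 64 MB blocks are counted):

```shell
# node05 holds 97 full 64 MB chunks, so roughly 97 * 64 = 6208 MB of the
# 10,000 MB file reached disk before the datanode pipeline failed.
echo $(( 97 * 64 ))   # 6208
```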
What I don't understand is that writing a huge file sometimes works and
sometimes doesn't.
Thanks for the link; I had already read it, as well as the source code, but I
still don't get it.
--
View this message in context:
http://www.nabble.com/Job-failed-when-writing-a-huge-file-tp21647888p21658995.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.