[jira] Created: (HADOOP-4362) Hadoop Streaming failed with large number of input files

Peeyush Bishnoi (JIRA) Tue, 07 Oct 2008 02:22:08 -0700

Hadoop Streaming failed with large number of input files
--------------------------------------------------------


                 Key: HADOOP-4362
                 URL: https://issues.apache.org/jira/browse/HADOOP-4362
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/streaming
    Affects Versions: 0.18.1
            Reporter: Peeyush Bishnoi
            Priority: Critical
             Fix For: 0.18.2


A simple streaming job fails with "java.lang.ArrayIndexOutOfBoundsException" 
when the mapper is /bin/cat and the number of input files is large (16365 
input paths in the run below).

$ hadoop jar $HADOOP_HOME/hadoop-streaming.jar -input in_data -output op_data -mapper /bin/cat -reducer NONE
additionalConfSpec_:null
null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
packageJobJar: [/tmp/hadoop-unjar49637/] [] /tmp/streamjob49638.jar tmpDir=/tmp
08/10/07 07:03:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/07 07:03:11 INFO mapred.FileInputFormat: Total input paths to process : 16365
08/10/07 07:03:12 INFO mapred.FileInputFormat: Total input paths to process : 16365
08/10/07 07:03:15 ERROR streaming.StreamJob: Error Launching job : java.io.IOException: java.lang.ArrayIndexOutOfBoundsException

Streaming Job Failed!


When the number of input files is small (16 input paths in the run below), the 
same job does not fail.

$ hadoop jar $HADOOP_HOME/hadoop-streaming.jar -input inp_data1 -output op_data1 -mapper /bin/cat -reducer NONE
additionalConfSpec_:null
null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
packageJobJar: [/tmp/hadoop-unjar3725/] [] /tmp/streamjob3726.jar tmpDir=/tmp
08/10/07 07:06:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/07 07:06:39 INFO mapred.FileInputFormat: Total input paths to process : 16
08/10/07 07:06:39 INFO mapred.FileInputFormat: Total input paths to process : 16
08/10/07 07:06:42 INFO streaming.StreamJob: getLocalDirs(): [/var/mapred/local]
08/10/07 07:06:42 INFO streaming.StreamJob: Running job: job_200810070645_0006
08/10/07 07:06:42 INFO streaming.StreamJob: To kill this job, run:
08/10/07 07:06:42 INFO streaming.StreamJob: hadoop job -Dmapred.job.tracker=login1:51981 -kill job_200810070645_0006
08/10/07 07:06:42 INFO streaming.StreamJob: Tracking URL: http://login1:52941/jobdetails.jsp?jobid=job_200810070645_0006
08/10/07 07:06:43 INFO streaming.StreamJob:  map 0%  reduce 0%
08/10/07 07:06:46 INFO streaming.StreamJob:  map 44%  reduce 0%
08/10/07 07:06:47 INFO streaming.StreamJob:  map 75%  reduce 0%
08/10/07 07:06:48 INFO streaming.StreamJob:  map 88%  reduce 0%
08/10/07 07:06:49 INFO streaming.StreamJob:  map 100%  reduce 100%
08/10/07 07:06:49 INFO streaming.StreamJob: Job complete: job_200810070645_0006
08/10/07 07:06:49 INFO streaming.StreamJob: Output: op_data2
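
For anyone trying to reproduce the failing case, the sketch below shows one 
possible way to generate a large number of small input files locally before 
uploading them. It is only an illustration: the helper name make_inputs, the 
file naming scheme, and the local path in the comments are hypothetical and 
do not come from this report. The count of 16365 and the streaming command 
are taken from the failing run above. The data set that originally triggered 
the failure may have differed in file sizes and layout.

```shell
#!/bin/sh
# make_inputs DIR COUNT
# Hypothetical helper: creates COUNT one-line text files in DIR.
make_inputs() {
    dir=$1
    count=$2
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # Each file holds a single short record so generation stays cheap.
        printf 'record %d\n' "$i" > "$dir/part-$i.txt"
        i=$((i + 1))
    done
}

# Example usage (not executed here; /tmp/in_data_local is an assumed path):
#   make_inputs /tmp/in_data_local 16365
#   hadoop fs -put /tmp/in_data_local in_data
#   hadoop jar $HADOOP_HOME/hadoop-streaming.jar -input in_data \
#       -output op_data -mapper /bin/cat -reducer NONE
```

Running the same steps with a small count, such as 16, should correspond to 
the passing case shown above.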




