mapreduce jobs fail when no split is returned via inputFormat.getSplits
-----------------------------------------------------------------------

                 Key: HADOOP-424
                 URL: http://issues.apache.org/jira/browse/HADOOP-424
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.4.0
            Reporter: Frédéric Bertin


I'm using a MapReduce job to process data that is logged and timestamped into
files.
When the job runs, it does not process all of the data; it selects only the
data that has been logged since the previous run.
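A minimal sketch of the setup described above, plus a client-side workaround
for the failure reported below: the driver selects only the files modified
since the previous run and skips job submission when that selection is empty.
This is a hypothetical helper, not Hadoop API. `IncrementalSelector` and
`LogFile` are illustrative names, and timestamps are assumed to be plain
millisecond values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: picks only the log files written
// since the previous run, so the driver can skip submission when nothing is new.
public class IncrementalSelector {

    // Minimal stand-in for a timestamped log file (name + last-modified time in ms).
    public static final class LogFile {
        final String name;
        final long modifiedMillis;

        public LogFile(String name, long modifiedMillis) {
            this.name = name;
            this.modifiedMillis = modifiedMillis;
        }
    }

    // Returns the names of files modified strictly after lastRunMillis.
    public static List<String> selectSince(List<LogFile> files, long lastRunMillis) {
        List<String> selected = new ArrayList<>();
        for (LogFile f : files) {
            if (f.modifiedMillis > lastRunMillis) {
                selected.add(f.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<LogFile> files = new ArrayList<>();
        files.add(new LogFile("log.1", 1_000L));
        files.add(new LogFile("log.2", 2_000L));

        List<String> fresh = selectSince(files, 1_500L);
        // Workaround: only submit the job when there is new input to process.
        if (fresh.isEmpty()) {
            System.out.println("No new data since last run; skipping job.");
        } else {
            System.out.println("Submitting job for " + fresh);
        }
    }
}
```

Checking for an empty selection in the driver avoids ever submitting a job
with zero splits, at the cost of duplicating the filtering logic outside the
InputFormat.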

However, when no new data has been logged, the job fails because the getSplits
method of InputFormat returns no splits, so the job has zero map tasks. This
case is not caught, and the job fails at the reduce step, apparently because
the reduce task cannot find any data to process:

java.io.FileNotFoundException: /local/home/hadoop/var/mapred/local/task_0030_r_000000_3/all.2
    at org.apache.hadoop.fs.LocalFileSystem.openRaw(LocalFileSystem.java:121)
    at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:47)
    at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:221)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:150)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:259)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:253)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:241)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1013)

What should be Hadoop's behaviour in such a case?

In my opinion, the job should be considered successful: this is not a job
failure, just a lack of input data. What do you think?
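The behaviour proposed above can be sketched as a guard that runs before any
map or reduce tasks are scheduled. This is a minimal model, not Hadoop code:
`EmptyInputPolicy`, `Phases` and `JobStatus` are hypothetical names, and
`Object[]` stands in for the array of splits returned by getSplits.

```java
// Hypothetical model of the proposed behaviour; the names are illustrative and
// do not correspond to Hadoop's actual job submission or tracking code.
public class EmptyInputPolicy {

    public enum JobStatus { SUCCEEDED, FAILED }

    // Stand-in for launching the map and reduce phases. In the reported bug,
    // the reduce phase throws FileNotFoundException because no map output exists.
    public interface Phases {
        JobStatus runMapsAndReduces(int numMapTasks);
    }

    // With zero input splits there is no map output for the reduces to read,
    // so the job is reported as successful instead of scheduling any tasks.
    public static JobStatus run(Object[] splits, Phases phases) {
        if (splits.length == 0) {
            return JobStatus.SUCCEEDED;
        }
        return phases.runMapsAndReduces(splits.length);
    }
}
```

Short-circuiting on an empty split array treats "no input" as a normal
outcome rather than an error, which matches the reporter's view that this is
a lack of data and not a job failure.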



-- 
This message is automatically generated by JIRA.