Noel C. F. Codella, Ph.D. created MAPREDUCE-5313:
----------------------------------------------------
Summary: JobTracker Creates Empty Mapper Task, and a Mapper Task
with 2 FileSplits.
Key: MAPREDUCE-5313
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5313
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobtracker
Affects Versions: 1.2.0
Environment: Linux
Reporter: Noel C. F. Codella, Ph.D.
When reading an input file, the JobTracker appears to assign the first two
FileSplits to a single Mapper Task, and then assigns an empty FileSplit (at end
of file) to another Mapper Task, which finishes almost immediately. This skews
the job's load balance, since one map task is now twice as large as the others.
In "src/mapred/org/apache/hadoop/mapred/LineRecordReader.java", line 110, there
is a comment about skipping the first line of the input file by default, since
"next()" reads two lines anyway. This was not the behavior in 0.20.2, which did
not have this problem.
This skip logic appears to be implemented incorrectly, which would explain the
behavior described above.
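
To make the suspected mechanism concrete, here is a minimal sketch of a line
reader that follows the convention the comment seems to describe: a reader whose
split does not start at offset 0 discards its first line, and every reader keeps
reading as long as a line starts at or before the end of its split. This is not
the Hadoop source. The class SplitBoundarySketch and the methods readSplit and
nextLineStart are hypothetical names used only for illustration, and the
assumption that split boundaries fall exactly on line starts is mine.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual LineRecordReader code.
class SplitBoundarySketch {

    // Returns the lines a reader would emit for the byte range [start, start + length).
    static List<String> readSplit(byte[] data, long start, long length) {
        long end = start + length;
        int pos = (int) start;
        if (start != 0) {
            // Assumed convention: a non-first split throws away its first line,
            // trusting the previous split's reader to have consumed it.
            pos = nextLineStart(data, pos);
        }
        List<String> lines = new ArrayList<>();
        // Assumed convention: keep reading while a line STARTS at or before 'end',
        // so a reader may consume one line beyond its own split.
        while (pos <= end && pos < data.length) {
            int next = nextLineStart(data, pos);
            boolean hasNewline = next > pos && data[next - 1] == '\n';
            int lineEnd = hasNewline ? next - 1 : next;
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return lines;
    }

    // Offset of the byte just after the next '\n' at or after 'from' (or end of data).
    static int nextLineStart(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    public static void main(String[] args) {
        // Three 5-byte lines, cut into three 5-byte splits aligned on line starts.
        byte[] data = "aaaa\nbbbb\ncccc\n".getBytes(StandardCharsets.UTF_8);
        for (long start = 0; start < data.length; start += 5) {
            System.out.println("split at " + start + ": " + readSplit(data, start, 5));
        }
    }
}
```

With these inputs the first split yields two lines, the second yields one, and
the third yields none, even though every line is read exactly once overall. That
matches the symptom reported here when each split holds a single line, but it is
only a model of the convention, not a confirmed diagnosis of the 1.2.0 code.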