Pass the size of the MapReduce input to JobInProgress
-----------------------------------------------------
Key: HADOOP-3441
URL: https://issues.apache.org/jira/browse/HADOOP-3441
Project: Hadoop Core
Issue Type: Improvement
Components: mapred
Affects Versions: 0.17.0
Environment: all
Reporter: Ari Rabkin
Assignee: Ari Rabkin
Priority: Minor
Fix For: 0.18.0
Attachments: addDataSize.patch
Currently, there's no easy way for the JobInProgress to know how large the
job's input data is.
This patch corrects the problem by recording the size of each input split's
data in its RawSplit. The split sizes are then totaled and made available via
JobInProgress.getInputSize().
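The totaling step can be illustrated with a minimal, self-contained sketch. This
is not the code from addDataSize.patch: SplitSketch and JobSketch are
hypothetical stand-ins for RawSplit and JobInProgress, and the field and
constructor shapes are assumptions made only for illustration. Only the
getInputSize() name comes from the description above.

```java
import java.util.List;

// Hypothetical stand-in for RawSplit: carries the size of the data in one split.
final class SplitSketch {
    private final String location;  // assumed: a host holding the split's data
    private final long dataLength;  // bytes of input covered by this split

    SplitSketch(String location, long dataLength) {
        if (dataLength < 0) {
            throw new IllegalArgumentException("dataLength must be non-negative");
        }
        this.location = location;
        this.dataLength = dataLength;
    }

    String getLocation() {
        return location;
    }

    long getDataLength() {
        return dataLength;
    }
}

// Hypothetical stand-in for JobInProgress: totals the split sizes once, up front.
final class JobSketch {
    private final long inputSize;

    JobSketch(List<SplitSketch> splits) {
        long total = 0L;
        for (SplitSketch split : splits) {
            total += split.getDataLength();
        }
        this.inputSize = total;
    }

    // Total bytes of input across all splits of the job.
    long getInputSize() {
        return inputSize;
    }
}
```

With three splits of 100, 250, and 50 bytes, getInputSize() in this sketch
returns 400. Computing the total once at construction keeps later calls cheap,
which matters if a scheduler queries the value repeatedly.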
This is needed, among other reasons, so that the JobInProgress knows how much
data the job will run on, information that will help in building smarter
schedulers.