[ https://issues.apache.org/jira/browse/MAPREDUCE-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736427#action_12736427 ]
Hong Tang commented on MAPREDUCE-801:
-------------------------------------
bq. In PIG-878 Arun made a sensible suggestion, that the number of locations of
a split should not be greater than the replication level of the file. This
could be checked by FileInputFormat.
We probably want infrastructure to check this. The precise reason PIG-878
happens is that PigInputFormat is not derived from FileInputFormat (which
would have implemented the split locations correctly).
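A minimal sketch of what such a check might look like, under stated
assumptions: SplitLocationCheck, excessLocations, and warningFor are
hypothetical names, not Hadoop APIs, and the locations array and replication
value stand in for what InputSplit.getLocations() and
FileStatus.getReplication() would supply in a real job.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Hadoop.
final class SplitLocationCheck {
    private SplitLocationCheck() {}

    // Number of distinct hosts a split reports beyond the file's replication
    // level; 0 means the split is within bounds.
    static int excessLocations(String[] locations, int replication) {
        if (replication < 1) {
            throw new IllegalArgumentException("replication must be >= 1");
        }
        Set<String> distinct = new LinkedHashSet<>(Arrays.asList(locations));
        return Math.max(0, distinct.size() - replication);
    }

    // Builds a warning message when the split reports too many locations,
    // or returns an empty Optional when it does not.
    static Optional<String> warningFor(String splitName, String[] locations,
                                       int replication) {
        int excess = excessLocations(locations, replication);
        if (excess == 0) {
            return Optional.empty();
        }
        return Optional.of("Split " + splitName + " reports " + excess
                + " more location(s) than the replication level "
                + replication);
    }
}
```

Duplicate host names are collapsed before counting, on the assumption that
only distinct hosts matter for scheduling; a stricter check could treat
duplicates as a bug in their own right.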
bq. Another approach might be to add counters for rack-local and local task
placements and i/o. If the tasks are placed locally but the i/o is not done
locally, that's a bad sign.
+1.
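To illustrate the counter idea, here is a rough sketch assuming hypothetical
names throughout (LocalityCounters, Locality, looksMisreported) and an
arbitrary tolerance chosen by the caller; it is not how the framework's
counters are implemented. It tracks task placements and bytes read per
locality level, and flags the case where placement looks far more local than
the I/O actually was.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical counters for illustration only; not part of Hadoop.
final class LocalityCounters {
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_RACK }

    private final Map<Locality, Long> tasksPlaced = new EnumMap<>(Locality.class);
    private final Map<Locality, Long> bytesRead = new EnumMap<>(Locality.class);

    // Record that one task was scheduled at the given locality level.
    void recordPlacement(Locality locality) {
        tasksPlaced.merge(locality, 1L, Long::sum);
    }

    // Record bytes that a task actually read at the given locality level.
    void recordRead(Locality locality, long bytes) {
        bytesRead.merge(locality, bytes, Long::sum);
    }

    // Share of the total held by one locality level; 0.0 when nothing recorded.
    private static double fraction(Map<Locality, Long> counts, Locality locality) {
        long total = 0;
        for (long value : counts.values()) {
            total += value;
        }
        return total == 0 ? 0.0 : (double) counts.getOrDefault(locality, 0L) / total;
    }

    double nodeLocalPlacementFraction() {
        return fraction(tasksPlaced, Locality.NODE_LOCAL);
    }

    double nodeLocalReadFraction() {
        return fraction(bytesRead, Locality.NODE_LOCAL);
    }

    // True when tasks were placed "locally" much more often than data was
    // read locally, which suggests the reported split locations are wrong.
    boolean looksMisreported(double tolerance) {
        return nodeLocalPlacementFraction() - nodeLocalReadFraction() > tolerance;
    }
}
```

For example, a job whose maps are all reported node-local but which reads 90%
of its bytes off-rack would be flagged with any tolerance below 0.9.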
> MAPREDUCE framework should issue warning with too many locations for a split
> ----------------------------------------------------------------------------
>
> Key: MAPREDUCE-801
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-801
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Hong Tang
>
> A customized input format may be buggy and report misleading locations through
> its input splits, an example of which is PIG-878. When an input split returns
> too many locations, it not only artificially inflates the percentage of
> data-local or rack-local maps, but also forces the scheduler to use more
> memory and work harder to conduct task assignment.