[
https://issues.apache.org/jira/browse/MAPREDUCE-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736014#action_12736014
]
Vinod K V commented on MAPREDUCE-801:
-------------------------------------
Hong, I am trying to understand the effect of this issue. Sorry, I couldn't
quite follow PIG-878.
You mentioned two effects:
(1) artificial inflation of locality
(2) more memory usage by the scheduler (actually the JobTracker), and more work
in heartbeats.
(1) affects the user's job itself; e.g., the job might take longer to
complete. So I think the onus is on the user to provide a correct, proper
input-split.
(2) is a burden on the framework. Maybe the framework can handle this by
`memorizing' only a fraction of the locations returned, perhaps a percentage of
cluster capacity (instead of disregarding all of them, as you proposed
earlier).
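
To make the suggestion in (2) concrete, here is a minimal sketch of capping the
locations kept per split at a percentage of cluster size and logging a warning
when the cap is hit. Everything in it is an assumption for illustration: the
class SplitLocationLimiter, its methods, and the choice to keep the first N
locations are hypothetical and are not part of Hadoop or of any patch on this
issue.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not a Hadoop class. */
final class SplitLocationLimiter {
    private SplitLocationLimiter() {}

    /**
     * Computes the cap as a percentage of cluster size, never below 1 so
     * that a split always keeps at least one location.
     */
    static int maxLocations(int clusterSize, int percent) {
        if (clusterSize < 1 || percent < 1 || percent > 100) {
            throw new IllegalArgumentException("bad clusterSize or percent");
        }
        return Math.max(1, clusterSize * percent / 100);
    }

    /**
     * Returns at most max locations. Keeping the first ones is an arbitrary
     * choice made here; a real policy might prefer something else.
     */
    static String[] limit(String[] locations, int max) {
        if (max < 1) {
            throw new IllegalArgumentException("max must be >= 1");
        }
        if (locations.length <= max) {
            return locations.clone();
        }
        // Warn instead of failing, so a buggy input-format is still visible.
        System.err.println("WARN: split reported " + locations.length
            + " locations; keeping only the first " + max);
        return Arrays.copyOf(locations, max);
    }
}
```

With a 200-node cluster and a 5% cap, maxLocations(200, 5) gives 10, so a split
that reports every node as a location would be remembered on only 10 of them.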
Maybe if you shed some light on what impact this might have, we can arrive
at an appropriate solution.
> MAPREDUCE framework should issue warning with too many locations for a split
> ----------------------------------------------------------------------------
>
> Key: MAPREDUCE-801
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-801
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Hong Tang
>
> Customized input-format may be buggy and report misleading locations through
> input-split, an example of which is PIG-878. When an input split returns too
> many locations, it would not only artificially inflate the percentage of data
> local or rack local maps, but also force scheduler to use more memory and
> work harder to conduct task assignment.