[ 
https://issues.apache.org/jira/browse/MAPREDUCE-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736464#action_12736464
 ] 

eric baldeschwieler commented on MAPREDUCE-801:
-----------------------------------------------

Hi Doug,

I think we are making the perfect the enemy of the good here.  A real bug 
existed that cost us performance.  Having 20 options on placement is not going 
to improve scheduling noticeably.  Having hundreds can bring down the 
centralized resources of the system, and even 20 would cause lots of completely 
unneeded work in the JT for little gain. 

I'd like to see us discard anything beyond the first 5 options in the JT just 
to keep bugs from DOSing the central server.  I am not aware of any use case 
where this would hinder performance.  Having a warning and truncating this list 
would have saved us a lot of resources and time.  
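
To make the warn-and-truncate idea concrete, here is a minimal sketch.  It is 
not the JT code: the class name, the method name and the use of 
java.util.logging are all hypothetical, and the cap of 5 is just the number 
suggested above.

```java
import java.util.Arrays;
import java.util.logging.Logger;

// Hypothetical helper, not part of Hadoop: caps the locations kept per split.
class SplitLocationLimiter {
    private static final Logger LOG =
        Logger.getLogger(SplitLocationLimiter.class.getName());

    // Assumed cap, taken from the "first 5 options" suggested above.
    static final int MAX_LOCATIONS = 5;

    // Returns the input unchanged when it is within the cap; otherwise logs
    // a warning and returns a copy holding only the first MAX_LOCATIONS hosts.
    static String[] limit(String[] locations) {
        if (locations == null || locations.length <= MAX_LOCATIONS) {
            return locations;
        }
        LOG.warning("Split reports " + locations.length
            + " locations; keeping only the first " + MAX_LOCATIONS);
        return Arrays.copyOf(locations, MAX_LOCATIONS);
    }

    public static void main(String[] args) {
        String[] hosts = {"h1", "h2", "h3", "h4", "h5", "h6", "h7"};
        System.out.println(Arrays.toString(limit(hosts)));
    }
}
```

Keeping the first entries rather than picking at random is only the simplest 
choice for a sketch; the point is that the list is bounded before the 
scheduler ever sees it.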

The system is full of numbers.  Sometimes it is simpler to harden the system 
than to identify general principles.  There are many places in the system where 
I think this would be the wrong approach, but huge split lists are much more 
likely to be the result of bugs or ignorance than need.

If we inject a warning and anyone hits the case, we can then do more work to 
enhance this. 

E14


> MAPREDUCE framework should issue warning with too many locations for a split
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-801
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-801
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Hong Tang
>
> A customized input format may be buggy and report misleading locations 
> through its input split, an example of which is PIG-878. When an input split 
> returns too many locations, it would not only artificially inflate the 
> percentage of data-local or rack-local maps, but also force the scheduler to 
> use more memory and work harder to conduct task assignment.
