[ 
https://issues.apache.org/jira/browse/HADOOP-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated HADOOP-5075:
----------------------------------

    Attachment: hadoop-5075-v2.patch

I think I have found the cause of this now. The problem was with the weight 
normalization we did in HADOOP-4789 to give each pool an equal share: the map 
weights were also being used for reduces. This caused a problem because the 
map weight could sometimes be infinity if the job had no runnable maps, and we 
then divided by it in updateMinSlots when calculating guaranteed shares for 
reduces, leading to a 0 in both the ceil and the floor, as I said. 
This patch fixes that bug and also ensures that no infinities get introduced.
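
For illustration only, here is a minimal Java sketch of the failure mode. It is 
not the scheduler's code: the class InfiniteWeightSketch, the method 
assignSlots, and the single-weight share formula are hypothetical 
simplifications. It assumes only that a share is computed by dividing by a 
weight and then rounding, as described above.

```java
class InfiniteWeightSketch {
    // Hypothetical stand-in for a slot-assignment loop: each pass hands out
    // ceil(slotsLeft / weight) slots until none remain.
    static int assignSlots(int slotsLeft, double weight) {
        int assigned = 0;
        while (slotsLeft > 0) {
            // With weight == Infinity the quotient is 0.0, so ceil (and
            // floor) both round to 0 and the pass assigns nothing.
            int give = (int) Math.ceil(slotsLeft / weight);
            if (give == 0) {
                // No progress was made; without this guard the loop would
                // never terminate.
                break;
            }
            give = Math.min(give, slotsLeft);
            assigned += give;
            slotsLeft -= give;
        }
        return assigned;
    }

    public static void main(String[] args) {
        double infiniteWeight = Double.POSITIVE_INFINITY;
        System.out.println(Math.ceil(4 / infiniteWeight));   // 0.0
        System.out.println(Math.floor(4 / infiniteWeight));  // 0.0
        System.out.println(assignSlots(5, infiniteWeight));  // 0
        System.out.println(assignSlots(5, 2.0));             // 5
    }
}
```

The guard only keeps the loop from spinning. Keeping infinite weights out of 
the calculation in the first place is what lets the slots actually be assigned.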

> Potential infinite loop in updateMinSlots
> -----------------------------------------
>
>                 Key: HADOOP-5075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5075
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Priority: Blocker
>         Attachments: hadoop-5075-v2.patch, hadoop-5075.patch
>
>
> We ran into a problem at Facebook where the updateMinSlots loop in the 
> scheduler was repeating infinitely. This might happen if, due to rounding, we 
> are unable to assign the last few slots in a pool. This patch adds a break 
> statement to ensure that the loop exits if it hasn't managed to assign any 
> slots.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
