[jira] [Updated] (TEZ-3274) Vertex with MRInput and shuffle input does not respect slow start

Eric Badger (JIRA) Wed, 08 Feb 2017 09:16:12 -0800

     [ 
https://issues.apache.org/jira/browse/TEZ-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Eric Badger updated TEZ-3274:
-----------------------------
    Attachment: TEZ-3274.001.patch

Uploading a first-effort patch to get some feedback. In this patch I have 
refactored the slow start code in the {{ShuffleVertexManager}} and 
{{FairShuffleVertexManager}} to make it available to all vertex managers via a 
config value ({{TEZ_SHUFFLE_VERTEX_MANAGER_ENABLE_SLOW_START}}) and a wrapper 
around the original {{scheduleTasks()}} function, which I named 
{{scheduleTasksFilter()}}. This preserves the functionality of all the current 
vertex managers, while allowing for them or any new vertex managers to adopt 
the slow start functionality easily in the future. The reason that the current 
vertex managers are not currently compatible out of the box with the refactored 
slow start code is because {{scheduleTasks()}} always scheduled all tasks and 
so the vertex managers could assume that a call to {{scheduleTasks()}} would 
effectively drain the list of pending tasks. However, {{scheduleTasksFilter()}} 
is not guaranteed to schedule all of the tasks or even any of the tasks at all. 
It returns a list of tasks that were actually scheduled so that the vertex 
manager can take the appropriate action (remove them from the pending tasks 
list, drop them, etc.). 

In this patch I have combined the configs for {{ShuffleVertexManager}} and 
{{FairShuffleVertexManager}} into a single set of configs. Currently, there are 
separate configs for each and they seem to be redundant. However, combining 
them may lead to backwards compatibility problems. Thoughts?

Questions about the next step:
  - Is it ok to move the configs to YarnConfiguration.java and change their 
names to something more generic. Until now they were specific to 
{{ShuffleVertexManager}} and {{FairShuffleVertexManager}}. Even then I'm not 
sure why they weren't put in YarnConfiguration.java initially, but [~jeagles] 
noted offline that there may have been a reason for this. 
  - I think that this same sort of refactoring effort can be applied to 
auto-parallelism. Are there any potential pitfalls that I should be weary of 
and/or anything special that I need to take into consideration while 
refactoring?
  - Does it make sense to add in the slow start capability to vertex managers 
such as {{ImmediateStartVertexManager}}. Theoretically slow start could be 
applied, but doesn't really make sense in this context. If slow start were 
enabled and the min was set then some tasks would simply never be scheduled 
since {{scheduleTasks()}} is only called by that vertex manager on start and on 
a vertex state update.

[~bikassaha], [~sseth], [~jeagles], [~rohini], could you please review the 
initial patch and questions? Thanks!

> Vertex with MRInput and shuffle input does not respect slow start
> -----------------------------------------------------------------
>
>                 Key: TEZ-3274
>                 URL: https://issues.apache.org/jira/browse/TEZ-3274
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Eric Badger
>         Attachments: TEZ-3274.001.patch
>
>
> Vertices with shuffle input and MRInput choose RootInputVertexManager (and 
> not ShuffleVertexManager) and start containers and tasks immediately. In this 
> scenario, resources can be wasted since they do not respect 
> tez.shuffle-vertex-manager.min-src-fraction 
> tez.shuffle-vertex-manager.max-src-fraction. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (TEZ-3274) Vertex with MRInput and shuffle input does not respect slow start

Reply via email to