[ https://issues.apache.org/jira/browse/TEZ-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166843#comment-14166843 ]

Rajesh Balamohan commented on TEZ-1649:
---------------------------------------

1. If min/max is set to 0, we need to start all the tasks immediately. I didn't 
want to violate that existing contract, hence the 
"slowStartMaxSrcCompletionFraction > 0" check.

2.
>>>
This is probably not covering the case where there is another edge of type 
custom/broadcast and the slow start calculation ends up starting this vertex 
tasks before the custom/broadcast source vertex tasks have started. 
>>>
Right. There is a corner scenario wherein custom/broadcast is really slow for 
some reason and in the meantime we receive all the onSourceCompleted() events 
related to SG (i.e. numSourceTasksCompleted == totalNumSourceTasks, but we have 
yet to receive at least one event from custom/broadcast). In this case, maybe we 
should go ahead with scheduling rather than waiting longer for a task to 
complete in the custom/broadcast edges. Would this be a valid assumption? If so, 
I would modify the patch accordingly and post it.
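To make the proposal concrete, here is a rough sketch of the scheduling guard
being discussed in points 1 and 2. All field and method names are illustrative,
not the actual TEZ-1649 patch or ShuffleVertexManager API:

```java
// Hypothetical sketch of the guard discussed above (illustrative names,
// not actual Tez code).
public class SlowStartGuard {
    float slowStartMaxSrcCompletionFraction; // 0 => start all tasks at once
    int numSourceTasksCompleted;             // ScatterGather completions seen
    int totalNumSourceTasks;                 // total ScatterGather source tasks

    boolean shouldScheduleAll() {
        // A min/max of 0 keeps the existing contract: schedule immediately,
        // skipping the slow-start fraction logic entirely.
        if (slowStartMaxSrcCompletionFraction <= 0) {
            return true;
        }
        // Corner case: every ScatterGather source task has completed, so we
        // stop waiting for the first event from a slow custom/broadcast edge.
        return numSourceTasksCompleted == totalNumSourceTasks;
    }
}
```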

3.  
>>
 we should probably track bipartite edge task completions and output sizes per 
edge. When enough stats have been obtained per edge then we can extrapolate the 
output size per edge and then sum over all edges to get the total extrapolated 
output size. This would improve the accuracy of the extrapolation. 
>>
Yes, in most scenarios this would improve accuracy. However, there are cases 
where one edge proceeds a lot faster than the others, leading to lower accuracy 
(e.g. 50% completed in edge1, 15% completed in edge2, 10% completed in edge3). 
In that case, it is quite possible that the 15% and 10% of completed tasks 
carry little data (or zero records), which could skew the extrapolation. But I 
definitely agree that it would be good to gather these statistics, and we can 
iterate on it.
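The per-edge extrapolation suggested above could look roughly like this: scale
each edge's observed output linearly to its full task count, then sum across
edges. This is a minimal sketch with made-up names, not the Tez implementation:

```java
import java.util.List;

// Hypothetical per-edge stats for extrapolating total shuffle output size
// (illustrative names, not ShuffleVertexManager code).
public class EdgeStats {
    long completedOutputBytes; // output bytes reported by completed tasks
    int completedTasks;
    int totalTasks;

    long extrapolatedOutputBytes() {
        if (completedTasks == 0) {
            return 0; // no stats yet for this edge; nothing to extrapolate
        }
        // Assume remaining tasks produce output at the observed average rate.
        return (completedOutputBytes / completedTasks) * totalTasks;
    }

    // Sum the per-edge extrapolations to estimate total output size.
    static long totalExtrapolated(List<EdgeStats> edges) {
        long total = 0;
        for (EdgeStats e : edges) {
            total += e.extrapolatedOutputBytes();
        }
        return total;
    }
}
```

Note the caveat from the comment above: an edge with few completed tasks (or
tasks that happened to emit zero records) contributes a noisy per-task average,
so the linear scaling can under- or over-estimate that edge badly.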

> ShuffleVertexManager auto reduce parallelism can cause jobs to hang 
> indefinitely (with ScatterGather edges)
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-1649
>                 URL: https://issues.apache.org/jira/browse/TEZ-1649
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-1649.1.patch, TEZ-1649.2.patch, TEZ-1649.png
>
>
> Consider the following DAG
>  M1, M2 --> R1
>  M2, M3 --> R2
>  R1 --> R2
> All edges are Scatter-Gather.
>  1. Set R1's (1000 parallelism) min/max setting to 0.25 - 0.5f
>  2. Set R2's (21 parallelism) min/max setting to 0.2 and 0.3f
>  3. Let M1 send some data from HDFS (test.txt)
>  4. Let M2 (50 parallelism) generate some data and send it to R2
>  5. Let M3 (500 parallelism) generate some data and send it to R2
> - Since R2's min/max can be satisfied by events from M3 alone, R2 changes 
> its parallelism more quickly than R1.
> - In the meantime, R1 changes its parallelism from 1000 to 20. This is not 
> propagated to R2, and R2 keeps waiting.
> Tested this on a small scale (20 node) cluster and it happens consistently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
