[ https://issues.apache.org/jira/browse/TEZ-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166525#comment-14166525 ]

Bikas Saha edited comment on TEZ-1649 at 10/10/14 7:37 AM:
-----------------------------------------------------------

Not quite sure what this if-stmt is doing with maxSrcFraction
{code}
+      if ((numSourceTasks > 0 && slowStartMaxSrcCompletionFraction > 0)
+          && completedTasks.isEmpty()) {
+        LOG.info("Defer scheduling tasks for vertex:" + getContext().getVertexName()
+            + " as completed tasks is empty for " + sourceVertex);
+        return false;
+      }
{code}

This probably does not cover the case where there is another edge of type 
custom/broadcast and the slow-start calculation ends up starting this vertex's 
tasks before the custom/broadcast source vertex's tasks have started. We will 
probably need an explicit test case for the scenario mentioned in the description, 
and another for the mix of SG and broadcast edges. This could be a unit test in 
TestShuffleVertexManager that simply checks that scheduling is triggered only 
after all source vertex checks have passed; a rough sketch of what that might 
look like follows.

Unrelated to the out-of-order starting case: we should probably track bipartite 
edge task completions and output sizes per edge. Once enough stats have been 
obtained for an edge, we can extrapolate the output size of that edge, and then 
sum over all edges to get the total extrapolated output size. This would improve 
the accuracy of the extrapolation. Thoughts? A minimal sketch of the bookkeeping 
is below.
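A minimal sketch, assuming illustrative class and field names (nothing below is 
existing Tez code); it only shows the per-edge extrapolate-then-sum arithmetic.
{code}
// Illustrative sketch only; these are not Tez classes or fields.
import java.util.HashMap;
import java.util.Map;

class PerEdgeStats {
  int totalTasks;       // tasks in the source vertex of this bipartite edge
  int completedTasks;   // completions seen so far on this edge
  long reportedBytes;   // output size reported by those completed tasks

  long extrapolatedOutput() {
    if (completedTasks == 0) {
      return 0; // not enough stats yet to extrapolate for this edge
    }
    // assume the remaining tasks produce output at the observed average rate
    return (long) ((double) reportedBytes / completedTasks * totalTasks);
  }
}

class OutputEstimator {
  private final Map<String, PerEdgeStats> statsByEdge = new HashMap<>();

  // sum the per-edge extrapolations to get the total expected output
  long totalExtrapolatedOutput() {
    long sum = 0;
    for (PerEdgeStats s : statsByEdge.values()) {
      sum += s.extrapolatedOutput();
    }
    return sum;
  }
}
{code}
Extrapolating per edge rather than over pooled completions should keep one 
skewed edge (say, a source that completes much faster than the others) from 
distorting the estimate for the rest.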



> ShuffleVertexManager auto reduce parallelism can cause jobs to hang indefinitely (with ScatterGather edges)
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-1649
>                 URL: https://issues.apache.org/jira/browse/TEZ-1649
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-1649.1.patch, TEZ-1649.2.patch, TEZ-1649.png
>
>
> Consider the following DAG
>  M1, M2 --> R1
>  M2, M3 --> R2
>  R1 --> R2
> All edges are Scatter-Gather.
>  1. Set R1's (1000 parallelism) min/max setting to 0.25 - 0.5f
>  2. Set R2's (21 parallelism) min/max setting to 0.2 and 0.3f
>  3. Let M1 send some data from HDFS (test.txt)
>  4. Let M2 (50 parallelism) generate some data and send it to R2
>  5. Let M3 (500 parallelism) generate some data and send it to R2
> - Since R2's min/max can be satisfied by events from M3 alone, R2 will change 
> its parallelism more quickly than R1.
> - In the meantime, R1 changes its parallelism from 1000 to 20. This is not 
> propagated to R2, so R2 keeps waiting.
> Tested this on a small-scale (20 node) cluster and it happens consistently.
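For what it's worth, a back-of-the-envelope check (task counts taken from the 
description above, the rest inferred; not Tez code) of why M3 alone can satisfy 
R2's minimum before R1 contributes anything:
{code}
// Rough arithmetic inferred from the description; class name is illustrative.
public class SlowStartMath {
  public static void main(String[] args) {
    // R2's bipartite sources: M2 (50 tasks), M3 (500 tasks), R1 (1000 tasks)
    int totalSourceTasks = 50 + 500 + 1000;                      // 1550
    float minFraction = 0.2f;                                    // R2's min setting
    int minCompletions = (int) (totalSourceTasks * minFraction); // 310
    // M3 alone has 500 tasks, so its completions can cross the 310-task
    // threshold before R1 (or M2) has produced any events at all.
    System.out.println("need " + minCompletions + ", M3 alone has 500");
  }
}
{code}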


