[
https://issues.apache.org/jira/browse/TEZ-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167735#comment-14167735
]
Bikas Saha commented on TEZ-1649:
---------------------------------
{code}
+    for (Map.Entry<String, Set<Integer>> entry : bipartiteSources.entrySet()) {
+      String sourceVertex = entry.getKey();
+      Set<Integer> completedTasks = entry.getValue();
+      int numSourceTasks = getContext().getVertexNumTasks(sourceVertex);
+      if ((numSourceTasks > 0 && slowStartMaxSrcCompletionFraction > 0)
+          && completedTasks.isEmpty()) {
+        LOG.info("Defer scheduling tasks for vertex:" + getContext().getVertexName()
+            + " as completed tasks is empty for " + sourceVertex);
+        return false;
+      }
+    }
{code}
bq. If min/max is set to 0, we need to start all the tasks immediately. Didn't want to violate that existing contract and hen
OK. However, I think the logic to wait for at least 1 completed task from every
edge will override maxSrcFraction, because it is the first gating factor for
safety. Only after this gating factor is crossed does the main slow start logic
apply, which waits for minSrcFraction or minDataSize. The slow start logic will
continue to stay as is. The first safety gating factor will improve over time:
currently we wait for 1 completion per edge; later we can change that to all
tasks scheduled per edge, or some other faster heuristic. Right?
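Roughly what I have in mind, as a sketch only (class and method names here are
made up for illustration, not the actual ShuffleVertexManager code):
{code}
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the two-stage check discussed above.
class TwoStageGate {
  private final float minSrcFraction; // e.g. 0.25f

  TwoStageGate(float minSrcFraction) {
    this.minSrcFraction = minSrcFraction;
  }

  // Stage 1 (safety gate): defer until every source edge that has tasks
  // reports at least one completion.
  boolean safetyGateCrossed(Map<String, Set<Integer>> completedPerSource,
      Map<String, Integer> numTasksPerSource) {
    for (Map.Entry<String, Set<Integer>> e : completedPerSource.entrySet()) {
      int numSourceTasks = numTasksPerSource.getOrDefault(e.getKey(), 0);
      if (numSourceTasks > 0 && e.getValue().isEmpty()) {
        return false;
      }
    }
    return true;
  }

  // Stage 2 (existing slow start): wait for minSrcFraction of all source
  // tasks to complete (the minDataSize variant is omitted here).
  boolean slowStartCrossed(int completedSrcTasks, int totalSrcTasks) {
    return totalSrcTasks == 0
        || (float) completedSrcTasks / totalSrcTasks >= minSrcFraction;
  }

  boolean canScheduleTasks(Map<String, Set<Integer>> completedPerSource,
      Map<String, Integer> numTasksPerSource, int completed, int total) {
    return safetyGateCrossed(completedPerSource, numTasksPerSource)
        && slowStartCrossed(completed, total);
  }
}
{code}
Note that stage 1 never looks at maxSrcFraction, which is why it gates first.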
The log should probably be debug level.
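i.e. something like this, reusing the variables from the patch snippet above:
{code}
// Guard the defer-scheduling message at debug level instead of info.
if (LOG.isDebugEnabled()) {
  LOG.debug("Defer scheduling tasks for vertex: " + getContext().getVertexName()
      + " as completed tasks is empty for " + sourceVertex);
}
{code}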
bq. s a corner scenario wherein custom/broadcast is
I see what you are saying. But we are waiting for only one completion from the
broadcast, not full completion. My gut feeling is that it would be uncommon for
all tasks of the broadcast to be slow.
bq. accuracy (e.g. 50% completed in edge1, 15% completed in edge2, 10% completed in edge3)
I meant that the minSrcFraction and minDataSize logic would be tracked per SG
edge.
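Roughly, per-edge bookkeeping along these lines (all names hypothetical, just
to illustrate tracking the thresholds per SG edge rather than in aggregate):
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: each scatter-gather edge gets its own
// minSrcFraction / minDataSize check.
class PerEdgeSlowStart {
  static final class EdgeStats {
    int totalTasks;
    int completedTasks;
    long completedBytes;
  }

  private final Map<String, EdgeStats> sgEdges = new HashMap<>();
  private final float minSrcFraction; // e.g. 0.25f
  private final long minDataSize;     // bytes, e.g. 100L << 20

  PerEdgeSlowStart(float minSrcFraction, long minDataSize) {
    this.minSrcFraction = minSrcFraction;
    this.minDataSize = minDataSize;
  }

  void setTotalTasks(String edge, int totalTasks) {
    sgEdges.computeIfAbsent(edge, k -> new EdgeStats()).totalTasks = totalTasks;
  }

  void onSourceTaskCompleted(String edge, long outputBytes) {
    EdgeStats s = sgEdges.computeIfAbsent(edge, k -> new EdgeStats());
    s.completedTasks++;
    s.completedBytes += outputBytes;
  }

  // Every SG edge must individually satisfy minSrcFraction or minDataSize.
  boolean allEdgesSatisfied() {
    for (EdgeStats s : sgEdges.values()) {
      boolean fractionOk = s.totalTasks > 0
          && (float) s.completedTasks / s.totalTasks >= minSrcFraction;
      boolean dataOk = s.completedBytes >= minDataSize;
      if (!fractionOk && !dataOk) {
        return false;
      }
    }
    return true;
  }
}
{code}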
> ShuffleVertexManager auto reduce parallelism can cause jobs to hang
> indefinitely (with ScatterGather edges)
> -----------------------------------------------------------------------------------------------------------
>
> Key: TEZ-1649
> URL: https://issues.apache.org/jira/browse/TEZ-1649
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-1649.1.patch, TEZ-1649.2.patch, TEZ-1649.png
>
>
> Consider the following DAG
> M1, M2 --> R1
> M2, M3 --> R2
> R1 --> R2
> All edges are Scatter-Gather.
> 1. Set R1's (1000 parallelism) min/max setting to 0.25 - 0.5f
> 2. Set R2's (21 parallelism) min/max setting to 0.2 and 0.3f
> 3. Let M1 send some data from HDFS (test.txt)
> 4. Let M2 (50 parallelism) generate some data and send it to R2
> 5. Let M3 (500 parallelism) generate some data and send it to R2
> - Since R2's min/max can get satisfied by getting events from M3 itself, R2
> will change its parallelism more quickly than R1.
> - In the meantime, R1 changes its parallelism from 1000 to 20. This is not
> propagated to R2, and it would keep waiting.
> Tested this on a small scale (20 node) cluster and it happens consistently.
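For reference, steps 1 and 2 above amount to per-vertex slow start settings
along these lines (a sketch only: it assumes r1/r2 are the R1/R2 Vertex
objects, and uses the ShuffleVertexManager.createConfigBuilder builder method
names as I recall them, so verify against the Tez version in use):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.tez.dag.api.Vertex;
import org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager;

final class ReproSlowStartConfig {
  // Apply the repro's slow start min/max fractions to R1 and R2.
  static void configure(Configuration conf, Vertex r1, Vertex r2) {
    // Step 1: R1 min/max = 0.25f / 0.5f
    r1.setVertexManagerPlugin(ShuffleVertexManager.createConfigBuilder(conf)
        .setAutoReduceParallelism(true)
        .setSlowStartMinSrcCompletionFraction(0.25f)
        .setSlowStartMaxSrcCompletionFraction(0.5f)
        .build());
    // Step 2: R2 min/max = 0.2f / 0.3f
    r2.setVertexManagerPlugin(ShuffleVertexManager.createConfigBuilder(conf)
        .setAutoReduceParallelism(true)
        .setSlowStartMinSrcCompletionFraction(0.2f)
        .setSlowStartMaxSrcCompletionFraction(0.3f)
        .build());
  }
}
{code}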