[
https://issues.apache.org/jira/browse/TEZ-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579772#comment-14579772
]
Jeff Zhang commented on TEZ-2544:
---------------------------------
Actually, recovery does not work when auto parallelism estimation is enabled.
Besides this issue, other errors sometimes occur, as described in TEZ-2107.
Linking this with TEZ-2107 to try to resolve these issues together.
> Incorrect dag result due to wrong TaskSpec in recovering
> --------------------------------------------------------
>
> Key: TEZ-2544
> URL: https://issues.apache.org/jira/browse/TEZ-2544
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Priority: Blocker
> Labels: Recovery
>
> Expected TaskSpec
> {noformat}
> DAGName : OrderedWordCount, VertexName: Summation, VertexParallelism: 1,
> TaskAttemptID:attempt_1433850314856_0019_1_01_000000_0,
> processorName=org.apache.tez.examples.OrderedWordCount$SumProcessor,
> inputSpecListSize=1,
> outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Tokenizer,
> physicalEdgeCount=2,
> inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }},
> ], outputSpecList=[{{ destinationVertexName=Sorter, physicalEdgeCount=1,
> outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
> }}
> {noformat}
> The actual TaskSpec
> {noformat}
> DAGName : OrderedWordCount, VertexName: Summation, VertexParallelism: 1,
> TaskAttemptID:attempt_1433850314856_0019_1_01_000000_0,
> processorName=org.apache.tez.examples.OrderedWordCount$SumProcessor,
> inputSpecListSize=1,
> outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Tokenizer,
> physicalEdgeCount=1,
> inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }},
> ], outputSpecList=[{{ destinationVertexName=Sorter, physicalEdgeCount=1,
> outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
> }}
> {noformat}
> The expected physicalEdgeCount is 2, but the actual value is 1. This happens when
> dynamic parallelism estimation is enabled.
> The cause is that the Task is recovered while its vertex's source edge manager has
> not yet been updated from ScatterGatherEdgeManager to CustomShuffleEdgeManager,
> which results in a different physicalEdgeCount for the InputSpec.
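
To illustrate why a stale edge manager could yield 1 instead of 2, here is a toy model of the two counting rules. It is only a sketch: the class and method names are hypothetical and are not the real Tez API. The Tokenizer parallelism (1) and the original Summation parallelism (2) are assumptions chosen to match the counts above, and the even division of partitions is a simplification.

```java
// Toy model only: these are NOT the real Tez classes or method signatures.
final class EdgeCountSketch {

    /** Scatter-gather rule: one physical input per source task. */
    static int scatterGatherInputs(int numSourceTasks) {
        return numSourceTasks;
    }

    /**
     * Rule after parallelism reduction: one destination task reads
     * partitionsPerTask partitions from every source task.
     */
    static int customShuffleInputs(int numSourceTasks, int partitionsPerTask) {
        return numSourceTasks * partitionsPerTask;
    }

    public static void main(String[] args) {
        int numSourceTasks = 1;      // assumed Tokenizer parallelism
        int originalParallelism = 2; // assumed Summation parallelism before reduction
        int reducedParallelism = 1;  // VertexParallelism: 1 in the TaskSpec

        // Simplification: assumes the partitions divide evenly among the tasks.
        int partitionsPerTask = originalParallelism / reducedParallelism;

        System.out.println("stale edge manager:   "
                + scatterGatherInputs(numSourceTasks));
        System.out.println("updated edge manager: "
                + customShuffleInputs(numSourceTasks, partitionsPerTask));
    }
}
```

Under these assumptions the stale rule gives 1 and the updated rule gives 2, which matches the actual and expected physicalEdgeCount values in the two TaskSpecs.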