[
https://issues.apache.org/jira/browse/TEZ-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeff Zhang updated TEZ-2544:
----------------------------
Description:
Expected TaskSpec
{noformat}
DAGName : OrderedWordCount, VertexName: Summation, VertexParallelism: 1,
TaskAttemptID:attempt_1433850314856_0019_1_01_000000_0,
processorName=org.apache.tez.examples.OrderedWordCount$SumProcessor,
inputSpecListSize=1,
outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Tokenizer,
physicalEdgeCount=2,
inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }},
], outputSpecList=[{{ destinationVertexName=Sorter, physicalEdgeCount=1,
outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
}}
{noformat}
The actual TaskSpec
{noformat}
DAGName : OrderedWordCount, VertexName: Summation, VertexParallelism: 1,
TaskAttemptID:attempt_1433850314856_0019_1_01_000000_0,
processorName=org.apache.tez.examples.OrderedWordCount$SumProcessor,
inputSpecListSize=1,
outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Tokenizer,
physicalEdgeCount=1,
inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }},
], outputSpecList=[{{ destinationVertexName=Sorter, physicalEdgeCount=1,
outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
}}
{noformat}
The expected physicalEdgeCount is 2, but the actual value is 1. This happens
when dynamic parallelism estimation is enabled.
The cause is that the Task is being recovered while its vertex's source edge
manager has not yet been updated from ScatterGatherEdgeManager to
CustomShuffleEdgeManager, which results in a different physicalEdgeCount in
the InputSpec.
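As a rough illustration of why the edge manager matters for physicalEdgeCount, here is a minimal Java sketch. It is a simplified model, not the Tez implementation: the class EdgeCountSketch and both methods are hypothetical, and the numbers (one Tokenizer task, Summation reduced from 2 tasks to 1) are assumptions chosen only to reproduce the 2 versus 1 seen in the TaskSpecs above.

```java
public class EdgeCountSketch {

    // Simplified scatter-gather model: a destination task has one physical
    // input per source task.
    static int scatterGatherInputs(int numSourceTasks) {
        return numSourceTasks;
    }

    // Simplified model after parallelism has been reduced at runtime: each
    // remaining destination task reads several of the original partitions
    // from every source task.
    static int customShuffleInputs(int numSourceTasks, int partitionsPerTask) {
        return numSourceTasks * partitionsPerTask;
    }

    public static void main(String[] args) {
        int tokenizerTasks = 1;    // assumption: Tokenizer runs a single task
        int partitionsMerged = 2;  // assumption: Summation reduced from 2 tasks to 1

        // Count computed if the stale scatter-gather routing is still in place.
        System.out.println("stale edge manager:   "
                + scatterGatherInputs(tokenizerTasks));

        // Count computed once the reconfigured routing has been applied.
        System.out.println("updated edge manager: "
                + customShuffleInputs(tokenizerTasks, partitionsMerged));
    }
}
```

Under these assumptions, a TaskSpec built before the edge manager is swapped carries the smaller count, so the recovered task would read only part of its input.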
was:
Expected TaskSpec
{noformat}
DAGName : OrderedWordCount, VertexName: Summation, VertexParallelism: 1,
TaskAttemptID:attempt_1433850314856_0019_1_01_000000_0,
processorName=org.apache.tez.examples.OrderedWordCount$SumProcessor,
inputSpecListSize=1,
outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Tokenizer,
physicalEdgeCount=2,
inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }},
], outputSpecList=[{{ destinationVertexName=Sorter, physicalEdgeCount=1,
outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
}}
{noformat}
The actual TaskSpec
{noformat}
DAGName : OrderedWordCount, VertexName: Summation, VertexParallelism: 1,
TaskAttemptID:attempt_1433850314856_0019_1_01_000000_0,
processorName=org.apache.tez.examples.OrderedWordCount$SumProcessor,
inputSpecListSize=1,
outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Tokenizer,
physicalEdgeCount=1,
inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }},
], outputSpecList=[{{ destinationVertexName=Sorter, physicalEdgeCount=1,
outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
}}
{noformat}
The expected physicalEdgeCount is 2, but the actual value is 1. This happens
when dynamic parallelism estimation is enabled.
> Incorrect dag result due to wrong TaskSpec in recovering
> --------------------------------------------------------
>
> Key: TEZ-2544
> URL: https://issues.apache.org/jira/browse/TEZ-2544
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Labels: Recovery
>
> Expected TaskSpec
> {noformat}
> DAGName : OrderedWordCount, VertexName: Summation, VertexParallelism: 1,
> TaskAttemptID:attempt_1433850314856_0019_1_01_000000_0,
> processorName=org.apache.tez.examples.OrderedWordCount$SumProcessor,
> inputSpecListSize=1,
> outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Tokenizer,
> physicalEdgeCount=2,
> inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }},
> ], outputSpecList=[{{ destinationVertexName=Sorter, physicalEdgeCount=1,
> outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
> }}
> {noformat}
> The actual TaskSpec
> {noformat}
> DAGName : OrderedWordCount, VertexName: Summation, VertexParallelism: 1,
> TaskAttemptID:attempt_1433850314856_0019_1_01_000000_0,
> processorName=org.apache.tez.examples.OrderedWordCount$SumProcessor,
> inputSpecListSize=1,
> outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Tokenizer,
> physicalEdgeCount=1,
> inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }},
> ], outputSpecList=[{{ destinationVertexName=Sorter, physicalEdgeCount=1,
> outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
> }}
> {noformat}
> The expected physicalEdgeCount is 2, but the actual value is 1. This happens
> when dynamic parallelism estimation is enabled.
> The cause is that the Task is being recovered while its vertex's source edge
> manager has not yet been updated from ScatterGatherEdgeManager to
> CustomShuffleEdgeManager, which results in a different physicalEdgeCount in
> the InputSpec.