[https://issues.apache.org/jira/browse/TEZ-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Jeff Zhang updated TEZ-2544:
----------------------------
    Description: 
Expected TaskSpec
{noformat}
DAGName : OrderedWordCount, VertexName: Summation, VertexParallelism: 1, 
TaskAttemptID:attempt_1433850314856_0019_1_01_000000_0, 
processorName=org.apache.tez.examples.OrderedWordCount$SumProcessor, 
inputSpecListSize=1, 
outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Tokenizer, 
physicalEdgeCount=2, 
inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }}, 
], outputSpecList=[{{ destinationVertexName=Sorter, physicalEdgeCount=1, 
outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
 }}
{noformat}

The actual TaskSpec
{noformat}
DAGName : OrderedWordCount, VertexName: Summation, VertexParallelism: 1, 
TaskAttemptID:attempt_1433850314856_0019_1_01_000000_0, 
processorName=org.apache.tez.examples.OrderedWordCount$SumProcessor, 
inputSpecListSize=1, 
outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Tokenizer, 
physicalEdgeCount=1, 
inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }}, 
], outputSpecList=[{{ destinationVertexName=Sorter, physicalEdgeCount=1, 
outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
 }}
{noformat}

The expected physicalEdgeCount is 2, but the actual value is 1. This happens when 
dynamic parallelism estimation is enabled. 

The cause is that when the task is recovered, its vertex's source edge manager 
has not yet been updated from ScatterGatherEdgeManager to 
CustomShuffleEdgeManager, so the InputSpec is built with a different (wrong) 
physicalEdgeCount.
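To make the ordering problem concrete, here is a minimal, hypothetical Java model. None of these classes or methods (EdgeModel, scatterGather, reducedShuffle, Vertex, buildInputSpecPhysicalEdgeCount) are the Tez API. The sketch assumes a scatter-gather edge gives each destination task one physical input per source task, and that after auto-reduction each destination task reads a contiguous range of basePartitionRange original partitions from every source task. The numbers in main (1 source task, 2 original partitions merged into 1 destination task) are assumptions chosen so the output matches the 2 vs. 1 values reported above.

```java
/**
 * Hypothetical, simplified model (NOT the Tez API) showing why the recorded
 * physicalEdgeCount depends on which edge model is installed when the spec is built.
 */
public class PhysicalEdgeCountSketch {

    /** Hypothetical stand-in for an edge manager: destination task index -> physical input count. */
    interface EdgeModel {
        int physicalInputs(int destTaskIndex);
    }

    /** Assumed scatter-gather behaviour: one partition from every source task. */
    static EdgeModel scatterGather(int sourceTasks) {
        return dest -> sourceTasks;
    }

    /**
     * Assumed auto-reduced shuffle: original partitions are grouped into contiguous
     * ranges of basePartitionRange; the last destination task takes the remainder.
     */
    static EdgeModel reducedShuffle(int sourceTasks, int originalPartitions, int basePartitionRange) {
        return dest -> {
            int start = dest * basePartitionRange;
            int range = Math.min(basePartitionRange, originalPartitions - start);
            return sourceTasks * range;
        };
    }

    /** Hypothetical vertex whose source edge model is swapped after parallelism is re-estimated. */
    static final class Vertex {
        EdgeModel sourceEdge;

        Vertex(EdgeModel initial) {
            this.sourceEdge = initial;
        }

        /** The input spec records whatever the currently installed edge model reports. */
        int buildInputSpecPhysicalEdgeCount(int destTaskIndex) {
            return sourceEdge.physicalInputs(destTaskIndex);
        }
    }

    public static void main(String[] args) {
        // Assumed numbers: 1 source task, 2 original partitions collapsed into 1 destination task.
        int sourceTasks = 1;
        int originalPartitions = 2;
        int basePartitionRange = 2;

        // Expected order: swap the edge model first, then build the spec.
        Vertex updatedFirst = new Vertex(scatterGather(sourceTasks));
        updatedFirst.sourceEdge = reducedShuffle(sourceTasks, originalPartitions, basePartitionRange);
        System.out.println("edge model updated first: physicalEdgeCount="
                + updatedFirst.buildInputSpecPhysicalEdgeCount(0));

        // Problematic recovery order: the spec is built while scatter-gather is still installed.
        Vertex recovering = new Vertex(scatterGather(sourceTasks));
        int stale = recovering.buildInputSpecPhysicalEdgeCount(0);
        recovering.sourceEdge = reducedShuffle(sourceTasks, originalPartitions, basePartitionRange);
        System.out.println("spec built before update: physicalEdgeCount=" + stale);
    }
}
```

Running main prints physicalEdgeCount=2 for the first vertex and physicalEdgeCount=1 for the second: the spec captured before the swap keeps the stale scatter-gather count even though the edge model is corrected afterwards.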


  was:
Expected TaskSpec
{noformat}
DAGName : OrderedWordCount, VertexName: Summation, VertexParallelism: 1, 
TaskAttemptID:attempt_1433850314856_0019_1_01_000000_0, 
processorName=org.apache.tez.examples.OrderedWordCount$SumProcessor, 
inputSpecListSize=1, 
outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Tokenizer, 
physicalEdgeCount=2, 
inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }}, 
], outputSpecList=[{{ destinationVertexName=Sorter, physicalEdgeCount=1, 
outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
 }}
{noformat}

The actual TaskSpec
{noformat}
DAGName : OrderedWordCount, VertexName: Summation, VertexParallelism: 1, 
TaskAttemptID:attempt_1433850314856_0019_1_01_000000_0, 
processorName=org.apache.tez.examples.OrderedWordCount$SumProcessor, 
inputSpecListSize=1, 
outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Tokenizer, 
physicalEdgeCount=1, 
inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }}, 
], outputSpecList=[{{ destinationVertexName=Sorter, physicalEdgeCount=1, 
outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
 }}
{noformat}

The expected physicalEdgeCount is 2, but the actual value is 1. This happens when 
dynamic parallelism estimation is enabled. 



> Incorrect dag result due to wrong TaskSpec in recovering
> --------------------------------------------------------
>
>                 Key: TEZ-2544
>                 URL: https://issues.apache.org/jira/browse/TEZ-2544
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>              Labels: Recovery
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
