[ https://issues.apache.org/jira/browse/TEZ-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated TEZ-2251:
----------------------------------
Attachment: tez-2251.vertexpatch.am.log.gz
{noformat}
2015-04-01 16:18:39,157 INFO [App Shared Pool - #0]
vertexmanager.ShuffleVertexManager: Scheduling 46 tasks for vertex: Reducer 6
with totalTasks: 46. 52 source tasks completed out of 64.
SourceTaskCompletedFraction: 0.8125 min: 0.25 max: 0.75
2015-04-01 16:18:39,157 INFO [App Shared Pool - #0]
vertexmanager.ShuffleVertexManager: Reduce auto parallelism for vertex: Reducer
6 to 6 from 46 . Expected output: 28669934 based on actual output: 23294322
from 52 vertex manager events. desiredTaskInputSize: 67108864 max slow start
tasks:48.0 num sources completed:52
...
...
2015-04-01 16:18:39,158 INFO [Dispatcher thread: Central] impl.TaskAttemptImpl:
remoteTaskSpec:DAGName :
rajesh_20150401161643_06df4714-fa52-4797-9d49-a3aaf969fe3e:1, VertexName:
Reducer 5, VertexParallelism: 1,
TaskAttemptID:attempt_1424502260528_2009_1_09_000000_0,
processorName=org.apache.hadoop.hive.ql.exec.tez.ReduceTezProcessor,
inputSpecListSize=1, outputSpecListSize=1, inputSpecList=[{{
sourceVertexName=Reducer 4, physicalEdgeCount=4044,
inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }},
], outputSpecList=[{{ destinationVertexName=Reducer 6, physicalEdgeCount=6,
outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
}}, ]
2015-04-01 16:18:39,158 INFO [App Shared Pool - #0] impl.VertexImpl: Routing
pending task events for vertex: vertex_1424502260528_2009_1_10 [Reducer 6]
{noformat}
In the "Reducer 5" task spec above, the output spec reads
destinationVertexName=Reducer 6, physicalEdgeCount=6. It should have been
destinationVertexName=Reducer 6, physicalEdgeCount=46.
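
For reference, the figures in the ShuffleVertexManager lines of the log above are consistent with a simple linear extrapolation from the completed source tasks. The sketch below is only an arithmetic check of the logged numbers. The class and method names are hypothetical, and the truncating integer division is an assumption, not a statement about how Tez computes the value.

```java
public class AutoParallelismFigures {
    // Fraction of source tasks that have completed (52 of 64 in the log).
    static double completedFraction(int completed, int total) {
        return (double) completed / total;
    }

    // Assumption: scale the observed output linearly to all source tasks
    // and truncate to whole bytes. Hypothetical helper, not a Tez API.
    static long expectedOutput(long observedBytes, int completed, int total) {
        return observedBytes * total / completed;
    }

    public static void main(String[] args) {
        // Values copied from the log excerpt.
        System.out.println(completedFraction(52, 64));          // 0.8125
        System.out.println(expectedOutput(23294322L, 52, 64));  // 28669934
    }
}
```

Both results match the logged SourceTaskCompletedFraction (0.8125) and "Expected output" (28669934), which is well below the desiredTaskInputSize of 67108864.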
> Enabling auto reduce parallelism in certain jobs causes DAG to hang
> -------------------------------------------------------------------
>
> Key: TEZ-2251
> URL: https://issues.apache.org/jira/browse/TEZ-2251
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Attachments: TEZ-2251.VertexImpl.patch,
> TEZ-2251.fix_but_slows_down.patch, hive_console.png,
> tez-2251.vertexpatch.am.log.gz, tez_2251_dag.png
>
>
> Scenario:
> - Run TPCH query20
> (https://github.com/cartershanklin/hive-testbench/blob/master/sample-queries-tpch/tpch_query20.sql)
> at 1 TB scale (tez-master branch, hive trunk)
> - Enable auto reduce parallelism
> - DAG didn't complete and got stuck in "Reducer 6"
> Vertex parallelism of "Reducer 5 & 6" happens within a span of 3
> milliseconds, and tasks of "Reducer 5" end up producing wrong partition
> details, as they see the updated task count of "Reducer 6" when scheduled.
> This causes the job to hang.
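
The ordering problem described in the scenario can be illustrated with a minimal sketch. This is not Tez code: the Vertex class and the edgeCountReadLive helper are hypothetical stand-ins, and the sketch only shows how a value read at scheduling time can differ from the value that was in effect before the downstream vertex was reconfigured (46 versus 6, as in the log).

```java
import java.util.concurrent.atomic.AtomicInteger;

public class EdgeCountRaceSketch {
    // Hypothetical stand-in for a downstream vertex whose task count
    // can be reduced after it was first planned.
    static final class Vertex {
        private final AtomicInteger parallelism;

        Vertex(int initial) {
            parallelism = new AtomicInteger(initial);
        }

        int parallelism() {
            return parallelism.get();
        }

        void reconfigure(int newParallelism) {
            parallelism.set(newParallelism);
        }
    }

    // Reads the downstream task count at the moment the upstream task is
    // scheduled, so it observes any reconfiguration that already happened.
    static int edgeCountReadLive(Vertex downstream) {
        return downstream.parallelism();
    }

    public static void main(String[] args) {
        Vertex reducer6 = new Vertex(46);

        // Value in effect before auto reduce parallelism kicks in.
        int snapshot = reducer6.parallelism();

        // Downstream vertex is reduced from 46 to 6 tasks.
        reducer6.reconfigure(6);

        // An upstream task scheduled after the change sees 6, not 46.
        int live = edgeCountReadLive(reducer6);

        System.out.println("snapshot=" + snapshot + " live=" + live);
    }
}
```

In this sketch the upstream task would need the snapshot value (46) to partition its output as originally planned, which is the mismatch the log excerpt shows for "Reducer 5".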