[
https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rajesh Balamohan updated TEZ-2834:
----------------------------------
Attachment: application_1442254312093_0095.1.log.gz
application_1442254312093_0095.2.log.gz
DAG_view.png
hive_view.png
Attaching the DAG view, the Hive view, and the app logs for reference. The app
logs have been split into two parts and uploaded separately because they are huge.
{noformat}
2015-09-15 09:41:12,208 INFO [Dispatcher thread: Central] impl.VertexImpl: Creating 2 tasks for vertex: vertex_1442254312093_0095_1_05 [Reducer 9]
2015-09-15 09:41:12,208 INFO [Dispatcher thread: Central] impl.VertexImpl: Directly initializing vertex: vertex_1442254312093_0095_1_05 [Reducer 9]
...
2015-09-15 09:43:25,493 INFO [Dispatcher thread: Central] impl.TaskAttemptImpl: attempt_1442254312093_0095_1_05_000000_0 TaskAttempt Transitioned from NEW to START_WAIT due to event TA_SCHEDULE
2015-09-15 09:43:25,493 INFO [TaskSchedulerEventHandlerThread] rm.YarnTaskSchedulerService: Allocation request for task: attempt_1442254312093_0095_1_05_000000_0 with request: Capability[<memory:8192, vCores:1>]Priority[11] host: null rack: null
{noformat}
Reducer 9 does not transition any further after "NEW to START_WAIT due to event
TA_SCHEDULE".
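For anyone who wants to check the attached logs for other attempts in the same
condition, here is a rough sketch (not part of Tez) that reports task attempts
whose last logged transition ends in START_WAIT. It assumes that each log
record sits on a single line and that transitions are logged in the
"TaskAttempt Transitioned from X to Y" form shown in the excerpt above; the
function names are made up for illustration.

```python
import gzip
import re
import sys

# Assumed line shape, taken from the excerpt above:
# "... attempt_<id> TaskAttempt Transitioned from NEW to START_WAIT due to event TA_SCHEDULE"
TRANSITION = re.compile(
    r"(attempt_\S+) TaskAttempt Transitioned from (\w+) to (\w+)"
)


def read_lines(paths):
    """Yield lines from plain or gzip-compressed log files, in the order given."""
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", errors="replace") as fh:
            yield from fh


def last_states(lines):
    """Map each attempt id to the target state of its last logged transition."""
    states = {}
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            attempt, _old_state, new_state = match.groups()
            states[attempt] = new_state
    return states


def stuck_attempts(lines, state="START_WAIT"):
    """Return the sorted attempt ids whose last logged state equals `state`."""
    return sorted(a for a, s in last_states(lines).items() if s == state)


if __name__ == "__main__":
    # Pass both log parts in order so state carries over across the split.
    for attempt in stuck_attempts(read_lines(sys.argv[1:])):
        print(attempt)
```

Run it with the two log parts in order, for example
`python stuck_attempts.py application_1442254312093_0095.1.log.gz application_1442254312093_0095.2.log.gz`
(the script name is arbitrary). An attempt listed here only means that no later
transition was logged for it; it does not by itself explain why.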
> tez app hangs at large scale (~30TB)
> ------------------------------------
>
> Key: TEZ-2834
> URL: https://issues.apache.org/jira/browse/TEZ-2834
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.1
> Reporter: Rajesh Balamohan
> Attachments: DAG_view.png, application_1442254312093_0095.1.log.gz,
> application_1442254312093_0095.2.log.gz, hive_view.png
>
>
> Will attach the DAG.
> Repro for reference: TPC-DS q_70 @ 30 TB scale.
> "Map 7" completes in 2 waves. Its output is very small, so "Reducer 8" gets
> launched slightly late. But before "Reducer 9" can get scheduled, the slots are
> taken up by "Map 1", which is not preempted to make room for "Reducer 9".