[jira] [Updated] (TEZ-2834) tez app hangs at large scale (~30TB)

Rajesh Balamohan (JIRA) Wed, 16 Sep 2015 02:36:54 -0700

     [ https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Rajesh Balamohan updated TEZ-2834:
----------------------------------
    Attachment: application_1442254312093_0095.1.log.gz
                application_1442254312093_0095.2.log.gz
                DAG_view.png
                hive_view.png

Attaching the DAG view, the Hive view and the app logs for reference. The app 
logs are huge, so they have been split into two files and uploaded separately.

{noformat}
2015-09-15 09:41:12,208 INFO [Dispatcher thread: Central] impl.VertexImpl: Creating 2 tasks for vertex: vertex_1442254312093_0095_1_05 [Reducer 9]
2015-09-15 09:41:12,208 INFO [Dispatcher thread: Central] impl.VertexImpl: Directly initializing vertex: vertex_1442254312093_0095_1_05 [Reducer 9]
...
2015-09-15 09:43:25,493 INFO [Dispatcher thread: Central] impl.TaskAttemptImpl: attempt_1442254312093_0095_1_05_000000_0 TaskAttempt Transitioned from NEW to START_WAIT due to event TA_SCHEDULE
2015-09-15 09:43:25,493 INFO [TaskSchedulerEventHandlerThread] rm.YarnTaskSchedulerService: Allocation request for task: attempt_1442254312093_0095_1_05_000000_0 with request: Capability[<memory:8192, vCores:1>]Priority[11] host: null rack: null
{noformat}

The "Reducer 9" task attempt never transitions any further after "NEW to 
START_WAIT due to event TA_SCHEDULE".
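
For anyone going through the attached logs, here is a minimal sketch of how 
attempts left in START_WAIT could be listed. It is not part of Tez. It assumes 
each log entry sits on a single line and that every TaskAttemptImpl transition 
message has the same shape as the one in the excerpt above. The helper names 
(read_lines, last_states, stuck_attempts) are made up for illustration.

```python
import gzip
import re
import sys

# Assumed message shape, taken from the excerpt above:
#   attempt_<id> TaskAttempt Transitioned from <OLD> to <NEW> due to event <EVENT>
TRANSITION_RE = re.compile(
    r"(?P<attempt>attempt_\S+) TaskAttempt Transitioned from "
    r"(?P<old>\w+) to (?P<new>\w+) due to event (?P<event>\w+)"
)


def read_lines(paths):
    """Yield lines from plain or gzip-compressed log files, in the order given."""
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", errors="replace") as handle:
            for line in handle:
                yield line


def last_states(lines):
    """Map each attempt id to the last state it was seen transitioning to."""
    states = {}
    for line in lines:
        match = TRANSITION_RE.search(line)
        if match:
            states[match.group("attempt")] = match.group("new")
    return states


def stuck_attempts(lines, state="START_WAIT"):
    """Return the sorted attempt ids whose last recorded state is `state`."""
    return sorted(a for a, s in last_states(lines).items() if s == state)


if __name__ == "__main__":
    # Pass the log files in chronological order on the command line.
    for attempt in stuck_attempts(read_lines(sys.argv[1:])):
        print(attempt)
```

The script (saved under any name) takes the two attached log files as 
arguments, part 1 first. If the assumptions hold, 
attempt_1442254312093_0095_1_05_000000_0 should appear in its output, possibly 
along with any other attempts that were still waiting when the logs were 
collected.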

> tez app hangs at large scale (~30TB)
> ------------------------------------
>
>                 Key: TEZ-2834
>                 URL: https://issues.apache.org/jira/browse/TEZ-2834
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.1
>            Reporter: Rajesh Balamohan
>         Attachments: DAG_view.png, application_1442254312093_0095.1.log.gz, 
> application_1442254312093_0095.2.log.gz, hive_view.png
>
>
> Will attach the DAG.
> Repro for reference: TPC-DS q_70 @ 30 TB scale.
> "Map 7" completes in 2 waves. Its output is very small, so "Reducer 8" is 
> launched slightly late. Before "Reducer 9" can be scheduled, however, the 
> slots are taken up by "Map 1", which is not preempted to run "Reducer 9".



