[ 
https://issues.apache.org/jira/browse/TEZ-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2269:
----------------------------------
    Attachment: TEZ-2269.test.patch

The stack trace doesn't reveal much about the deadlock, and I haven't been able 
to identify which thread is holding the lock.

I ran the test patch attached here multiple times. It uses "tryLock" with a 
timeout in DAGImpl.getDAGStatus(), so the call cannot block indefinitely on the 
lock. With the patch, the hang did not reproduce. [~sseth] - Thoughts?
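
For anyone not familiar with the pattern, here is a minimal sketch of a 
"tryLock"-with-timeout read path. It is not the contents of 
TEZ-2269.test.patch and not the real DAGImpl code: the class TryLockDemo, its 
fields, the Optional return value and the 100 ms timeout are all assumptions 
made up for illustration. Only the java.util.concurrent calls are real.

```java
import java.util.Optional;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a status holder guarded by a read/write lock.
// This is NOT the DAGImpl code from the patch.
class TryLockDemo {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String status = "RUNNING";

    // Returns the status, or Optional.empty() if the read lock could not be
    // acquired within the timeout or the caller was interrupted while waiting.
    Optional<String> getStatus(long timeout, TimeUnit unit) {
        boolean acquired = false;
        try {
            acquired = lock.readLock().tryLock(timeout, unit);
            if (!acquired) {
                return Optional.empty(); // give up instead of blocking forever
            }
            return Optional.of(status);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return Optional.empty();
        } finally {
            if (acquired) {
                lock.readLock().unlock();
            }
        }
    }

    void setStatus(String newStatus) {
        lock.writeLock().lock();
        try {
            status = newStatus;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Demo: while another thread holds the write lock, the reader times out;
    // once the writer releases the lock, the reader succeeds again.
    static boolean readerTimesOutWhileWriterHoldsLock() {
        TryLockDemo demo = new TryLockDemo();
        CountDownLatch writerHasLock = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread writer = new Thread(() -> {
            demo.lock.writeLock().lock();
            try {
                writerHasLock.countDown();
                release.await(); // hold the lock until told to release it
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                demo.lock.writeLock().unlock();
            }
        });
        writer.start();
        try {
            writerHasLock.await();
            boolean timedOut =
                !demo.getStatus(100, TimeUnit.MILLISECONDS).isPresent();
            release.countDown();
            writer.join();
            boolean readableAgain =
                demo.getStatus(100, TimeUnit.MILLISECONDS).isPresent();
            return timedOut && readableAgain;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            release.countDown();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("reader timed out, then recovered: "
            + readerTimesOutWhileWriterHoldsLock());
    }
}
```

Note that a timeout like this only keeps the caller from hanging; whichever 
thread is holding the lock would still need to be found.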

> DAGAppMaster becomes unresponsive
> ---------------------------------
>
>                 Key: TEZ-2269
>                 URL: https://issues.apache.org/jira/browse/TEZ-2269
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Rajesh Balamohan
>         Attachments: TEZ-2269.test.patch, 
> app_master_application_1428021179455_0001_jstack.txt, client_jstack.txt
>
>
> Scenario:
> - Run TPCH query20 @ 1 TB scale
> - Tez master branch, Hive trunk
> - auto-reduce parallelism is not an issue (happens with/without auto-reduce 
> parallelism)
> In 1 or 2 out of 10 runs, DAGAppMaster freezes unexpectedly.  There is no 
> observed pattern in which vertex it happens on. When it happens, the only 
> option is to kill the application.   I will attach the jstack soon, but it 
> doesn't seem to reveal much.
> Need to debug more; creating this JIRA for tracking purposes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
