[ https://issues.apache.org/jira/browse/TEZ-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated TEZ-2269:
----------------------------------
Attachment: TEZ-2269.test.patch
The stack trace isn't revealing much about the deadlock, and I haven't been able
to determine which thread is holding the lock.
I tried the test patch attached here multiple times; it uses "tryLock" with a
timeout in DAGImpl.getDAGStatus() so the call cannot block indefinitely. With the
patch, the hang does not reproduce. [~sseth] - Thoughts?
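The test patch itself is not reproduced here. As a minimal sketch of the general technique, assuming the DAG state is guarded by a java.util.concurrent ReentrantReadWriteLock (an assumption), a status read can wait a bounded time for the read lock and then fall back instead of hanging. DagStatusSketch, setState, and the cached-status fallback are hypothetical illustrations, not Tez's DAGImpl code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for DAGImpl: the names and the fallback policy are illustrative only.
public class DagStatusSketch {

    // Subclass only to expose the protected getOwner() for diagnostics.
    static final class InspectableRWLock extends ReentrantReadWriteLock {
        Thread writeOwner() {
            return getOwner(); // null if the write lock is free or held only by readers
        }
    }

    private final InspectableRWLock lock = new InspectableRWLock();
    private String state = "RUNNING";                   // guarded by lock
    private volatile String lastKnownState = "RUNNING"; // last successfully read status

    /** Writer path (e.g. an event dispatcher) mutates state under the write lock. */
    public void setState(String newState) {
        lock.writeLock().lock();
        try {
            state = newState;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /**
     * Reader path (e.g. a client status RPC). Waits at most timeoutMs for the read
     * lock instead of blocking indefinitely, then falls back to the last known status.
     */
    public String getDAGStatus(long timeoutMs) {
        try {
            if (lock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                try {
                    lastKnownState = state;
                    return state;
                } finally {
                    lock.readLock().unlock();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return lastKnownState;
        }
        // Timed out: report who holds the write lock (if anyone) and the lock's counts.
        Thread owner = lock.writeOwner();
        System.err.println("read lock not acquired in " + timeoutMs + " ms; write owner: "
            + (owner == null ? "<none>" : owner.getName()) + "; " + lock);
        return lastKnownState;
    }

    public static void main(String[] args) throws Exception {
        DagStatusSketch dag = new DagStatusSketch();
        System.out.println(dag.getDAGStatus(100)); // uncontended read

        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread stuck = new Thread(() -> {
            dag.lock.writeLock().lock();
            try {
                held.countDown();
                release.await(); // simulate a writer that never finishes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                dag.lock.writeLock().unlock();
            }
        }, "stuck-writer");
        stuck.start();
        held.await();

        System.out.println(dag.getDAGStatus(100)); // times out, returns cached status
        release.countDown();
        stuck.join();
    }
}
```

Falling back to a cached status keeps the client responsive, but it only masks the hang: whichever thread holds the write lock is still stuck, so the timeout branch is also a convenient place to log the owner.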
> DAGAppMaster becomes unresponsive
> ---------------------------------
>
> Key: TEZ-2269
> URL: https://issues.apache.org/jira/browse/TEZ-2269
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Rajesh Balamohan
> Attachments: TEZ-2269.test.patch,
> app_master_application_1428021179455_0001_jstack.txt, client_jstack.txt
>
>
> Scenario:
> - Run TPC-H query 20 at 1 TB scale
> - Tez master branch, Hive trunk
> - Auto-reduce parallelism is not a factor (the hang occurs with and without
> it)
> In 1 or 2 out of 10 runs, the DAGAppMaster freezes unexpectedly. No pattern
> has been observed in which vertex it happens on. When it does happen, the only
> option is to kill the application. I will attach the jstack soon, but it
> doesn't seem to reveal much.
> Need to debug more; creating this JIRA for tracking purposes.
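One possible reason the jstack output is hard to interpret: `jstack -l` reports only exclusively owned synchronizers, so a thread that holds just the read lock of a ReentrantReadWriteLock never appears as that lock's owner, and a writer queued behind it shows no holder. The following is a minimal sketch, assuming the hang involves such a lock (not confirmed by the dumps). It uses the standard java.lang.management API to collect the same information in-process and reproduces that case. LockOwnerDump and its methods are hypothetical names.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOwnerDump {

    /** One line per thread that is blocked or waiting on a lock, plus a deadlock count. */
    public static List<String> describeBlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // (true, true): include held monitors and ownable synchronizers, like `jstack -l`.
        for (ThreadInfo ti : mx.dumpAllThreads(true, true)) {
            LockInfo waitingOn = ti.getLockInfo();
            if (waitingOn == null) {
                continue; // not blocked or waiting on anything
            }
            String owner = ti.getLockOwnerName();
            lines.add(ti.getThreadName() + " [" + ti.getThreadState() + "] waiting on "
                + waitingOn + ", owner: " + (owner == null ? "<none reported>" : owner));
        }
        // Only monitors and exclusively owned synchronizers are considered, so a cycle
        // that runs through a read-lock holder is not reported here.
        long[] deadlocked = mx.findDeadlockedThreads();
        lines.add("deadlocked threads: " + (deadlocked == null ? 0 : deadlocked.length));
        return lines;
    }

    /** Reproduces the "no visible holder" case: a reader blocks a writer indefinitely. */
    public static List<String> runReadLockDemo() throws InterruptedException {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        CountDownLatch readHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread reader = new Thread(() -> {
            rw.readLock().lock();
            try {
                readHeld.countDown();
                release.await(); // simulate a reader that never finishes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                rw.readLock().unlock();
            }
        }, "demo-reader");
        Thread writer = new Thread(() -> {
            rw.writeLock().lock();
            rw.writeLock().unlock();
        }, "demo-writer");

        reader.start();
        readHeld.await();
        writer.start();
        while (writer.getState() != Thread.State.WAITING) {
            Thread.sleep(10); // wait until the writer is parked on the lock
        }
        List<String> lines = describeBlockedThreads();
        release.countDown();
        reader.join();
        writer.join();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        runReadLockDemo().forEach(System.out::println);
    }
}
```

In the demo, the line for `demo-writer` names the lock's internal sync object but reports no owner, even though `demo-reader` is the thread actually holding it up.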
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)