[
https://issues.apache.org/jira/browse/HIVE-10842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Siddharth Seth resolved HIVE-10842.
-----------------------------------
Resolution: Fixed
Fix Version/s: llap
This should fix it. Removed some unnecessary synchronization and fixed some
conditions.
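
For readers without the patch at hand: the jstack output quoted in the description below shows a lock-order inversion, where each thread holds one monitor and waits for the other. A rough sketch of the general pattern for avoiding that, by never holding one lock while acquiring the other, might look like the following. This is not the code from HIVE-10842.1.txt; the class, fields, and methods are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: none of these names come from the Hive code base.
final class LockOrderSketch {
    private final Object schedulerLock = new Object(); // stand-in for the plain Object monitor
    private final Object trackerLock = new Object();   // stand-in for the tracker monitor
    private final List<String> trackedTasks = new ArrayList<>();
    private int pendingNotifications = 0;

    // IPC-handler-like path: update tracker state, then poke the scheduler.
    // trackerLock is released before schedulerLock is taken, so the two never nest.
    void registerTask(String task) {
        synchronized (trackerLock) {
            trackedTasks.add(task);
        }
        synchronized (schedulerLock) {
            pendingNotifications++;
        }
    }

    // Scheduler-like path: drain scheduler state, then consult the tracker,
    // again without holding both locks at once.
    int scheduleOnce() {
        int drained;
        synchronized (schedulerLock) {
            drained = pendingNotifications;
            pendingNotifications = 0;
        }
        synchronized (trackerLock) {
            return drained + trackedTasks.size();
        }
    }

    int trackedCount() {
        synchronized (trackerLock) {
            return trackedTasks.size();
        }
    }

    // Runs both paths concurrently; returns true if both threads finish within 10 seconds.
    static boolean runConcurrently(LockOrderSketch sketch, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch start = new CountDownLatch(1);
        pool.submit(() -> {
            start.await();
            for (int i = 0; i < iterations; i++) {
                sketch.registerTask("task-" + i);
            }
            return null;
        });
        pool.submit(() -> {
            start.await();
            for (int i = 0; i < iterations; i++) {
                sketch.scheduleOnce();
            }
            return null;
        });
        start.countDown();
        pool.shutdown();
        try {
            return pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        LockOrderSketch sketch = new LockOrderSketch();
        boolean finished = runConcurrently(sketch, 10_000);
        System.out.println("finished=" + finished + " tracked=" + sketch.trackedCount());
    }
}
```

The trade-off is that the two updates are no longer atomic with respect to each other, so this only works when each path can tolerate seeing slightly stale state from the other lock's domain. Where that is not acceptable, acquiring both locks in one fixed order everywhere is the usual alternative.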
> LLAP: DAGs get stuck in yet another way
> ---------------------------------------
>
> Key: HIVE-10842
> URL: https://issues.apache.org/jira/browse/HIVE-10842
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergey Shelukhin
> Assignee: Siddharth Seth
> Fix For: llap
>
> Attachments: HIVE-10842.1.txt
>
>
> Looks exactly like HIVE-10744. The last comment there has internal app IDs. Logs
> available upon request.
> 6 tasks (the number of slots) from one machine are stuck.
> jstack for target daemon sayeth:
> {noformat}
> Found one Java-level deadlock:
> =============================
>
> "IPC Server handler 4 on 15001":
>   waiting to lock Monitor@0x00007f3cb0005cb8 (Object@0x000000008cc3ce98, a java/lang/Object),
>   which is held by "Wait-Queue-Scheduler-0"
> "Wait-Queue-Scheduler-0":
>   waiting to lock Monitor@0x00007f3cb0004d98 (Object@0x000000009234cf58, a org/apache/hadoop/hive/llap/daemon/impl/QueryInfo$FinishableStateTracker),
>   which is held by "IPC Server handler 4 on 15001"
> {noformat}
> Oh, this time it is not q1; I was running a bunch of TPC-DS queries in sequence
> for some cache test. No parallel queries. There may have been task failures
> before.
> The query that got stuck had lots and lots of reducers:
> {noformat}
> Map 1: 1/1 Map 10: 1/1 Map 11: 85/85 Map 13: 1/1 Map 14: 1/1
> Map 15: 1/1 Map 16: 1/1 Map 17: 94/94 Map 19: 1/1 Map 2: 1/1
> Map 20: 1/1 Map 3: 91/91 Map 7: 1/1 Map 8: 1/1 Map 9: 1/1
> Reducer 12: 391/391 Reducer 18: 197/197 Reducer 4: 1009/1009
> Reducer 5: 1003(+6)/1009 Reducer 6: 0(+1)/1
> {noformat}
> I think it's query 58.