[
https://issues.apache.org/jira/browse/DRILL-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pritesh Maker updated DRILL-6468:
---------------------------------
Labels: ready-to-commit (was: )
> OOMs trigger graceful shutdown when terminating Drill. This can cause a hang.
> -----------------------------------------------------------------------------
>
> Key: DRILL-6468
> URL: https://issues.apache.org/jira/browse/DRILL-6468
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Timothy Farkas
> Assignee: Timothy Farkas
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Drill may never terminate in the event of a Heap OOM. When this happens we
> see stack traces like the following:
> {code}
> "250387a7-363d-619c-d745-57ae50f19d15:frag:0:0" #104 daemon prio=10 os_prio=0 tid=0x00007fd9d1eec190 nid=0xd7d5 in Object.wait() [0x00007fd953de2000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Thread.join(Thread.java:1252)
>     - locked <0x00000005c06bee28> (a org.apache.drill.exec.server.Drillbit$ShutdownThread)
>     at java.lang.Thread.join(Thread.java:1326)
>     at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
>     at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
>     at java.lang.Shutdown.runHooks(Shutdown.java:123)
>     at java.lang.Shutdown.sequence(Shutdown.java:167)
>     at java.lang.Shutdown.exit(Shutdown.java:212)
>     - locked <0x00000005c1d8bb28> (a java.lang.Class for java.lang.Shutdown)
>     at java.lang.Runtime.exit(Runtime.java:109)
>     at java.lang.System.exit(System.java:971)
>     at org.apache.drill.common.CatastrophicFailure.exit(CatastrophicFailure.java:49)
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:246)
>     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> Here CatastrophicFailure.exit is called when we encounter a Heap OOM, and it
> calls System.exit to terminate the Java process. The problem is that
> System.exit runs Drill's normal shutdown hook, which attempts a graceful
> shutdown. In the case of a Heap OOM we cannot do this reliably because there
> isn't enough memory to keep executing our code. The JVM likely gets stuck at
> various places waiting on garbage collection and heap allocations, and the
> Drillbit stops making progress.
> *Solution To Hanging Shutdown*
> There are two kinds of OutOfMemory exceptions in Drill: Direct Memory OOMs
> and Heap OOMs. Direct Memory OOMs are typically recoverable because Drill
> uses Direct Memory only to store data, so we can fail the query, discard its
> data, and recover. Heap OOMs are unrecoverable because we need the heap to
> execute our code; if we can't allocate on the heap, we can't run our code
> reliably.
> When Drill experiences a catastrophic failure we should not call System.exit,
> because that triggers a graceful shutdown. A catastrophic failure like a Heap
> OOM is unrecoverable, so we should forcefully terminate the JVM with
> Runtime.getRuntime().halt, which does not run shutdown hooks.
> This makes Drill shut down promptly in the event of a Heap OOM.
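
For illustration only, the proposed behaviour might be sketched roughly as
follows. This is not Drill's actual CatastrophicFailure code: the class name
HaltOnHeapOomSketch, the methods isHeapOom and terminate, and the exit code
are all hypothetical, and treating every java.lang.OutOfMemoryError in the
cause chain as a Heap OOM is a simplifying assumption. The terminators are
passed in as parameters so the decision logic can be exercised without
killing the JVM.

```java
import java.util.function.IntConsumer;

public class HaltOnHeapOomSketch {
    // Hypothetical exit code; the issue does not say which code Drill uses.
    public static final int FAILURE_EXIT_CODE = 1;

    // Assumption: any OutOfMemoryError in the cause chain counts as a Heap OOM.
    public static boolean isHeapOom(Throwable failure) {
        for (Throwable t = failure; t != null; t = t.getCause()) {
            if (t instanceof OutOfMemoryError) {
                return true;
            }
        }
        return false;
    }

    // Picks the forceful terminator for Heap OOMs and the graceful one otherwise.
    public static void terminate(Throwable failure,
                                 IntConsumer forcefulHalt,
                                 IntConsumer gracefulExit) {
        if (isHeapOom(failure)) {
            // Runtime.halt does not run shutdown hooks, so it cannot block on them.
            forcefulHalt.accept(FAILURE_EXIT_CODE);
        } else {
            // System.exit runs the registered shutdown hooks before terminating.
            gracefulExit.accept(FAILURE_EXIT_CODE);
        }
    }

    public static void main(String[] args) {
        // Stand-in for Drill's graceful shutdown hook; skipped when halt is used.
        Runtime.getRuntime().addShutdownHook(
            new Thread(() -> System.out.println("graceful shutdown hook ran")));

        terminate(new OutOfMemoryError("Java heap space"),
                  Runtime.getRuntime()::halt,
                  System::exit);
    }
}
```

With the simulated OutOfMemoryError in main, the process should end through
Runtime.halt without running the hook; passing any other Throwable would go
through System.exit and run it.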