[ 
https://issues.apache.org/jira/browse/DRILL-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6468:
---------------------------------
    Labels: ready-to-commit  (was: )

> OOMs trigger graceful shutdown when terminating Drill. This can cause a hang.
> -----------------------------------------------------------------------------
>
>                 Key: DRILL-6468
>                 URL: https://issues.apache.org/jira/browse/DRILL-6468
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.14.0
>
>
> Drill may never terminate in the event of a Heap OOM. When this happens we 
> see stack traces like the following:
> {code}
> "250387a7-363d-619c-d745-57ae50f19d15:frag:0:0" #104 daemon prio=10 os_prio=0 tid=0x00007fd9d1eec190 nid=0xd7d5 in Object.wait() [0x00007fd953de2000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Thread.join(Thread.java:1252)
>         - locked <0x00000005c06bee28> (a org.apache.drill.exec.server.Drillbit$ShutdownThread)
>         at java.lang.Thread.join(Thread.java:1326)
>         at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
>         at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
>         at java.lang.Shutdown.runHooks(Shutdown.java:123)
>         at java.lang.Shutdown.sequence(Shutdown.java:167)
>         at java.lang.Shutdown.exit(Shutdown.java:212)
>         - locked <0x00000005c1d8bb28> (a java.lang.Class for java.lang.Shutdown)
>         at java.lang.Runtime.exit(Runtime.java:109)
>         at java.lang.System.exit(System.java:971)
>         at org.apache.drill.common.CatastrophicFailure.exit(CatastrophicFailure.java:49)
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:246)
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> Here CatastrophicFailure.exit is called when we encounter a Heap OOM, and it 
> calls System.exit to terminate the Java process. The problem is that 
> System.exit runs Drill's normal shutdown hook, which attempts a graceful 
> shutdown. In the case of a Heap OOM we cannot do this reliably because there 
> isn't enough memory to continue executing our code. The JVM likely gets 
> stuck at various places waiting on garbage collection and object allocations 
> on the heap, and the Drillbit stops making progress.
> *Solution To Hanging Shutdown*
> There are two kinds of OutOfMemory exceptions in Drill: Direct Memory OOMs 
> and Heap OOMs. Direct Memory OOMs are typically recoverable because Drill 
> uses Direct Memory only to store data, so we can fail the query, discard its 
> data, and recover. Heap OOMs are unrecoverable because we need the heap to 
> execute our code; if we can't allocate on the heap, we can't run our code 
> reliably.
> When Drill experiences a catastrophic failure we should not call System.exit, 
> because that triggers a graceful shutdown. A catastrophic failure like a Heap 
> OOM is unrecoverable, so we should forcefully terminate the JVM with 
> Runtime.getRuntime().halt, which skips shutdown hooks.
> This will make Drill shut down promptly in the event of a Heap OOM.
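
The proposal in the issue description can be sketched roughly as follows. This is an illustration only, not Drill's actual CatastrophicFailure code: the class names, the EXIT_CODE value, and the injectable "halter" are all hypothetical, and the sketch assumes that heap exhaustion surfaces as java.lang.OutOfMemoryError while a direct-memory failure surfaces as an application-level exception (represented here by a stand-in class).

```java
import java.util.function.IntConsumer;

/** Hypothetical stand-in for a recoverable direct-memory failure. */
class DirectMemoryExhausted extends RuntimeException {
    DirectMemoryExhausted(String message) {
        super(message);
    }
}

/** Sketch: fail the query on recoverable errors, halt the JVM on heap OOM. */
final class FailureHandlerSketch {
    /** Arbitrary non-zero status chosen for this sketch. */
    static final int EXIT_CODE = 1;

    enum Action { FAIL_QUERY, HALT_JVM }

    /** Assumption: only java.lang.OutOfMemoryError is treated as catastrophic. */
    static Action classify(Throwable t) {
        return (t instanceof OutOfMemoryError) ? Action.HALT_JVM : Action.FAIL_QUERY;
    }

    /**
     * Decides what to do with a failure. The halter is injectable so the
     * decision can be exercised without killing the running JVM.
     */
    static Action handle(Throwable t, IntConsumer halter) {
        Action action = classify(t);
        if (action == Action.HALT_JVM) {
            // No cleanup is attempted here: it may itself need heap.
            halter.accept(EXIT_CODE);
        }
        return action;
    }

    /** Production-style entry point: halt() does not run shutdown hooks. */
    static void handle(Throwable t) {
        handle(t, code -> Runtime.getRuntime().halt(code));
    }
}
```

Calling FailureHandlerSketch.handle(e) from a catch block would terminate the process immediately on a heap OOM, whereas System.exit would first wait for every registered shutdown hook to finish, which is the wait visible in the stack trace above.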



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
