[ 
https://issues.apache.org/jira/browse/OOZIE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817237#comment-17817237
 ] 

Dénes Bodó commented on OOZIE-3722:
-----------------------------------

If anyone who finds this ticket has a suggestion, solution, or question, 
please do not hesitate to post it here or on any of the Oozie mailing lists.

> Workflow actions can stuck in RUNNING state when DB connections are killed on 
> the DB side
> -----------------------------------------------------------------------------------------
>
>                 Key: OOZIE-3722
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3722
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 5.2.1
>            Reporter: Dénes Bodó
>            Assignee: Dénes Bodó
>            Priority: Critical
>
> Apache Oozie 5.2.1 uses OpenJPA 2.4.2, commons-dbcp 1.4, and commons-pool 
> 1.5.4. These are ancient versions, I know.
> h1. Description
> When network issues or "maintenance work" on the DB side (especially with 
> PostgreSQL) cause DB connections to be closed, the connection pool on the 
> client side becomes exhausted. Many threads end up waiting at this point:
> {noformat}
> "pool-2-thread-4" #20 prio=5 os_prio=31 tid=0x00007faf7903b800 nid=0x8603 waiting on condition [0x000000030f3e7000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x000000066aca8e70> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>       at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:1324)
>  {noformat}
> According to my observations, this happens because neither the JDBC driver 
> nor the abstract DBCP connection 
> _org.apache.commons.dbcp2.PoolableConnection_ gets closed on the client side.
>  
> This issue can leave workflow actions stuck in the RUNNING state: the thread 
> that would update the DB after XActionExecutor.check() never gets a 
> connection, so it blocks indefinitely.
>  
> h1. Workaround
> Restart Oozie and/or fix the DB/network issue.
> h1. Repro
> (Un)Fortunately I can reproduce the issue using the latest and greatest 
> commons-dbcp 2.11.0 and commons-pool 2.12.0 along with OpenJPA 3.2.2.
> I've just created a Java application to reproduce the issue: 
> [https://github.com/dionusos/pool_exhausted_repro] . See README.md for 
> detailed repro steps.
>  
> DBCP-595 was created to ask for help from DBCP/Pool teams. I am working on 
> the case to provide them the necessary information.

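The waiting pattern in the thread dump above can be illustrated with a small
stand-alone sketch. It is not Oozie or DBCP code: it assumes that
java.util.concurrent.LinkedBlockingDeque is a fair stand-in for the pool's
idle-object queue, uses plain strings in place of connections, and the class
and method names (PoolWaitSketch, newPool, borrow, unboundedBorrowHangs) are
hypothetical. It only contrasts an unbounded takeFirst() with a bounded
pollFirst() once every "connection" has been borrowed and never returned.

```java
import java.util.Optional;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

public class PoolWaitSketch {

    // Hypothetical stand-in for a pool's idle queue; strings play the role of connections.
    static LinkedBlockingDeque<String> newPool(int size) {
        LinkedBlockingDeque<String> idle = new LinkedBlockingDeque<>();
        for (int i = 0; i < size; i++) {
            idle.addFirst("conn-" + i);
        }
        return idle;
    }

    // Bounded borrow: gives up after timeoutMillis instead of parking forever.
    static Optional<String> borrow(LinkedBlockingDeque<String> idle, long timeoutMillis)
            throws InterruptedException {
        return Optional.ofNullable(idle.pollFirst(timeoutMillis, TimeUnit.MILLISECONDS));
    }

    // Unbounded borrow: starts a thread that calls takeFirst() and reports whether
    // it is still waiting after observeMillis (must be > 0). The waiter is then
    // interrupted so the sketch can terminate.
    static boolean unboundedBorrowHangs(LinkedBlockingDeque<String> idle, long observeMillis)
            throws InterruptedException {
        Thread waiter = new Thread(() -> {
            try {
                idle.takeFirst(); // parks until an element is returned to the queue
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.setDaemon(true);
        waiter.start();
        waiter.join(observeMillis);
        boolean stillWaiting = waiter.isAlive();
        waiter.interrupt();
        return stillWaiting;
    }

    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingDeque<String> idle = newPool(2);
        // Borrow everything and never return it: models connections that were
        // killed on the DB side but never closed on the client side.
        borrow(idle, 100);
        borrow(idle, 100);
        System.out.println("bounded borrow got a connection: " + borrow(idle, 200).isPresent());
        System.out.println("unbounded borrow still waiting: " + unboundedBorrowHangs(idle, 200));
    }
}
```

With the queue drained, the bounded borrow comes back empty after its timeout,
while the unbounded one is still parked when the observer stops waiting. A
bounded wait would not fix the leaked connections described in the ticket; it
would only turn a silent hang into a visible failure. In commons-dbcp 2 the
corresponding knob is, as far as I know, the maxWaitMillis setting on
BasicDataSource, whose default of -1 means wait indefinitely; whether and how
Oozie exposes it is not covered here.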


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
