Dénes Bodó created OOZIE-3722:
---------------------------------

             Summary: Workflow actions can stuck in RUNNING state when DB 
connections are killed on the DB side
                 Key: OOZIE-3722
                 URL: https://issues.apache.org/jira/browse/OOZIE-3722
             Project: Oozie
          Issue Type: Bug
          Components: core
    Affects Versions: 5.2.1
            Reporter: Dénes Bodó
            Assignee: Dénes Bodó


Apache Oozie 5.2.1 uses OpenJPA 2.4.2, commons-dbcp 1.4, and commons-pool 
1.5.4. These are ancient versions, I know.
h1. Description

When network issues or "maintenance work" on the DB side (especially with 
PostgreSQL) cause DB connections to be closed, the connection pool on the 
client side becomes exhausted. Many threads then wait at this point:
{noformat}
"pool-2-thread-4" #20 prio=5 os_prio=31 tid=0x00007faf7903b800 nid=0x8603 waiting on condition [0x000000030f3e7000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000066aca8e70> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:1324)
 {noformat}
From what I have observed, this happens because neither the JDBC driver's 
connection nor the DBCP wrapper around it, 
_org.apache.commons.dbcp2.PoolableConnection_, gets closed on the client side.

 

This issue can leave workflow actions stuck in the RUNNING state: the thread 
that would update the DB after XActionExecutor.check() never gets a 
connection, so it stays blocked indefinitely.
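
The blocking behaviour seen in the thread dump can be illustrated with a small, self-contained sketch. It is not the commons-pool2 code and not the linked repro application: the class PoolExhaustionSketch and its methods are hypothetical, and java.util.concurrent.LinkedBlockingDeque stands in for the pool's own deque. It only assumes that connections which were borrowed and never returned leave the idle queue empty, so an untimed takeFirst() parks the caller while a timed pollFirst() gives up.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a connection pool; not the commons-pool2 implementation. */
public class PoolExhaustionSketch {
    // Idle "connections" are plain strings to keep the sketch dependency-free.
    private final LinkedBlockingDeque<String> idle = new LinkedBlockingDeque<>();

    public PoolExhaustionSketch(int size) {
        for (int i = 0; i < size; i++) {
            idle.add("conn-" + i);
        }
    }

    /** Untimed borrow: parks the caller until another thread gives a connection back. */
    public String borrowBlocking() throws InterruptedException {
        return idle.takeFirst();
    }

    /** Timed borrow: returns null if nothing becomes idle within the given time. */
    public String borrowWithTimeout(long millis) throws InterruptedException {
        return idle.pollFirst(millis, TimeUnit.MILLISECONDS);
    }

    public void giveBack(String conn) {
        idle.addFirst(conn);
    }

    public static void main(String[] args) throws Exception {
        PoolExhaustionSketch pool = new PoolExhaustionSketch(2);

        // Borrow everything and never give it back, imitating connections
        // that died on the DB side but were never closed on the client.
        pool.borrowBlocking();
        pool.borrowBlocking();

        Thread worker = new Thread(() -> {
            try {
                pool.borrowBlocking(); // nothing is ever given back, so this parks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "stuck-worker");
        worker.setDaemon(true); // let the JVM exit even though the worker is parked
        worker.start();

        Thread.sleep(200);
        System.out.println("worker state: " + worker.getState());
        System.out.println("timed borrow: " + pool.borrowWithTimeout(100));
    }
}
```

In this sketch the timed borrow lets the caller notice the exhausted pool and fail, whereas the untimed borrow leaves the thread parked in the same way as the dump above. Whether a bounded wait is appropriate for Oozie is a separate question that this example does not answer.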

 
h1. Workaround

Restart Oozie and/or fix the DB/network issue.
h1. Repro

(Un)Fortunately I can reproduce the issue using the latest and greatest 
commons-dbcp 2.11.0 and commons-pool 2.12.0 along with OpenJPA 3.2.2.

I've just created a Java application to reproduce the issue: 
[https://github.com/dionusos/pool_exhausted_repro] . See README.md for detailed 
repro steps.

 

DBCP-595 was created to ask for help from DBCP/Pool teams. I am working on the 
case to provide them the necessary information.



