Dénes Bodó created OOZIE-3722:
---------------------------------

             Summary: Workflow actions can stuck in RUNNING state when DB connections are killed on the DB side
                 Key: OOZIE-3722
                 URL: https://issues.apache.org/jira/browse/OOZIE-3722
             Project: Oozie
          Issue Type: Bug
          Components: core
    Affects Versions: 5.2.1
            Reporter: Dénes Bodó
            Assignee: Dénes Bodó

Apache Oozie 5.2.1 uses OpenJPA 2.4.2, commons-dbcp 1.4 and commons-pool 1.5.4. These are ancient versions, I know.

h1. Description

When network issues or "maintenance work" on the DB side (especially with PostgreSQL) cause DB connections to be closed, the connection pool on the client side becomes exhausted. Many threads are then left waiting at this point:

{noformat}
"pool-2-thread-4" #20 prio=5 os_prio=31 tid=0x00007faf7903b800 nid=0x8603 waiting on condition [0x000000030f3e7000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000066aca8e70> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:1324)
{noformat}

According to my observation, this happens because neither the JDBC driver's connection nor the abstract DBCP connection _org.apache.commons.dbcp2.PoolableConnection_ gets closed on the client side.

This can leave workflow actions stuck in RUNNING state: the thread that would update the DB after XActionExecutor.check() never gets a connection from the pool and blocks indefinitely.

h1. Workaround

Restart Oozie and/or fix the DB/network issue.

h1. Repro

(Un)fortunately, I can reproduce the issue with the latest and greatest commons-dbcp 2.11.0 and commons-pool 2.12.0 along with OpenJPA 3.2.2.

I have created a Java application to reproduce the issue: [https://github.com/dionusos/pool_exhausted_repro]. See README.md for detailed repro steps.

DBCP-595 was created to ask the DBCP/Pool teams for help. I am working on providing them with the necessary information.
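
The waiting behaviour described under Description can be illustrated with a small, self-contained sketch. It is not Oozie, DBCP or commons-pool code: {{PoolExhaustionSketch}} and {{FakeConnection}} are hypothetical names, and the pool is modelled with the JDK's {{java.util.concurrent.LinkedBlockingDeque}} rather than the commons-pool class from the stack trace. The sketch assumes that connections killed on the DB side are never returned to the pool. Under that assumption a borrower using an unbounded {{takeFirst()}} would park forever, while a bounded {{poll(timeout)}} at least returns control to the caller.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a bounded connection pool; not DBCP or Oozie code. */
class PoolExhaustionSketch {

    /** Hypothetical connection; 'alive' models whether the DB side still holds the session. */
    static final class FakeConnection {
        volatile boolean alive = true;
    }

    // Idle connections wait here; an empty deque means the pool is exhausted.
    private final LinkedBlockingDeque<FakeConnection> idle = new LinkedBlockingDeque<>();

    PoolExhaustionSketch(int size) {
        for (int i = 0; i < size; i++) {
            idle.add(new FakeConnection());
        }
    }

    /**
     * Borrows with a bounded wait. Returns null on timeout instead of
     * parking indefinitely the way an unbounded takeFirst() would.
     */
    FakeConnection borrow(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    /** Returns a connection; a dead one is replaced so its pool slot is not lost. */
    void giveBack(FakeConnection c) {
        idle.add(c.alive ? c : new FakeConnection());
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        PoolExhaustionSketch pool = new PoolExhaustionSketch(2);
        FakeConnection a = pool.borrow(100);
        FakeConnection b = pool.borrow(100);

        // Assumption: the DB side kills both sessions and the client never returns them.
        a.alive = false;
        b.alive = false;

        // The pool is now exhausted; this borrow times out instead of hanging.
        FakeConnection c = pool.borrow(200);
        System.out.println("third borrow: " + (c == null ? "timed out" : "got a connection"));

        // Once a dead connection is handed back, its slot becomes usable again.
        pool.giveBack(a);
        System.out.println("idle after returning one dead connection: " + pool.idleCount());
    }
}
```

In DBCP 2 the wait for a free connection is unbounded by default. A bounded wait (for example via {{BasicDataSource.setMaxWaitMillis}}) would only turn the hang into an error and would not free the lost slots, so it is at best a mitigation, and this report does not evaluate it.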