[ https://issues.apache.org/jira/browse/OOZIE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817237#comment-17817237 ]
Dénes Bodó commented on OOZIE-3722:
-----------------------------------

If anybody who found this ticket has a suggestion, solution, or question,
please do not hesitate to post it here or on any of the Oozie mailing lists.

> Workflow actions can stuck in RUNNING state when DB connections are killed on the DB side
> -----------------------------------------------------------------------------------------
>
>                 Key: OOZIE-3722
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3722
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 5.2.1
>            Reporter: Dénes Bodó
>            Assignee: Dénes Bodó
>            Priority: Critical
>
> Apache Oozie 5.2.1 uses OpenJPA 2.4.2, commons-dbcp 1.4 and commons-pool
> 1.5.4. These are ancient versions, I know.
> h1. Description
> The issue is that when network problems or "maintenance work" on the DB
> side (especially PostgreSQL) cause DB connections to be closed, the pool
> on the client side becomes exhausted. Many threads are waiting at this point:
> {noformat}
> "pool-2-thread-4" #20 prio=5 os_prio=31 tid=0x00007faf7903b800 nid=0x8603 waiting on condition [0x000000030f3e7000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x000000066aca8e70> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:1324)
> {noformat}
> According to my observation, this is because the JDBC driver does not get
> closed on the client side, and neither does the abstract DBCP connection
> _org.apache.commons.dbcp2.PoolableConnection_ .
>
> This issue can leave workflow actions stuck in the RUNNING state, because
> the thread that would update the DB after XActionExecutor.check() never
> gets a connection and therefore waits indefinitely.
>
> h1. Workaround
> Restart Oozie and/or fix the DB/network issue.
> h1. Repro
> (Un)fortunately, I can reproduce the issue using the latest and greatest
> commons-dbcp 2.11.0 and commons-pool 2.12.0 along with OpenJPA 3.2.2.
> I've just created a Java application to reproduce the issue:
> [https://github.com/dionusos/pool_exhausted_repro] . See README.md for
> detailed repro steps.
>
> DBCP-595 was created to ask for help from the DBCP/Pool teams. I am working
> on the case to provide them with the necessary information.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
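
For readers unfamiliar with the thread dump quoted in the description, here is
a minimal sketch of why a borrower parks forever once every pooled object has
been handed out and none is ever returned. It is not taken from the ticket or
from the pool_exhausted_repro repository. PoolBorrowSketch and TinyPool are
hypothetical names, and the JDK's java.util.concurrent.LinkedBlockingDeque is
used only as a stand-in for the commons-pool2 class of the same name that
appears in the stack trace. Connections are plain strings here.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

public class PoolBorrowSketch {

    /** Hypothetical stand-in for a fixed-size connection pool (not DBCP code). */
    public static final class TinyPool {
        private final LinkedBlockingDeque<String> idle = new LinkedBlockingDeque<>();

        public TinyPool(int size) {
            for (int i = 0; i < size; i++) {
                idle.addLast("conn-" + i);
            }
        }

        /** Unbounded wait: parks the caller until some thread returns a connection. */
        public String borrowBlocking() throws InterruptedException {
            return idle.takeFirst();
        }

        /** Bounded wait: returns null if nothing becomes idle within the timeout. */
        public String borrowWithTimeout(long millis) throws InterruptedException {
            return idle.pollFirst(millis, TimeUnit.MILLISECONDS);
        }

        public void giveBack(String conn) {
            idle.addFirst(conn);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TinyPool pool = new TinyPool(2);

        // Simulate the leak: both connections are borrowed and never returned.
        pool.borrowBlocking();
        pool.borrowBlocking();

        Thread stuck = new Thread(() -> {
            try {
                pool.borrowBlocking(); // parks here, like the threads in the dump
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "stuck-borrower");
        stuck.setDaemon(true); // let the JVM exit even though this thread never wakes
        stuck.start();

        Thread.sleep(200); // give the borrower time to park
        System.out.println("stuck-borrower state: " + stuck.getState());
        System.out.println("bounded borrow returned: " + pool.borrowWithTimeout(200));
    }
}
```

The bounded variant only turns an indefinite hang into a visible failure; it
does not return the lost connections to the pool. commons-dbcp 2 exposes a
comparable borrow timeout (maxWaitMillis, which waits indefinitely by
default), but whether setting it is an appropriate mitigation for Oozie is
not something this ticket establishes.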