[ 
https://issues.apache.org/jira/browse/DBCP-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816520#comment-17816520
 ] 

Dénes Bodó commented on DBCP-595:
---------------------------------

[~psteitz] 

In the "Oozie-like repro steps" the following is configured:
{code:java}
        connectionProperties.append("TestOnBorrow=").append("true").append(",");
        connectionProperties.append("TestOnReturn=").append("true").append(",");
        
connectionProperties.append("TestWhileIdle=").append("true").append(",");{code}
This means the program will stuck and threads will wait infinitely at 
org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:1324).
 The only difference between connection validation and no connection validation 
is that how much time do we have to wait until the program sticks is.

 

My observation is that there is some kind of timing issue in the background as 
we experience exhausted pool no matter what and connection validation only 
delays the issue and not solves it.

> Connection pool can be exhausted when connections are killed on the DB side
> ---------------------------------------------------------------------------
>
>                 Key: DBCP-595
>                 URL: https://issues.apache.org/jira/browse/DBCP-595
>             Project: Commons DBCP
>          Issue Type: Bug
>    Affects Versions: 2.11.0
>            Reporter: Dénes Bodó
>            Priority: Critical
>              Labels: deadlock, robustness
>
> Apache Oozie 5.2.1 uses OpenJPA 2.4.2 and commons-dbcp 1.4 and commons-pool 
> 1.5.4. These are ancient versions, I know.
> h1. Description
> The issue is that when due to some network issues or "maintenance work" on 
> the DB side (especially PostgreSQL) which causes the DB connection to be 
> closed, it results exhausted Pool on the client side. Many threads are 
> waiting at this point:
> {noformat}
> "pool-2-thread-4" #20 prio=5 os_prio=31 tid=0x00007faf7903b800 nid=0x8603 
> waiting on condition [0x000000030f3e7000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x000000066aca8e70> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>       at 
> org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:1324)
>  {noformat}
> According to my observation this is because the JDBC driver does not get 
> closed on the client side, nor the abstract DBCP connection 
> _org.apache.commons.dbcp2.PoolableConnection_ .
> h1. Repro
> (Un)Fortunately I can reproduce the issue using the latest and greatest 
> commons-dbcp 2.11.0 and commons-pool 2.12.0 along with OpenJPA 3.2.2.
> I've just created a Java application to reproduce the issue: 
> [https://github.com/dionusos/pool_exhausted_repro] . See README.md for 
> detailed repro steps.
> h1. Kind of solution?
> To be honest I am not really familiar with DBCP but with this change I 
> managed to make my application more robust:
> {code:java}
> diff --git a/src/main/java/org/apache/commons/dbcp2/PoolableConnection.java 
> b/src/main/java/org/apache/commons/dbcp2/PoolableConnection.java
> index 440cb756..678550bf 100644
> --- a/src/main/java/org/apache/commons/dbcp2/PoolableConnection.java
> +++ b/src/main/java/org/apache/commons/dbcp2/PoolableConnection.java
> @@ -214,6 +214,10 @@ public class PoolableConnection extends 
> DelegatingConnection<Connection> impleme
>      @Override
>      protected void handleException(final SQLException e) throws SQLException 
> {
>          fatalSqlExceptionThrown |= isFatalException(e);
> +        if (fatalSqlExceptionThrown && getDelegate() != null) {
> +            getDelegate().close();
> +            this.close();
> +        }
>          super.handleException(e);
>      }{code}
> What do you think about this approach?
> Is it a completely dead-end or we can start working on it in this direction?
> Do you agree that the reported and reproduced issue is a real one and nut 
> just some kind of misconfiguration?
>  
> I am lost at this point and I need to move forward so I am asking for 
> guidance here.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to