[ 
https://issues.apache.org/jira/browse/DBCP-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818815#comment-17818815
 ] 

Dénes Bodó commented on DBCP-595:
---------------------------------

[~psteitz]

I executed the ReproOneThread class with these settings:
{code:java}
        connectionProperties.append("MaxActive=").append(1).append(",");
        connectionProperties.append("MaxTotal=").append(2).append(",");
        //connectionProperties.append("MaxIdle=").append(2).append(",");
        connectionProperties.append("fastFailValidation=").append("false").append(",");
        connectionProperties.append("TestOnBorrow=").append("true").append(",");
        connectionProperties.append("TestOnReturn=").append("true").append(",");
        connectionProperties.append("TestWhileIdle=").append("true").append(",");
        //connectionProperties.append("ValidationQuery=").append("SELECT 1").append(",");
        connectionProperties.append("timeBetweenEvictionRunsMillis=").append(10_000).append(",");
        connectionProperties.append("numTestsPerEvictionRun=").append(10).append(",");
{code}
The program got stuck when 
org.apache.commons.pool2.impl.GenericObjectPool#create failed to create a new 
connection because newCreateCount > localMaxTotal was true. See this 
screenshot:
!ReproOneThread-screenshot_when_create-newCreateCount_gt_localMaxTotal.png|width=576,height=386!
I took one jstack dump at the moment of the screenshot and another after I 
let the debugger continue, when the program was really stuck:

[^ReproOneThread-jstack_when_create.txt] and 
[^ReproOneThread-jstack_when_stuck.txt].

Setting fastFailValidation to true had no effect; the program still got 
stuck.

When I set MinIdle to 1, a thread kept running after the program got stuck and 
tried to create new connections, but the variables showed the same values as in 
the screenshot above: [^ReproOneThread-jstack-minIdle.txt] 

*Based on this I confirm the program got stuck when numActive == maxActive.*

 

Regarding validation and connection closure in case of an exception:
I experimented with my repro code (ReproDBCP):
 * Closed the connection obtained from DataSource::getConnection() whenever it 
was not null
 ** when an exception occurred
 ** in a finally block
 * maxActive=1, maxTotal=2, validation turned off completely
 * 4 threads

There was no sign of any deadlock during testing.

 

This confirms your theory that when the issue happens in Oozie, the 
"client" does not close the connection after it sees the exception about the 
connection being closed.
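
The closing pattern used in that test can be sketched roughly as follows. This is not the actual ReproDBCP code: the class and method names are hypothetical, and try-with-resources stands in for the explicit null check plus finally block, which has the same effect of handing the connection back on every path.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

/** Hypothetical helper, not taken from ReproDBCP: always hands the connection back. */
final class CloseOnErrorSketch {

    private CloseOnErrorSketch() {
    }

    /**
     * Runs one statement and reports whether it succeeded.
     * try-with-resources closes the statement and the connection on every
     * path, so a pooled connection is released even if the statement fails.
     */
    static boolean runQuery(DataSource dataSource, String sql) {
        try (Connection connection = dataSource.getConnection();
             Statement statement = connection.createStatement()) {
            statement.execute(sql);
            return true;
        } catch (SQLException e) {
            // The resources above are already closed when this handler runs.
            return false;
        }
    }
}
```

With a pooling DataSource, close() on the borrowed connection is what returns it to the pool, so a caller that skips it after an exception keeps one slot occupied.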

 

My questions:

1. Since OpenJPA is the client from Oozie's perspective, does this mean I have 
to check whether the OpenJPA code closes/releases the connection when it 
catches an exception from 
_org.apache.commons.dbcp2.PoolableConnection#handleException_ ?
Is it the right approach for the client to drop/close the connection? Shouldn't 
the client be notified about *fatalSqlExceptionThrown* instead of receiving a 
plain SQLException?
{code:java}
@Override
protected void handleException(final SQLException e) throws SQLException {
    fatalSqlExceptionThrown |= isFatalException(e);
    super.handleException(e);
} {code}
 

2. If DBCP is aware that this is a fatal SQLException, shouldn't it handle the 
situation by closing the connection automatically? I know this is what I 
suggested in my patch; I am just curious.
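
To make question 1 concrete: since the client only sees a plain SQLException, it would have to classify the failure itself. Below is a minimal sketch of such a client-side check, assuming the driver reports a SQLSTATE in class "08" (the standard "connection exception" class). The class and method names are hypothetical, and this is not DBCP's isFatalException logic.

```java
import java.sql.SQLException;

/** Hypothetical client-side check; not part of DBCP or OpenJPA. */
final class FatalSqlStateSketch {

    private FatalSqlStateSketch() {
    }

    /**
     * Returns true if the exception, or any exception chained to it,
     * carries a SQLSTATE from class "08" (connection exception).
     */
    static boolean looksLikeDisconnect(SQLException e) {
        // SQLException is Iterable: it walks itself, its chained
        // exceptions, and their causes.
        for (Throwable t : e) {
            if (t instanceof SQLException) {
                String state = ((SQLException) t).getSQLState();
                if (state != null && state.startsWith("08")) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A client could call this in its catch block and close the connection when it returns true. Drivers may report other states for a killed connection, so the set of states checked here is an assumption that would need to be adjusted per database.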

 

Thank you.

> Connection pool can be exhausted when connections are killed on the DB side
> ---------------------------------------------------------------------------
>
>                 Key: DBCP-595
>                 URL: https://issues.apache.org/jira/browse/DBCP-595
>             Project: Commons DBCP
>          Issue Type: Bug
>    Affects Versions: 2.11.0
>            Reporter: Dénes Bodó
>            Priority: Critical
>              Labels: deadlock, robustness
>         Attachments: ReproOneThread-jstack-minIdle.txt, 
> ReproOneThread-jstack_when_create.txt, ReproOneThread-jstack_when_stuck.txt, 
> ReproOneThread-screenshot_when_create-newCreateCount_gt_localMaxTotal.png
>
>
> Apache Oozie 5.2.1 uses OpenJPA 2.4.2 and commons-dbcp 1.4 and commons-pool 
> 1.5.4. These are ancient versions, I know.
> h1. Description
> The issue is that when network issues or "maintenance work" on the DB side 
> (especially PostgreSQL) cause the DB connection to be closed, the pool on 
> the client side becomes exhausted. Many threads are waiting at this point:
> {noformat}
> "pool-2-thread-4" #20 prio=5 os_prio=31 tid=0x00007faf7903b800 nid=0x8603 waiting on condition [0x000000030f3e7000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x000000066aca8e70> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>       at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:1324)
>  {noformat}
> According to my observation, this is because neither the JDBC driver 
> connection nor the abstract DBCP connection 
> _org.apache.commons.dbcp2.PoolableConnection_ gets closed on the client side.
> h1. Repro
> (Un)Fortunately I can reproduce the issue using the latest and greatest 
> commons-dbcp 2.11.0 and commons-pool 2.12.0 along with OpenJPA 3.2.2.
> I've just created a Java application to reproduce the issue: 
> [https://github.com/dionusos/pool_exhausted_repro] . See README.md for 
> detailed repro steps.
> h1. Kind of solution?
> To be honest I am not really familiar with DBCP but with this change I 
> managed to make my application more robust:
> {code:java}
> diff --git a/src/main/java/org/apache/commons/dbcp2/PoolableConnection.java b/src/main/java/org/apache/commons/dbcp2/PoolableConnection.java
> index 440cb756..678550bf 100644
> --- a/src/main/java/org/apache/commons/dbcp2/PoolableConnection.java
> +++ b/src/main/java/org/apache/commons/dbcp2/PoolableConnection.java
> @@ -214,6 +214,10 @@ public class PoolableConnection extends DelegatingConnection<Connection> impleme
>      @Override
>      protected void handleException(final SQLException e) throws SQLException {
>          fatalSqlExceptionThrown |= isFatalException(e);
> +        if (fatalSqlExceptionThrown && getDelegate() != null) {
> +            getDelegate().close();
> +            this.close();
> +        }
>          super.handleException(e);
>      }{code}
> What do you think about this approach?
> Is it a complete dead end, or can we start working in this direction?
> Do you agree that the reported and reproduced issue is a real one and not 
> just some kind of misconfiguration?
>  
> I am lost at this point and I need to move forward so I am asking for 
> guidance here.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
