Re: [DBCP] Connection just obtained from datasource is invalid

2017-11-22 Thread Erwin Hogeweg
All -

I am following this thread with some interest because many moons ago I had a 
weird problem after upgrading from pax-jdbc-0.9.0 to pax-jdbc-1.1.0. 
Intermittently the DB connection pool would get into a crazy loop and continued 
to spin off new OS threads until the OS (Windows Server in this case) ran out 
of threads/handles, at which point the whole application crashed and burned.

I was never able to reliably reproduce this, and it only seemed to happen on 
Windows OS when many new connections are opened ’simultaneous'. In the end we 
rolled back to 0.9.0 and life is good ever since.

DB server is mySQL-5.7
Java8

I have NO idea if this is related or not, but it might… so I figure I share the 
info.


Cheers,

Erwin


On Nov 22, 2017, at 19:00, Phil Steitz 
> wrote:

On 11/22/17 2:43 PM, Shawn Heisey wrote:
On 11/22/2017 12:04 PM, Phil Steitz wrote:
On 11/22/17 9:43 AM, Shawn Heisey wrote:
I do have results from the isClosed method when the problem happens.
That method *does* return true.
That points to a Pool or DBCP bug, assuming you are sure that no
other thread has a reference to the PoolableConnection or some other
code path did not call close on it before you tested isClosed.  If
you are sure this is not happening, you should open a DBCP JIRA
(which may end up reassigned to pool).  Ideal would be to have a
test case that makes it happen.
I would absolutely love to have a test case that can reproduce it, but
since I haven't got any idea what the root of the problem is, I wouldn't
know how to write such a test case.

What I'd really like to do is be able to look over dbcp2 and pool2 code
to see if I can spot a problem, but I'm having a hard time following the
code.  I expected to find some kind of synchronization in the code
branching from getConnection() to prevent different threads from being
able to step on each other, but haven't seen any so far.  I can't tell
if this means I'm looking in the wrong place or not.  The object
inheritance is pretty extensive, so I'm having a hard time finding the
right place to look.

If it turns out that there is zero synchronization happening between the
idle eviction thread and the depths of the code for things like
getConnection, then I don't see how any kind of guarantee can be made.
So far the synchronization object in the eviction thread only seems to
pair with other parts of eviction, NOT with anything else in the library.

If the testXXX flags I've enabled do eliminate the problems I'm seeing,
that's awesome, and it's *A* solution, but I think I'm still running
into some kind of issue that needs to be fixed.  I just need to figure
out whether it's dbcp2, pool2, or something in my own environment.  I'm
willing to entertain the idea that it's my environment, but based on
everything that I understand about my own code and our database servers
(and I fully admit it's circumstantial evidence), it points to a problem
with the idle eviction thread.

I believe you when you say that the *intent* is for idle eviction to
never close/evict a connection that's been requested from the pool.  I
would like to verify whether the intent and what's actually implemented
are the same.  If they're not the same, then I would like to attempt a
patch.  I'm going to need help in figuring out exactly where I should be
looking in the code for dbcp2 and pool2.

If the problem is the evictor closing a connection and having that
connection delivered to a client, the problem is almost certainly in
pool.  The thread-safety of the pool in this regard is engineered in
DefaultPooledObject, which is the wrapper that pool manages and
delivers to DBCP.  When the evictor visits a PooledObject (in
GenericObjectPool#evict) it tries to start the eviction test on the
object by calling its startEvictionTest method.  This method is
synchronized on the DefaultPooledObject.  Look at the code in that
method.  It checks to make sure that the object is in fact idle in
the pool.  The other half of the protection here is in
GenericObjectPool#borrowObject, which is what PoolingDataSource
calls to get a connection.  That method tries to get a PooledObject
from the pool and before handing it out (or validating it), it calls
the PooledObject's allocate method.  Look at the code for that in
DefaultPooledObject.  That method (also synchronized on the
PooledObject) checks that the object is not under eviction and sets
its state to allocated.  That is the core sync protection that
*should* make it impossible for the evictor to do anything to an
object that has been handed out to a client.

The logical place to start to get a test case that shows this
protection failing is to just set up a pool with very aggressive
eviction config (very small idle object timeout), frequent eviction
runs and a lot of concurrent borrowing.  Make sure the factory's
destroy method does something to simulate what PCF does to mark the
object as dead and see if you get any corpses 

Re: [DBCP] Connection just obtained from datasource is invalid

2017-11-22 Thread Phil Steitz
On 11/22/17 2:43 PM, Shawn Heisey wrote:
> On 11/22/2017 12:04 PM, Phil Steitz wrote:
>> On 11/22/17 9:43 AM, Shawn Heisey wrote:
>>> I do have results from the isClosed method when the problem happens. 
>>> That method *does* return true. 
>> That points to a Pool or DBCP bug, assuming you are sure that no
>> other thread has a reference to the PoolableConnection or some other
>> code path did not call close on it before you tested isClosed.  If
>> you are sure this is not happening, you should open a DBCP JIRA
>> (which may end up reassigned to pool).  Ideal would be to have a
>> test case that makes it happen.
> I would absolutely love to have a test case that can reproduce it, but
> since I haven't got any idea what the root of the problem is, I wouldn't
> know how to write such a test case.
>
> What I'd really like to do is be able to look over dbcp2 and pool2 code
> to see if I can spot a problem, but I'm having a hard time following the
> code.  I expected to find some kind of synchronization in the code
> branching from getConnection() to prevent different threads from being
> able to step on each other, but haven't seen any so far.  I can't tell
> if this means I'm looking in the wrong place or not.  The object
> inheritance is pretty extensive, so I'm having a hard time finding the
> right place to look.
>
> If it turns out that there is zero synchronization happening between the
> idle eviction thread and the depths of the code for things like
> getConnection, then I don't see how any kind of guarantee can be made. 
> So far the synchronization object in the eviction thread only seems to
> pair with other parts of eviction, NOT with anything else in the library.
>
> If the testXXX flags I've enabled do eliminate the problems I'm seeing,
> that's awesome, and it's *A* solution, but I think I'm still running
> into some kind of issue that needs to be fixed.  I just need to figure
> out whether it's dbcp2, pool2, or something in my own environment.  I'm
> willing to entertain the idea that it's my environment, but based on
> everything that I understand about my own code and our database servers
> (and I fully admit it's circumstantial evidence), it points to a problem
> with the idle eviction thread.
>
> I believe you when you say that the *intent* is for idle eviction to
> never close/evict a connection that's been requested from the pool.  I
> would like to verify whether the intent and what's actually implemented
> are the same.  If they're not the same, then I would like to attempt a
> patch.  I'm going to need help in figuring out exactly where I should be
> looking in the code for dbcp2 and pool2.

If the problem is the evictor closing a connection and having that
connection delivered to a client, the problem is almost certainly in
pool.  The thread-safety of the pool in this regard is engineered in
DefaultPooledObject, which is the wrapper that pool manages and
delivers to DBCP.  When the evictor visits a PooledObject (in
GenericObjectPool#evict) it tries to start the eviction test on the
object by calling its startEvictionTest method.  This method is
synchronized on the DefaultPooledObject.  Look at the code in that
method.  It checks to make sure that the object is in fact idle in
the pool.  The other half of the protection here is in
GenericObjectPool#borrowObject, which is what PoolingDataSource
calls to get a connection.  That method tries to get a PooledObject
from the pool and before handing it out (or validating it), it calls
the PooledObject's allocate method.  Look at the code for that in
DefaultPooledObject.  That method (also synchronized on the
PooledObject) checks that the object is not under eviction and sets
its state to allocated.  That is the core sync protection that
*should* make it impossible for the evictor to do anything to an
object that has been handed out to a client.

The logical place to start to get a test case that shows this
protection failing is to just set up a pool with very aggressive
eviction config (very small idle object timeout), frequent eviction
runs and a lot of concurrent borrowing.  Make sure the factory's
destroy method does something to simulate what PCF does to mark the
object as dead and see if you get any corpses handed out to
borrowers.  Also make sure that there are enough idle instances in
the pool for the evictor to visit.  For that, you probably want to
vary the borrowing load.  You can set up jmx to observe the pool
stats to see how many are idle at a given time or just log it using
the getNumIdle.  A quick look at the existing pool2 test cases does
not show exactly that scenario covered, so it would be good to add
in any case.

Phil


>
> Thanks,
> Shawn
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
> For additional commands, e-mail: user-h...@commons.apache.org
>
>


-
To 

Re: [DBCP] Connection just obtained from datasource is invalid

2017-11-22 Thread Shawn Heisey
On 11/22/2017 12:04 PM, Phil Steitz wrote:
> On 11/22/17 9:43 AM, Shawn Heisey wrote:
>> I do have results from the isClosed method when the problem happens. 
>> That method *does* return true. 
> That points to a Pool or DBCP bug, assuming you are sure that no
> other thread has a reference to the PoolableConnection or some other
> code path did not call close on it before you tested isClosed.  If
> you are sure this is not happening, you should open a DBCP JIRA
> (which may end up reassigned to pool).  Ideal would be to have a
> test case that makes it happen.

I would absolutely love to have a test case that can reproduce it, but
since I haven't got any idea what the root of the problem is, I wouldn't
know how to write such a test case.

What I'd really like to do is be able to look over dbcp2 and pool2 code
to see if I can spot a problem, but I'm having a hard time following the
code.  I expected to find some kind of synchronization in the code
branching from getConnection() to prevent different threads from being
able to step on each other, but haven't seen any so far.  I can't tell
if this means I'm looking in the wrong place or not.  The object
inheritance is pretty extensive, so I'm having a hard time finding the
right place to look.

If it turns out that there is zero synchronization happening between the
idle eviction thread and the depths of the code for things like
getConnection, then I don't see how any kind of guarantee can be made. 
So far the synchronization object in the eviction thread only seems to
pair with other parts of eviction, NOT with anything else in the library.

If the testXXX flags I've enabled do eliminate the problems I'm seeing,
that's awesome, and it's *A* solution, but I think I'm still running
into some kind of issue that needs to be fixed.  I just need to figure
out whether it's dbcp2, pool2, or something in my own environment.  I'm
willing to entertain the idea that it's my environment, but based on
everything that I understand about my own code and our database servers
(and I fully admit it's circumstantial evidence), it points to a problem
with the idle eviction thread.

I believe you when you say that the *intent* is for idle eviction to
never close/evict a connection that's been requested from the pool.  I
would like to verify whether the intent and what's actually implemented
are the same.  If they're not the same, then I would like to attempt a
patch.  I'm going to need help in figuring out exactly where I should be
looking in the code for dbcp2 and pool2.

Thanks,
Shawn


-
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org



Re: [DBCP] Connection just obtained from datasource is invalid

2017-11-22 Thread Shawn Heisey
On 11/22/2017 1:56 PM, Phil Steitz wrote:
> You're not storing the connection somewhere in a possibly not
> thread-safe way, are you?

Not as far as I can tell.  Connections are mostly obtained within my
"Database" class, which has static methods that get called from
elsewhere in my code.  The connection is obtained, used to do some work,
and closed, all within single methods that are very short and do not
spawn threads.  In the two other places where I obtain a connection, the
code is also very well contained.  Connections are never passed between
threads.

The review to verify how I'm obtaining and using connections was not as
time-consuming as I thought it would be.  I can now be certain that
every connection that is obtained *is* being used.

Thanks,
Shawn


-
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org



Re: [DBCP] Connection just obtained from datasource is invalid

2017-11-22 Thread Phil Steitz
On 11/22/17 9:43 AM, Shawn Heisey wrote:
> On 11/21/2017 1:01 PM, Phil Steitz wrote:
>> As I said in my first response, the most common explanation for this
>> kind of exception when using DBCP is that the underlying physical
>> connection is closed on the server (or network) side without DBCP
>> knowing about it.  And the most common cause of that is server-side
>> idle connection timeout.
> I do have results from the isClosed method when the problem happens. 
> That method *does* return true.
>
> I am pretty sure that the server-side idle timeout is at MySQL's default
> of 28800 seconds, or 8 hours.  The problem I am experiencing has
> happened only a few minutes after starting my program, so in that case,
> I would not expect a server-side timeout to happen.
>
>> It just occurred to me that since you do not have any of the testXxx
>> flags set to true, DBCP is never actually exercising the
>> connections.
> I have now enabled the flags for testing on create, borrow, return, and
> while idle.  If this does eliminate problems, do you think my theory
> might be correct, or that it's probably something else?  The
> testWhileIdle flag does seem a little excessive, but I figured I would
> go ahead with it.  The "SELECT 1" validation query should always be fast
> as long as everything's working right ... and if it's not working right,
> I am not going to mind a few seconds passing before the problem is reported.
>
>> Is it possible that sometimes your code checks out a
>> connection from the pool but does not use it?
> I'm only calling the datasource getConnection in a getConnection method
> of my own, and that method is only used in the methods which perform
> work on a connection.

You're not storing the connection somewhere in a possibly not
thread-safe way, are you? 

Phil
>   It seems unlikely that I am checking out a
> connection and never using it, but to be certain I would need to do a
> full review of the code in my Database class.  I should probably do that
> anyway, but it's a fair amount of work, so I've been avoiding it.
>
> Thanks,
> Shawn
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
> For additional commands, e-mail: user-h...@commons.apache.org
>
>


-
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org



Re: [DBCP] Connection just obtained from datasource is invalid

2017-11-22 Thread Phil Steitz
On 11/22/17 9:43 AM, Shawn Heisey wrote:
> On 11/21/2017 1:01 PM, Phil Steitz wrote:
>> As I said in my first response, the most common explanation for this
>> kind of exception when using DBCP is that the underlying physical
>> connection is closed on the server (or network) side without DBCP
>> knowing about it.  And the most common cause of that is server-side
>> idle connection timeout.
> I do have results from the isClosed method when the problem happens. 
> That method *does* return true.

That points to a Pool or DBCP bug, assuming you are sure that no
other thread has a reference to the PoolableConnection or some other
code path did not call close on it before you tested isClosed.  If
you are sure this is not happening, you should open a DBCP JIRA
(which may end up reassigned to pool).  Ideal would be to have a
test case that makes it happen.
>
> I am pretty sure that the server-side idle timeout is at MySQL's default
> of 28800 seconds, or 8 hours.  The problem I am experiencing has
> happened only a few minutes after starting my program, so in that case,
> I would not expect a server-side timeout to happen.

Makes sense, so can rule that out.
>
>> It just occurred to me that since you do not have any of the testXxx
>> flags set to true, DBCP is never actually exercising the
>> connections.
> I have now enabled the flags for testing on create, borrow, return, and
> while idle.  If this does eliminate problems, do you think my theory
> might be correct, or that it's probably something else? 

Your theory that it is a pool bug?  If adding those tests makes the
problem go away, that might just be due to timing.

>  The
> testWhileIdle flag does seem a little excessive, but I figured I would
> go ahead with it.  The "SELECT 1" validation query should always be fast
> as long as everything's working right ... and if it's not working right,
> I am not going to mind a few seconds passing before the problem is reported.
>
>> Is it possible that sometimes your code checks out a
>> connection from the pool but does not use it?
> I'm only calling the datasource getConnection in a getConnection method
> of my own, and that method is only used in the methods which perform
> work on a connection.  It seems unlikely that I am checking out a
> connection and never using it, but to be certain I would need to do a
> full review of the code in my Database class.  I should probably do that
> anyway, but it's a fair amount of work, so I've been avoiding it.

Code review is always great ;) but given that the problem happens
close to startup, it is not likely that the connections are being
closed server-side.  The fact that isClosed is returning true when
the problem happens also rules that out - it means that DBCP knows
that the connection is closed. That doesn't happen when server-side
timeouts happen.

>
> Thanks,
> Shawn
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
> For additional commands, e-mail: user-h...@commons.apache.org
>
>


-
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org



Re: [DBCP] Connection just obtained from datasource is invalid

2017-11-22 Thread Shawn Heisey
On 11/21/2017 1:01 PM, Phil Steitz wrote:
> As I said in my first response, the most common explanation for this
> kind of exception when using DBCP is that the underlying physical
> connection is closed on the server (or network) side without DBCP
> knowing about it.  And the most common cause of that is server-side
> idle connection timeout.

I do have results from the isClosed method when the problem happens. 
That method *does* return true.

I am pretty sure that the server-side idle timeout is at MySQL's default
of 28800 seconds, or 8 hours.  The problem I am experiencing has
happened only a few minutes after starting my program, so in that case,
I would not expect a server-side timeout to happen.

> It just occurred to me that since you do not have any of the testXxx
> flags set to true, DBCP is never actually exercising the
> connections.

I have now enabled the flags for testing on create, borrow, return, and
while idle.  If this does eliminate problems, do you think my theory
might be correct, or that it's probably something else?  The
testWhileIdle flag does seem a little excessive, but I figured I would
go ahead with it.  The "SELECT 1" validation query should always be fast
as long as everything's working right ... and if it's not working right,
I am not going to mind a few seconds passing before the problem is reported.

> Is it possible that sometimes your code checks out a
> connection from the pool but does not use it?

I'm only calling the datasource getConnection in a getConnection method
of my own, and that method is only used in the methods which perform
work on a connection.  It seems unlikely that I am checking out a
connection and never using it, but to be certain I would need to do a
full review of the code in my Database class.  I should probably do that
anyway, but it's a fair amount of work, so I've been avoiding it.

Thanks,
Shawn


-
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org