Re: [DBCP] Connection just obtained from datasource is invalid

2018-01-09 Thread Phil Steitz
On 1/9/18 1:56 PM, Phil Steitz wrote:
> On 1/9/18 11:50 AM, Phil Steitz wrote:
>> On 1/8/18 4:23 PM, Shawn Heisey wrote:
>>> On 11/22/2017 5:00 PM, Phil Steitz wrote:
 If the problem is the evictor closing a connection and having that
 connection delivered to a client, the problem is almost certainly in
 pool.  The thread-safety of the pool in this regard is engineered in
 DefaultPooledObject, which is the wrapper that pool manages and
 delivers to DBCP.  When the evictor visits a PooledObject (in
 GenericObjectPool#evict) it tries to start the eviction test on the
 object by calling its startEvictionTest method.  This method is
 synchronized on the DefaultPooledObject.  Look at the code in that
 method.  It checks to make sure that the object is in fact idle in
 the pool.  The other half of the protection here is in
 GenericObjectPool#borrowObject, which is what PoolingDataSource
 calls to get a connection.  That method tries to get a PooledObject
 from the pool and before handing it out (or validating it), it calls
 the PooledObject's allocate method.  Look at the code for that in
 DefaultPooledObject.  That method (also synchronized on the
 PooledObject) checks that the object is not under eviction and sets
 its state to allocated.  That is the core sync protection that
 *should* make it impossible for the evictor to do anything to an
 object that has been handed out to a client.
>>> I see the synchronization you're talking about here.  It appears that
>>> all of the critical methods in DefaultPooledObject are synchronized (on
>>> the object).
>>>
>>> If you're absolutely certain that DefaultPooledObject is involved with
>>> all of the implementation my code is using, it all looks pretty complete
>>> to me. 
>> Yes, the code you posted at the top of the thread uses a
>> PoolableConnectionFactory as the object factory for the pool.  You
>> can see that PCF's makeObject returns a DefaultPooledObject, so that
>> much is certain.
>>>  So I'm really curious as to why the connection is getting
>>> closed.  I have seen the problem only minutes after restarting my
>>> program, so it seems unlikely that the server side is closing the
>>> connection, since the timeout for that is 8 hours.
>> I looked back at the initial stack trace and I noticed something
>> that I had not noticed before.
>>
>> This line
>>
>> org.apache.commons.dbcp2.DelegatingConnection.createStatement(DelegatingConnection.java:262)
>>
>> means that checkOpen() succeeded.  That, combined with your
>> statement above that isClosed() returns true on a failed connection
>> means that there might be concurrent access to the
>> DelegatingConnections happening.  It looks like the sequence might
>> have been:
>>
>> thread 1: checkOpen - sees true
>> thread 2: close the DelegatingConnection  (there is no sync to
>> prevent this)
>> thread1 : createStatement - bang!
>> thread1 : isClosed() returns true
>>
>> DBCP is not really safe to use that way - i.e., really the intended
>> setup is that individual connection handles are not concurrently
>> accessed by multiple threads.  Is it possible something like this is
>> going on?  Note that what I am talking about here is two different
>> threads holding references to the same connection handle - i.e., no
>> trips back through the pool.
> I just noticed another thing in [pool] that might have something to
> do with this.  It's probably best to investigate what I have in mind
> on the dev list.  I will post a summary / ticket reference here if
> it turns out I this is a bug.

Sorry for the noise.  Bug idea evaporated when I dug into it.

Phil

>
> Phil
>> Phil
>>> I did add the code a while back to test on create, borrow, return, and
>>> while idle, but it turns out that I hadn't actually pulled it down to
>>> the test server and recompiled.  That is now done, so we'll see if that
>>> makes any difference.
>>>
>>> If testing the connection on pool actions does make a difference, then
>>> what is your speculation about what was happening when I ran into the
>>> closed connection only minutes after restart, and would it be worthy of
>>> an issue in Jira?  The only theory I had was a race condition between
>>> eviction and borrowing, but unless there's something amiss in how all
>>> the object inheritance works out, it looks like that's probably not it. 
>>> Some kind of issue with the TCP stack in Linux (either on the machines
>>> running my code or the MySQL server) is the only other idea I can think
>>> of.  Or maybe a hardware/firmware issue, since it's likely that at least
>>> one of the NICs involved is doing TCP offload.  I think that virtually
>>> every NIC in our infrastructure has that feature and that Linux enables it.
>>> Thanks,
>>> Shawn
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
>>> For additional commands, e-mail: 

Re: [DBCP] Connection just obtained from datasource is invalid

2018-01-09 Thread Phil Steitz
On 1/9/18 11:50 AM, Phil Steitz wrote:
> On 1/8/18 4:23 PM, Shawn Heisey wrote:
>> On 11/22/2017 5:00 PM, Phil Steitz wrote:
>>> If the problem is the evictor closing a connection and having that
>>> connection delivered to a client, the problem is almost certainly in
>>> pool.  The thread-safety of the pool in this regard is engineered in
>>> DefaultPooledObject, which is the wrapper that pool manages and
>>> delivers to DBCP.  When the evictor visits a PooledObject (in
>>> GenericObjectPool#evict) it tries to start the eviction test on the
>>> object by calling its startEvictionTest method.  This method is
>>> synchronized on the DefaultPooledObject.  Look at the code in that
>>> method.  It checks to make sure that the object is in fact idle in
>>> the pool.  The other half of the protection here is in
>>> GenericObjectPool#borrowObject, which is what PoolingDataSource
>>> calls to get a connection.  That method tries to get a PooledObject
>>> from the pool and before handing it out (or validating it), it calls
>>> the PooledObject's allocate method.  Look at the code for that in
>>> DefaultPooledObject.  That method (also synchronized on the
>>> PooledObject) checks that the object is not under eviction and sets
>>> its state to allocated.  That is the core sync protection that
>>> *should* make it impossible for the evictor to do anything to an
>>> object that has been handed out to a client.
>> I see the synchronization you're talking about here.  It appears that
>> all of the critical methods in DefaultPooledObject are synchronized (on
>> the object).
>>
>> If you're absolutely certain that DefaultPooledObject is involved with
>> all of the implementation my code is using, it all looks pretty complete
>> to me. 
> Yes, the code you posted at the top of the thread uses a
> PoolableConnectionFactory as the object factory for the pool.  You
> can see that PCF's makeObject returns a DefaultPooledObject, so that
> much is certain.
>>  So I'm really curious as to why the connection is getting
>> closed.  I have seen the problem only minutes after restarting my
>> program, so it seems unlikely that the server side is closing the
>> connection, since the timeout for that is 8 hours.
> I looked back at the initial stack trace and I noticed something
> that I had not noticed before.
>
> This line
>
> org.apache.commons.dbcp2.DelegatingConnection.createStatement(DelegatingConnection.java:262)
>
> means that checkOpen() succeeded.  That, combined with your
> statement above that isClosed() returns true on a failed connection
> means that there might be concurrent access to the
> DelegatingConnections happening.  It looks like the sequence might
> have been:
>
> thread 1: checkOpen - sees true
> thread 2: close the DelegatingConnection  (there is no sync to
> prevent this)
> thread1 : createStatement - bang!
> thread1 : isClosed() returns true
>
> DBCP is not really safe to use that way - i.e., really the intended
> setup is that individual connection handles are not concurrently
> accessed by multiple threads.  Is it possible something like this is
> going on?  Note that what I am talking about here is two different
> threads holding references to the same connection handle - i.e., no
> trips back through the pool.

I just noticed another thing in [pool] that might have something to
do with this.  It's probably best to investigate what I have in mind
on the dev list.  I will post a summary / ticket reference here if
it turns out I this is a bug.

Phil
>
> Phil
>> I did add the code a while back to test on create, borrow, return, and
>> while idle, but it turns out that I hadn't actually pulled it down to
>> the test server and recompiled.  That is now done, so we'll see if that
>> makes any difference.
>>
>> If testing the connection on pool actions does make a difference, then
>> what is your speculation about what was happening when I ran into the
>> closed connection only minutes after restart, and would it be worthy of
>> an issue in Jira?  The only theory I had was a race condition between
>> eviction and borrowing, but unless there's something amiss in how all
>> the object inheritance works out, it looks like that's probably not it. 
>> Some kind of issue with the TCP stack in Linux (either on the machines
>> running my code or the MySQL server) is the only other idea I can think
>> of.  Or maybe a hardware/firmware issue, since it's likely that at least
>> one of the NICs involved is doing TCP offload.  I think that virtually
>> every NIC in our infrastructure has that feature and that Linux enables it.
>
>> Thanks,
>> Shawn
>>
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
>> For additional commands, e-mail: user-h...@commons.apache.org
>>
>>


-
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: 

Re: [DBCP] Connection just obtained from datasource is invalid

2018-01-09 Thread Phil Steitz
On 1/8/18 4:23 PM, Shawn Heisey wrote:
> On 11/22/2017 5:00 PM, Phil Steitz wrote:
>> If the problem is the evictor closing a connection and having that
>> connection delivered to a client, the problem is almost certainly in
>> pool.  The thread-safety of the pool in this regard is engineered in
>> DefaultPooledObject, which is the wrapper that pool manages and
>> delivers to DBCP.  When the evictor visits a PooledObject (in
>> GenericObjectPool#evict) it tries to start the eviction test on the
>> object by calling its startEvictionTest method.  This method is
>> synchronized on the DefaultPooledObject.  Look at the code in that
>> method.  It checks to make sure that the object is in fact idle in
>> the pool.  The other half of the protection here is in
>> GenericObjectPool#borrowObject, which is what PoolingDataSource
>> calls to get a connection.  That method tries to get a PooledObject
>> from the pool and before handing it out (or validating it), it calls
>> the PooledObject's allocate method.  Look at the code for that in
>> DefaultPooledObject.  That method (also synchronized on the
>> PooledObject) checks that the object is not under eviction and sets
>> its state to allocated.  That is the core sync protection that
>> *should* make it impossible for the evictor to do anything to an
>> object that has been handed out to a client.
> I see the synchronization you're talking about here.  It appears that
> all of the critical methods in DefaultPooledObject are synchronized (on
> the object).
>
> If you're absolutely certain that DefaultPooledObject is involved with
> all of the implementation my code is using, it all looks pretty complete
> to me. 

Yes, the code you posted at the top of the thread uses a
PoolableConnectionFactory as the object factory for the pool.  You
can see that PCF's makeObject returns a DefaultPooledObject, so that
much is certain.
>  So I'm really curious as to why the connection is getting
> closed.  I have seen the problem only minutes after restarting my
> program, so it seems unlikely that the server side is closing the
> connection, since the timeout for that is 8 hours.

I looked back at the initial stack trace and I noticed something
that I had not noticed before.

This line

org.apache.commons.dbcp2.DelegatingConnection.createStatement(DelegatingConnection.java:262)

means that checkOpen() succeeded.  That, combined with your
statement above that isClosed() returns true on a failed connection
means that there might be concurrent access to the
DelegatingConnections happening.  It looks like the sequence might
have been:

thread 1: checkOpen - sees true
thread 2: close the DelegatingConnection  (there is no sync to
prevent this)
thread1 : createStatement - bang!
thread1 : isClosed() returns true

DBCP is not really safe to use that way - i.e., really the intended
setup is that individual connection handles are not concurrently
accessed by multiple threads.  Is it possible something like this is
going on?  Note that what I am talking about here is two different
threads holding references to the same connection handle - i.e., no
trips back through the pool.

Phil
>
> I did add the code a while back to test on create, borrow, return, and
> while idle, but it turns out that I hadn't actually pulled it down to
> the test server and recompiled.  That is now done, so we'll see if that
> makes any difference.
>
> If testing the connection on pool actions does make a difference, then
> what is your speculation about what was happening when I ran into the
> closed connection only minutes after restart, and would it be worthy of
> an issue in Jira?  The only theory I had was a race condition between
> eviction and borrowing, but unless there's something amiss in how all
> the object inheritance works out, it looks like that's probably not it. 
> Some kind of issue with the TCP stack in Linux (either on the machines
> running my code or the MySQL server) is the only other idea I can think
> of.  Or maybe a hardware/firmware issue, since it's likely that at least
> one of the NICs involved is doing TCP offload.  I think that virtually
> every NIC in our infrastructure has that feature and that Linux enables it.


>
> Thanks,
> Shawn
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
> For additional commands, e-mail: user-h...@commons.apache.org
>
>


-
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org



Re: [DBCP] Connection just obtained from datasource is invalid

2018-01-09 Thread Shawn Heisey

On 1/9/2018 8:31 AM, Gary Gregory wrote:

Currious: are you using the current versions of DBCP and Pool?


The last compile pulled down pool 2.5.0, dbcp 2.2.0, and mysql driver 
5.1.45 (using ivy).  I haven't seen the problem since that compile 
(which also enabled testing connections on various pool actions like 
create, borrow, etc), but it can take a while before it occurs, so I'm 
not calling it solved yet.


When I started this thread on November 15th, I was running pool 2.4.3, 
dbcp 2.1.1, and whatever version of the mysql driver was current at the 
time.  Those were the latest pool and dbcp versions at that time.


Thanks,
Shawn


-
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org



Re: [DBCP] Connection just obtained from datasource is invalid

2018-01-09 Thread Gary Gregory
And of your JDBC driver?

Gary

On Jan 9, 2018 8:31 AM, "Gary Gregory"  wrote:

> Currious: are you using the current versions of DBCP and Pool?
>
> Gary
>
> On Jan 9, 2018 8:28 AM, "Shawn Heisey"  wrote:
>
>> On 1/8/2018 11:19 PM, Chas Honton wrote:
>> > Don’t forget about any proxies or routers between the client and the
>> server. Most infrastructure components will close quiescent connections
>> (within minutes)
>>
>> There are no proxies for the MySQL traffic.  The machines involved are
>> on the same subnet (vlan), so it's only layer 2 switches made by Cisco
>> that provide the networking -- no routing.  There is no QoS configured.
>> The connections are gigabit and the switchports are not incrementing
>> error counters.  The inter-switch connections are trunk ports and also
>> show no incrementing error counters.
>>
>> Cisco switches tend to have enough backplane bandwidth to run every port
>> at full line rate in both directions.  We don't run our infrastructure
>> anywhere near full switching capacity, so dropped packets are unlikely,
>> but even if that did happen, TCP is designed to handle that
>> transparently to applications.
>>
>> Cisco routers and switches do not typically interfere in TCP
>> communication.  Their firewalls do get involved in TCP, but even if this
>> traffic was passing through our firewall (which it's not), our TCP idle
>> timeout on the firewall is set to two hours, to accommodate long-lived
>> TCP connections over the Internet.
>>
>> Thanks,
>> Shawn
>>
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
>> For additional commands, e-mail: user-h...@commons.apache.org
>>
>>


Re: [DBCP] Connection just obtained from datasource is invalid

2018-01-09 Thread Gary Gregory
Currious: are you using the current versions of DBCP and Pool?

Gary

On Jan 9, 2018 8:28 AM, "Shawn Heisey"  wrote:

> On 1/8/2018 11:19 PM, Chas Honton wrote:
> > Don’t forget about any proxies or routers between the client and the
> server. Most infrastructure components will close quiescent connections
> (within minutes)
>
> There are no proxies for the MySQL traffic.  The machines involved are
> on the same subnet (vlan), so it's only layer 2 switches made by Cisco
> that provide the networking -- no routing.  There is no QoS configured.
> The connections are gigabit and the switchports are not incrementing
> error counters.  The inter-switch connections are trunk ports and also
> show no incrementing error counters.
>
> Cisco switches tend to have enough backplane bandwidth to run every port
> at full line rate in both directions.  We don't run our infrastructure
> anywhere near full switching capacity, so dropped packets are unlikely,
> but even if that did happen, TCP is designed to handle that
> transparently to applications.
>
> Cisco routers and switches do not typically interfere in TCP
> communication.  Their firewalls do get involved in TCP, but even if this
> traffic was passing through our firewall (which it's not), our TCP idle
> timeout on the firewall is set to two hours, to accommodate long-lived
> TCP connections over the Internet.
>
> Thanks,
> Shawn
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
> For additional commands, e-mail: user-h...@commons.apache.org
>
>


Re: [DBCP] Connection just obtained from datasource is invalid

2018-01-09 Thread Shawn Heisey
On 1/8/2018 11:19 PM, Chas Honton wrote:
> Don’t forget about any proxies or routers between the client and the server. 
> Most infrastructure components will close quiescent connections (within 
> minutes)

There are no proxies for the MySQL traffic.  The machines involved are
on the same subnet (vlan), so it's only layer 2 switches made by Cisco
that provide the networking -- no routing.  There is no QoS configured. 
The connections are gigabit and the switchports are not incrementing
error counters.  The inter-switch connections are trunk ports and also
show no incrementing error counters.

Cisco switches tend to have enough backplane bandwidth to run every port
at full line rate in both directions.  We don't run our infrastructure
anywhere near full switching capacity, so dropped packets are unlikely,
but even if that did happen, TCP is designed to handle that
transparently to applications.

Cisco routers and switches do not typically interfere in TCP
communication.  Their firewalls do get involved in TCP, but even if this
traffic was passing through our firewall (which it's not), our TCP idle
timeout on the firewall is set to two hours, to accommodate long-lived
TCP connections over the Internet.

Thanks,
Shawn


-
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org