Re: [DBCP] Connection just obtained from datasource is invalid
On 1/9/18 1:56 PM, Phil Steitz wrote: > On 1/9/18 11:50 AM, Phil Steitz wrote: >> On 1/8/18 4:23 PM, Shawn Heisey wrote: >>> On 11/22/2017 5:00 PM, Phil Steitz wrote: If the problem is the evictor closing a connection and having that connection delivered to a client, the problem is almost certainly in pool. The thread-safety of the pool in this regard is engineered in DefaultPooledObject, which is the wrapper that pool manages and delivers to DBCP. When the evictor visits a PooledObject (in GenericObjectPool#evict) it tries to start the eviction test on the object by calling its startEvictionTest method. This method is synchronized on the DefaultPooledObject. Look at the code in that method. It checks to make sure that the object is in fact idle in the pool. The other half of the protection here is in GenericObjectPool#borrowObject, which is what PoolingDataSource calls to get a connection. That method tries to get a PooledObject from the pool and before handing it out (or validating it), it calls the PooledObject's allocate method. Look at the code for that in DefaultPooledObject. That method (also synchronized on the PooledObject) checks that the object is not under eviction and sets its state to allocated. That is the core sync protection that *should* make it impossible for the evictor to do anything to an object that has been handed out to a client. >>> I see the synchronization you're talking about here. It appears that >>> all of the critical methods in DefaultPooledObject are synchronized (on >>> the object). >>> >>> If you're absolutely certain that DefaultPooledObject is involved with >>> all of the implementation my code is using, it all looks pretty complete >>> to me. >> Yes, the code you posted at the top of the thread uses a >> PoolableConnectionFactory as the object factory for the pool. You >> can see that PCF's makeObject returns a DefaultPooledObject, so that >> much is certain. >>> So I'm really curious as to why the connection is getting >>> closed. I have seen the problem only minutes after restarting my >>> program, so it seems unlikely that the server side is closing the >>> connection, since the timeout for that is 8 hours. >> I looked back at the initial stack trace and I noticed something >> that I had not noticed before. >> >> This line >> >> org.apache.commons.dbcp2.DelegatingConnection.createStatement(DelegatingConnection.java:262) >> >> means that checkOpen() succeeded. That, combined with your >> statement above that isClosed() returns true on a failed connection >> means that there might be concurrent access to the >> DelegatingConnections happening. It looks like the sequence might >> have been: >> >> thread 1: checkOpen - sees true >> thread 2: close the DelegatingConnection (there is no sync to >> prevent this) >> thread1 : createStatement - bang! >> thread1 : isClosed() returns true >> >> DBCP is not really safe to use that way - i.e., really the intended >> setup is that individual connection handles are not concurrently >> accessed by multiple threads. Is it possible something like this is >> going on? Note that what I am talking about here is two different >> threads holding references to the same connection handle - i.e., no >> trips back through the pool. > I just noticed another thing in [pool] that might have something to > do with this. It's probably best to investigate what I have in mind > on the dev list. I will post a summary / ticket reference here if > it turns out I this is a bug. Sorry for the noise. Bug idea evaporated when I dug into it. Phil > > Phil >> Phil >>> I did add the code a while back to test on create, borrow, return, and >>> while idle, but it turns out that I hadn't actually pulled it down to >>> the test server and recompiled. That is now done, so we'll see if that >>> makes any difference. >>> >>> If testing the connection on pool actions does make a difference, then >>> what is your speculation about what was happening when I ran into the >>> closed connection only minutes after restart, and would it be worthy of >>> an issue in Jira? The only theory I had was a race condition between >>> eviction and borrowing, but unless there's something amiss in how all >>> the object inheritance works out, it looks like that's probably not it. >>> Some kind of issue with the TCP stack in Linux (either on the machines >>> running my code or the MySQL server) is the only other idea I can think >>> of. Or maybe a hardware/firmware issue, since it's likely that at least >>> one of the NICs involved is doing TCP offload. I think that virtually >>> every NIC in our infrastructure has that feature and that Linux enables it. >>> Thanks, >>> Shawn >>> >>> >>> - >>> To unsubscribe, e-mail: user-unsubscr...@commons.apache.org >>> For additional commands, e-mail:
Re: [DBCP] Connection just obtained from datasource is invalid
On 1/9/18 11:50 AM, Phil Steitz wrote: > On 1/8/18 4:23 PM, Shawn Heisey wrote: >> On 11/22/2017 5:00 PM, Phil Steitz wrote: >>> If the problem is the evictor closing a connection and having that >>> connection delivered to a client, the problem is almost certainly in >>> pool. The thread-safety of the pool in this regard is engineered in >>> DefaultPooledObject, which is the wrapper that pool manages and >>> delivers to DBCP. When the evictor visits a PooledObject (in >>> GenericObjectPool#evict) it tries to start the eviction test on the >>> object by calling its startEvictionTest method. This method is >>> synchronized on the DefaultPooledObject. Look at the code in that >>> method. It checks to make sure that the object is in fact idle in >>> the pool. The other half of the protection here is in >>> GenericObjectPool#borrowObject, which is what PoolingDataSource >>> calls to get a connection. That method tries to get a PooledObject >>> from the pool and before handing it out (or validating it), it calls >>> the PooledObject's allocate method. Look at the code for that in >>> DefaultPooledObject. That method (also synchronized on the >>> PooledObject) checks that the object is not under eviction and sets >>> its state to allocated. That is the core sync protection that >>> *should* make it impossible for the evictor to do anything to an >>> object that has been handed out to a client. >> I see the synchronization you're talking about here. It appears that >> all of the critical methods in DefaultPooledObject are synchronized (on >> the object). >> >> If you're absolutely certain that DefaultPooledObject is involved with >> all of the implementation my code is using, it all looks pretty complete >> to me. > Yes, the code you posted at the top of the thread uses a > PoolableConnectionFactory as the object factory for the pool. You > can see that PCF's makeObject returns a DefaultPooledObject, so that > much is certain. >> So I'm really curious as to why the connection is getting >> closed. I have seen the problem only minutes after restarting my >> program, so it seems unlikely that the server side is closing the >> connection, since the timeout for that is 8 hours. > I looked back at the initial stack trace and I noticed something > that I had not noticed before. > > This line > > org.apache.commons.dbcp2.DelegatingConnection.createStatement(DelegatingConnection.java:262) > > means that checkOpen() succeeded. That, combined with your > statement above that isClosed() returns true on a failed connection > means that there might be concurrent access to the > DelegatingConnections happening. It looks like the sequence might > have been: > > thread 1: checkOpen - sees true > thread 2: close the DelegatingConnection (there is no sync to > prevent this) > thread1 : createStatement - bang! > thread1 : isClosed() returns true > > DBCP is not really safe to use that way - i.e., really the intended > setup is that individual connection handles are not concurrently > accessed by multiple threads. Is it possible something like this is > going on? Note that what I am talking about here is two different > threads holding references to the same connection handle - i.e., no > trips back through the pool. I just noticed another thing in [pool] that might have something to do with this. It's probably best to investigate what I have in mind on the dev list. I will post a summary / ticket reference here if it turns out I this is a bug. Phil > > Phil >> I did add the code a while back to test on create, borrow, return, and >> while idle, but it turns out that I hadn't actually pulled it down to >> the test server and recompiled. That is now done, so we'll see if that >> makes any difference. >> >> If testing the connection on pool actions does make a difference, then >> what is your speculation about what was happening when I ran into the >> closed connection only minutes after restart, and would it be worthy of >> an issue in Jira? The only theory I had was a race condition between >> eviction and borrowing, but unless there's something amiss in how all >> the object inheritance works out, it looks like that's probably not it. >> Some kind of issue with the TCP stack in Linux (either on the machines >> running my code or the MySQL server) is the only other idea I can think >> of. Or maybe a hardware/firmware issue, since it's likely that at least >> one of the NICs involved is doing TCP offload. I think that virtually >> every NIC in our infrastructure has that feature and that Linux enables it. > >> Thanks, >> Shawn >> >> >> - >> To unsubscribe, e-mail: user-unsubscr...@commons.apache.org >> For additional commands, e-mail: user-h...@commons.apache.org >> >> - To unsubscribe, e-mail: user-unsubscr...@commons.apache.org For additional commands, e-mail:
Re: [DBCP] Connection just obtained from datasource is invalid
On 1/8/18 4:23 PM, Shawn Heisey wrote: > On 11/22/2017 5:00 PM, Phil Steitz wrote: >> If the problem is the evictor closing a connection and having that >> connection delivered to a client, the problem is almost certainly in >> pool. The thread-safety of the pool in this regard is engineered in >> DefaultPooledObject, which is the wrapper that pool manages and >> delivers to DBCP. When the evictor visits a PooledObject (in >> GenericObjectPool#evict) it tries to start the eviction test on the >> object by calling its startEvictionTest method. This method is >> synchronized on the DefaultPooledObject. Look at the code in that >> method. It checks to make sure that the object is in fact idle in >> the pool. The other half of the protection here is in >> GenericObjectPool#borrowObject, which is what PoolingDataSource >> calls to get a connection. That method tries to get a PooledObject >> from the pool and before handing it out (or validating it), it calls >> the PooledObject's allocate method. Look at the code for that in >> DefaultPooledObject. That method (also synchronized on the >> PooledObject) checks that the object is not under eviction and sets >> its state to allocated. That is the core sync protection that >> *should* make it impossible for the evictor to do anything to an >> object that has been handed out to a client. > I see the synchronization you're talking about here. It appears that > all of the critical methods in DefaultPooledObject are synchronized (on > the object). > > If you're absolutely certain that DefaultPooledObject is involved with > all of the implementation my code is using, it all looks pretty complete > to me. Yes, the code you posted at the top of the thread uses a PoolableConnectionFactory as the object factory for the pool. You can see that PCF's makeObject returns a DefaultPooledObject, so that much is certain. > So I'm really curious as to why the connection is getting > closed. I have seen the problem only minutes after restarting my > program, so it seems unlikely that the server side is closing the > connection, since the timeout for that is 8 hours. I looked back at the initial stack trace and I noticed something that I had not noticed before. This line org.apache.commons.dbcp2.DelegatingConnection.createStatement(DelegatingConnection.java:262) means that checkOpen() succeeded. That, combined with your statement above that isClosed() returns true on a failed connection means that there might be concurrent access to the DelegatingConnections happening. It looks like the sequence might have been: thread 1: checkOpen - sees true thread 2: close the DelegatingConnection (there is no sync to prevent this) thread1 : createStatement - bang! thread1 : isClosed() returns true DBCP is not really safe to use that way - i.e., really the intended setup is that individual connection handles are not concurrently accessed by multiple threads. Is it possible something like this is going on? Note that what I am talking about here is two different threads holding references to the same connection handle - i.e., no trips back through the pool. Phil > > I did add the code a while back to test on create, borrow, return, and > while idle, but it turns out that I hadn't actually pulled it down to > the test server and recompiled. That is now done, so we'll see if that > makes any difference. > > If testing the connection on pool actions does make a difference, then > what is your speculation about what was happening when I ran into the > closed connection only minutes after restart, and would it be worthy of > an issue in Jira? The only theory I had was a race condition between > eviction and borrowing, but unless there's something amiss in how all > the object inheritance works out, it looks like that's probably not it. > Some kind of issue with the TCP stack in Linux (either on the machines > running my code or the MySQL server) is the only other idea I can think > of. Or maybe a hardware/firmware issue, since it's likely that at least > one of the NICs involved is doing TCP offload. I think that virtually > every NIC in our infrastructure has that feature and that Linux enables it. > > Thanks, > Shawn > > > - > To unsubscribe, e-mail: user-unsubscr...@commons.apache.org > For additional commands, e-mail: user-h...@commons.apache.org > > - To unsubscribe, e-mail: user-unsubscr...@commons.apache.org For additional commands, e-mail: user-h...@commons.apache.org
Re: [DBCP] Connection just obtained from datasource is invalid
On 1/9/2018 8:31 AM, Gary Gregory wrote: Currious: are you using the current versions of DBCP and Pool? The last compile pulled down pool 2.5.0, dbcp 2.2.0, and mysql driver 5.1.45 (using ivy). I haven't seen the problem since that compile (which also enabled testing connections on various pool actions like create, borrow, etc), but it can take a while before it occurs, so I'm not calling it solved yet. When I started this thread on November 15th, I was running pool 2.4.3, dbcp 2.1.1, and whatever version of the mysql driver was current at the time. Those were the latest pool and dbcp versions at that time. Thanks, Shawn - To unsubscribe, e-mail: user-unsubscr...@commons.apache.org For additional commands, e-mail: user-h...@commons.apache.org
Re: [DBCP] Connection just obtained from datasource is invalid
And of your JDBC driver? Gary On Jan 9, 2018 8:31 AM, "Gary Gregory"wrote: > Currious: are you using the current versions of DBCP and Pool? > > Gary > > On Jan 9, 2018 8:28 AM, "Shawn Heisey" wrote: > >> On 1/8/2018 11:19 PM, Chas Honton wrote: >> > Don’t forget about any proxies or routers between the client and the >> server. Most infrastructure components will close quiescent connections >> (within minutes) >> >> There are no proxies for the MySQL traffic. The machines involved are >> on the same subnet (vlan), so it's only layer 2 switches made by Cisco >> that provide the networking -- no routing. There is no QoS configured. >> The connections are gigabit and the switchports are not incrementing >> error counters. The inter-switch connections are trunk ports and also >> show no incrementing error counters. >> >> Cisco switches tend to have enough backplane bandwidth to run every port >> at full line rate in both directions. We don't run our infrastructure >> anywhere near full switching capacity, so dropped packets are unlikely, >> but even if that did happen, TCP is designed to handle that >> transparently to applications. >> >> Cisco routers and switches do not typically interfere in TCP >> communication. Their firewalls do get involved in TCP, but even if this >> traffic was passing through our firewall (which it's not), our TCP idle >> timeout on the firewall is set to two hours, to accommodate long-lived >> TCP connections over the Internet. >> >> Thanks, >> Shawn >> >> >> - >> To unsubscribe, e-mail: user-unsubscr...@commons.apache.org >> For additional commands, e-mail: user-h...@commons.apache.org >> >>
Re: [DBCP] Connection just obtained from datasource is invalid
Currious: are you using the current versions of DBCP and Pool? Gary On Jan 9, 2018 8:28 AM, "Shawn Heisey"wrote: > On 1/8/2018 11:19 PM, Chas Honton wrote: > > Don’t forget about any proxies or routers between the client and the > server. Most infrastructure components will close quiescent connections > (within minutes) > > There are no proxies for the MySQL traffic. The machines involved are > on the same subnet (vlan), so it's only layer 2 switches made by Cisco > that provide the networking -- no routing. There is no QoS configured. > The connections are gigabit and the switchports are not incrementing > error counters. The inter-switch connections are trunk ports and also > show no incrementing error counters. > > Cisco switches tend to have enough backplane bandwidth to run every port > at full line rate in both directions. We don't run our infrastructure > anywhere near full switching capacity, so dropped packets are unlikely, > but even if that did happen, TCP is designed to handle that > transparently to applications. > > Cisco routers and switches do not typically interfere in TCP > communication. Their firewalls do get involved in TCP, but even if this > traffic was passing through our firewall (which it's not), our TCP idle > timeout on the firewall is set to two hours, to accommodate long-lived > TCP connections over the Internet. > > Thanks, > Shawn > > > - > To unsubscribe, e-mail: user-unsubscr...@commons.apache.org > For additional commands, e-mail: user-h...@commons.apache.org > >
Re: [DBCP] Connection just obtained from datasource is invalid
On 1/8/2018 11:19 PM, Chas Honton wrote: > Don’t forget about any proxies or routers between the client and the server. > Most infrastructure components will close quiescent connections (within > minutes) There are no proxies for the MySQL traffic. The machines involved are on the same subnet (vlan), so it's only layer 2 switches made by Cisco that provide the networking -- no routing. There is no QoS configured. The connections are gigabit and the switchports are not incrementing error counters. The inter-switch connections are trunk ports and also show no incrementing error counters. Cisco switches tend to have enough backplane bandwidth to run every port at full line rate in both directions. We don't run our infrastructure anywhere near full switching capacity, so dropped packets are unlikely, but even if that did happen, TCP is designed to handle that transparently to applications. Cisco routers and switches do not typically interfere in TCP communication. Their firewalls do get involved in TCP, but even if this traffic was passing through our firewall (which it's not), our TCP idle timeout on the firewall is set to two hours, to accommodate long-lived TCP connections over the Internet. Thanks, Shawn - To unsubscribe, e-mail: user-unsubscr...@commons.apache.org For additional commands, e-mail: user-h...@commons.apache.org