Re: [DBCP] Apparent deadlock related to getConnection
On 4/11/2015 8:48 AM, Phil Steitz wrote: What is going on in that thread is that it is blocked waiting for a connection to become available in the pool. There is no evidence of deadlock there - just pool exhaustion. The most likely culprit is application code that is failing to close connections that have been borrowed from the pool. If you don't close the connections on all execution paths, they will leak as the pool has no way of knowing that your code is finished with them unless and until you close them. I've done a review of all my database code for leaks in the past, and everything looked good at that time. I can repeat that check. The database portions of my code have not had any changes for quite some time. Would I see evidence of a connection leak in the show full processlist output at the MySQL server? I don't see that happening. There are only a handful of open connections when I check there, even when the program has been running for several days or weeks, running through its cycle once a minute. Once all parts of the program have been run at least once, the connection count is stable. Thanks, Shawn - To unsubscribe, e-mail: user-unsubscr...@commons.apache.org For additional commands, e-mail: user-h...@commons.apache.org
Re: [DBCP] Apparent deadlock related to getConnection
I once had a library that allocated 2 connections to complete a transaction. Every now and then we'd deadlock the pool if number of concurrent transactions pool size, which could happen in load bursts. If you do a thread dump of your process and see all threads are waiting for a connection this is most probably happening. The non-solution solution is to increase pool size, the real solution was never to allocate 2 connections to complete a transaction. Kristian 2015-04-12 11:58 GMT+02:00 Shawn Heisey apa...@elyograg.org: On 4/11/2015 8:48 AM, Phil Steitz wrote: What is going on in that thread is that it is blocked waiting for a connection to become available in the pool. There is no evidence of deadlock there - just pool exhaustion. The most likely culprit is application code that is failing to close connections that have been borrowed from the pool. If you don't close the connections on all execution paths, they will leak as the pool has no way of knowing that your code is finished with them unless and until you close them. I've done a review of all my database code for leaks in the past, and everything looked good at that time. I can repeat that check. The database portions of my code have not had any changes for quite some time. Would I see evidence of a connection leak in the show full processlist output at the MySQL server? I don't see that happening. There are only a handful of open connections when I check there, even when the program has been running for several days or weeks, running through its cycle once a minute. Once all parts of the program have been run at least once, the connection count is stable. Thanks, Shawn - To unsubscribe, e-mail: user-unsubscr...@commons.apache.org For additional commands, e-mail: user-h...@commons.apache.org - To unsubscribe, e-mail: user-unsubscr...@commons.apache.org For additional commands, e-mail: user-h...@commons.apache.org
Re: [DBCP] Apparent deadlock related to getConnection
On 4/12/2015 9:16 AM, Phil Steitz wrote: Can you post your full pool config and info on versions of DBCP and POOL? Do you call isClosed() before closing a connection (not recommended - that is really the issue in the ticket you commented)? To make sure there is not an actual deadlock somewhere, you should examine a full thread dump. The thread you posted in the ticket is waiting on the pool, not holding any DBCP or POOL related locks; but it does hold this application lock: at com.REDACTED.idxbuild.solr.Chain.doReinsert(Chain.java:1332) - locked 0xdb0011c8 (a java.lang.Object) Make sure no other threads in process of doing things that will return connections to the pool aren't waiting on this lock. Given that you are using DBCP 2.1, if you have access to a JMX console it would also be good to look at what is happening in the pool when your application hangs. In particular, numActive, numIdle and listAllObjects would be good to look at. Looks like you are using PoolingDataSource directly, so you need to look at the underlying GenericObject's JMX properties. My code does not contain the string isClosed at all. I thought I had saved the full thread dump, but the stdout logfile was wiped when I restarted the program, and I didn't think to save a copy. I will be sure to save it next time there's a problem. Here is the entire java file for my Database class, edited to redact the company top-level domain: http://apaste.info/6pX The object that I am locking on (0xdb0011c8) is a global lock for my whole program which I synchronize on for a handful of normally-short-lived db-related operations just to be sure that they aren't happening simultaneously, and to help make sure that the outcome of certain operations is visible across threads. I'm reasonably sure that there are no operations in those synchronization blocks that are expected to be slow, though I haven't verified that to 100% certainty. One of the operations handled in that kind of synchronization lock is the optimization of certain DB tables, which only happens if the size of those tables is below a certain threshold -- part of making sure that the operation completes quickly and doesn't deadlock the program. Here are all the jar files in my program's lib directory. My program uses javamail, jrobin, dbcp, and solrj: commons-dbcp2-2.1.jar commons-io-2.4.jar commons-pool2-2.3.jar httpclient-4.4.1.jar httpcore-4.4.1.jar httpmime-4.4.1.jar javax.mail.jar jcl-over-slf4j-1.7.12.jar jrobin-1.5.14.jar jul-to-slf4j-1.7.12.jar log4j-1.2.17.jar mysql-connector-java-5.1.35-bin.jar noggit-0.7.jar slf4j-api-1.7.12.jar slf4j-log4j12-1.7.12.jar solr-solrj-5.0.0.jar stax2-api-3.1.1.jar woodstox-core-asl-4.2.0.jar zookeeper-3.4.6.jar I do not have remote JMX enabled for this application, and the server where it runs does not have a gui. I will add some code that spits out the three requested pool stats every time the watchdog thread detects a problem. I cannot predict when there will be a problem ... this code can run for weeks with nothing going wrong, or the deadlock might happen only a few minutes after startup. Thanks, Shawn - To unsubscribe, e-mail: user-unsubscr...@commons.apache.org For additional commands, e-mail: user-h...@commons.apache.org
Re: [DBCP] Apparent deadlock related to getConnection
On 4/12/2015 5:18 AM, Kristian Rosenvold wrote: I once had a library that allocated 2 connections to complete a transaction. Every now and then we'd deadlock the pool if number of concurrent transactions pool size, which could happen in load bursts. If you do a thread dump of your process and see all threads are waiting for a connection this is most probably happening. The non-solution solution is to increase pool size, the real solution was never to allocate 2 connections to complete a transaction. It will take a code review to be sure, but I'm pretty sure I would remember if there was anything odd like that in the code. Thanks, Shawn - To unsubscribe, e-mail: user-unsubscr...@commons.apache.org For additional commands, e-mail: user-h...@commons.apache.org
Re: [DBCP] Apparent deadlock related to getConnection
On 4/12/15 2:58 AM, Shawn Heisey wrote: On 4/11/2015 8:48 AM, Phil Steitz wrote: What is going on in that thread is that it is blocked waiting for a connection to become available in the pool. There is no evidence of deadlock there - just pool exhaustion. The most likely culprit is application code that is failing to close connections that have been borrowed from the pool. If you don't close the connections on all execution paths, they will leak as the pool has no way of knowing that your code is finished with them unless and until you close them. I've done a review of all my database code for leaks in the past, and everything looked good at that time. I can repeat that check. The database portions of my code have not had any changes for quite some time. Also try turning on removeAbandoned and logAbandoned. Would I see evidence of a connection leak in the show full processlist output at the MySQL server? I don't see that happening. There are only a handful of open connections when I check there, even when the program has been running for several days or weeks, running through its cycle once a minute. Once all parts of the program have been run at least once, the connection count is stable. Can you post your full pool config and info on versions of DBCP and POOL? Do you call isClosed() before closing a connection (not recommended - that is really the issue in the ticket you commented)? To make sure there is not an actual deadlock somewhere, you should examine a full thread dump. The thread you posted in the ticket is waiting on the pool, not holding any DBCP or POOL related locks; but it does hold this application lock: at com.REDACTED.idxbuild.solr.Chain.doReinsert(Chain.java:1332) - locked 0xdb0011c8 (a java.lang.Object) Make sure no other threads in process of doing things that will return connections to the pool aren't waiting on this lock. Given that you are using DBCP 2.1, if you have access to a JMX console it would also be good to look at what is happening in the pool when your application hangs. In particular, numActive, numIdle and listAllObjects would be good to look at. Looks like you are using PoolingDataSource directly, so you need to look at the underlying GenericObject's JMX properties. Phil Thanks, Shawn - To unsubscribe, e-mail: user-unsubscr...@commons.apache.org For additional commands, e-mail: user-h...@commons.apache.org - To unsubscribe, e-mail: user-unsubscr...@commons.apache.org For additional commands, e-mail: user-h...@commons.apache.org
Re: [DBCP] Apparent deadlock related to getConnection
On 4/12/15 10:31 AM, Shawn Heisey wrote: On 4/12/2015 9:16 AM, Phil Steitz wrote: Can you post your full pool config and info on versions of DBCP and POOL? Do you call isClosed() before closing a connection (not recommended - that is really the issue in the ticket you commented)? To make sure there is not an actual deadlock somewhere, you should examine a full thread dump. The thread you posted in the ticket is waiting on the pool, not holding any DBCP or POOL related locks; but it does hold this application lock: at com.REDACTED.idxbuild.solr.Chain.doReinsert(Chain.java:1332) - locked 0xdb0011c8 (a java.lang.Object) Make sure no other threads in process of doing things that will return connections to the pool aren't waiting on this lock. Given that you are using DBCP 2.1, if you have access to a JMX console it would also be good to look at what is happening in the pool when your application hangs. In particular, numActive, numIdle and listAllObjects would be good to look at. Looks like you are using PoolingDataSource directly, so you need to look at the underlying GenericObject's JMX properties. My code does not contain the string isClosed at all. I thought I had saved the full thread dump, but the stdout logfile was wiped when I restarted the program, and I didn't think to save a copy. I will be sure to save it next time there's a problem. Full thread dump will be helpful. Here is the entire java file for my Database class, edited to redact the company top-level domain: http://apaste.info/6pX Several methods, getSingleObject, getResultSet, closeResultSet, doUpdate can abandon connections on exception paths. You should insert finally blocks in these methods to ensure that connections are closed. Have a look at the PoolingDataSource example referenced [1] on the DBCP web page for a model to use with try-catch-finally. Not closing connections on exception paths will cause leaks if exceptions are thrown. If you are seeing BuildExceptions generated by this code, each of them is leaking a connection. The object that I am locking on (0xdb0011c8) is a global lock for my whole program which I synchronize on for a handful of normally-short-lived db-related operations just to be sure that they aren't happening simultaneously, and to help make sure that the outcome of certain operations is visible across threads. I'm reasonably sure that there are no operations in those synchronization blocks that are expected to be slow, though I haven't verified that to 100% certainty. They don't have to be slow to deadlock, if you hit the right timing and the following kind of scenario: Thread 1 grab the lock and hold it while waiting to get a connection (what the thread you showed in the ticket is doing) Threads 2, ..., n try to get the lock before returning connections they hold Key is to get a full thread dump and see which threads are waiting on / holding which locks. Phil [1] http://s.apache.org/0Nl One of the operations handled in that kind of synchronization lock is the optimization of certain DB tables, which only happens if the size of those tables is below a certain threshold -- part of making sure that the operation completes quickly and doesn't deadlock the program. Here are all the jar files in my program's lib directory. My program uses javamail, jrobin, dbcp, and solrj: commons-dbcp2-2.1.jar commons-io-2.4.jar commons-pool2-2.3.jar httpclient-4.4.1.jar httpcore-4.4.1.jar httpmime-4.4.1.jar javax.mail.jar jcl-over-slf4j-1.7.12.jar jrobin-1.5.14.jar jul-to-slf4j-1.7.12.jar log4j-1.2.17.jar mysql-connector-java-5.1.35-bin.jar noggit-0.7.jar slf4j-api-1.7.12.jar slf4j-log4j12-1.7.12.jar solr-solrj-5.0.0.jar stax2-api-3.1.1.jar woodstox-core-asl-4.2.0.jar zookeeper-3.4.6.jar I do not have remote JMX enabled for this application, and the server where it runs does not have a gui. I will add some code that spits out the three requested pool stats every time the watchdog thread detects a problem. I cannot predict when there will be a problem ... this code can run for weeks with nothing going wrong, or the deadlock might happen only a few minutes after startup. Thanks, Shawn - To unsubscribe, e-mail: user-unsubscr...@commons.apache.org For additional commands, e-mail: user-h...@commons.apache.org - To unsubscribe, e-mail: user-unsubscr...@commons.apache.org For additional commands, e-mail: user-h...@commons.apache.org
Re: [DBCP] Apparent deadlock related to getConnection
On 4/12/2015 12:43 PM, Phil Steitz wrote: On 4/12/15 10:31 AM, Shawn Heisey wrote: http://apaste.info/6pX Several methods, getSingleObject, getResultSet, closeResultSet, doUpdate can abandon connections on exception paths. You should insert finally blocks in these methods to ensure that connections are closed. I have modified the code to put close operations in finally blocks in getSingleObject, doUpdate, and getColumnSet, and I have done the same thing on all uses of the other two methods, with the close method called in a finally block. The updated version of the Database class is here: http://apaste.info/6XU I've engaged in some renaming. The getResultSet method is now called createConnectionAndResultSet, and the closeResultSet method is now closeAllFromResultSet. They don't have to be slow to deadlock, if you hit the right timing and the following kind of scenario: Thread 1 grab the lock and hold it while waiting to get a connection (what the thread you showed in the ticket is doing) Threads 2, ..., n try to get the lock before returning connections they hold Key is to get a full thread dump and see which threads are waiting on / holding which locks. If the finally changes I've made do not fix the problem, I will gather a new full thread dump for analysis. You've given me an idea of some additional bad coding that I can look for, so thank you. I've tried to make all synchronized code use the database in a self-contained manner, but I might have made mistakes. Thanks, Shawn - To unsubscribe, e-mail: user-unsubscr...@commons.apache.org For additional commands, e-mail: user-h...@commons.apache.org