Re: [DBCP] Connection just obtained from datasource is invalid

2017-11-21 Thread Phil Steitz
On 11/21/17 10:58 AM, Shawn Heisey wrote:
> On 11/17/2017 3:27 PM, Phil Steitz wrote:
>> OK, sorry I misread your initial post.  That means either the pool
>> agrees that the connection is closed (which would point to a pool or
>> DBCP bug) or the driver's isValid() on the underlying physical
>> connection is returning true (without throwing).  If it's not too
>> hard to do, it would be good to also log what comes back from
>> isClosed() on the DelegatingConnection (what dsMaster returns).  If
>> that returns true, then either this is a DBCP or pool bug or you
>> have somehow closed the DelegatingConnection via another reference
>> or something.  If it returns false, that means the pool thinks the
>> connection is open so it is unlikely that the pool closed it.
> I'm pretty sure (will have to check) that when my code uses the
> connections it gets from the pool, it does do a close on it when
> complete, but it's my understanding that this is how to return
> connections to the pool, that it doesn't *truly* close the connection as
> it would if I were using JDBC directly instead of via DBCP.

That is correct.
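
For anyone following along, the usual pattern looks like this (a
minimal sketch; the data source and query are illustrative, not your
actual code):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

public class PooledCloseSketch {
    // Minimal sketch: with DBCP, close() on the borrowed wrapper returns
    // the connection to the pool; the physical connection stays open.
    static void doWork(DataSource ds) throws SQLException {
        try (Connection conn = ds.getConnection();
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT 1")) {
            while (rs.next()) {
                // consume the result
            }
        } // conn.close() runs here and hands the wrapper back to the pool
    }
}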
>
> I have changed my program to check isClosed instead of making the
> "SELECT 1" query on the invalid connection, and to log something if that
> returns true.  So far the problem hasn't happened again, even though I
> did set the eviction interval back to exactly one minute.

That will tell you if something funny is going on.  Having isClosed
return true on a connection that you have just gotten from the pool
(and no other thread has messed with) would be DBCP/Pool badness.
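
Something like this right after the borrow is enough (a sketch; the
logging is illustrative):

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

public class BorrowSanityCheck {
    // Sketch of the diagnostic: isClosed() returning true on a
    // connection we just borrowed (and no other thread has touched)
    // means the pool handed out a dead wrapper.
    static Connection borrow(DataSource ds) throws SQLException {
        Connection conn = ds.getConnection();
        if (conn.isClosed()) {
            System.err.println("pool returned an already-closed connection");
        }
        return conn;
    }
}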
>
>> What you have below is fine.  I just wanted to see the pool
>> configuration settings.  Looks OK, but might end up creating a lot
>> of connection churn having maxIdle set to 6 with maxTotal at 30.  I
>> guess the idea is that when it runs, you want to have an idle
>> connection available for each shard with the ability to add more up
>> to a total of 30, but you don't want those idle connections to stay
>> in the pool when the work isn't being done.  Do you know how many
>> connections actually get opened during the work period?  If that is
>> much more than 6, you are kind of defeating the purpose of the pool
>> by closing them all when they get returned (as the maxIdle setting
>> will force).
> Here's a screenshot showing the open connections on the DB server at a
> particular moment.  The connections in this case are coming from the
> server named "idxa1".  There are two copies of the program running --
> one where numShards is 2, and one where numShards is 6.  As you can see
> in the screenshot, there are twelve connections from idxa1.
>
> https://www.dropbox.com/s/g9swe4hrq19jbcc/mysql-dbcp-connections-workbench.png?dl=0
>
> As the program does its work, it uses one connection to the main server
> to gather some information and then it can open up numShards+1
> connections to the main server to actually handle updating the Solr
> index from the database.  (numShards refers to the number of cold
> shards, but there is one more shard in each index -- the hot shard).  On
> each cycle, position information is written to the master server.
>
> Because we had to do some work on a DB server a while back, the main
> pool and the master pool are both set to the same host -- the master
> server.  This is slightly unusual, but that configuration pre-dates the
> recent problems with closed connections by quite a bit of time. 
> Eventually I will be switching it back so the main server is one of the
> slaves, but I would like to figure this problem out first.
>
>> The fact that changing the eviction interval to not exactly coincide
>> with when the work happens makes the problem go away is troubling. 
>> I can see how that might improve performance, but it does make the
>> pool bug scenario more likely.
> Do you think I should be doing things differently than I currently do? 
> If so, please feel free to make recommendations.  I'm not really sure
> what your statement about performance is saying.  I'm not doing eviction
> for performance reasons -- in fact, it might actually *reduce*
> performance, because it's more likely that "getConnection" is going to
> have to actually spin up a new TCP connection to the database.  The time
> required for that doesn't worry me.

The performance comment was just to point out that if you run the
evictor exactly at the time when the work happens, there is some
contention between the evictor and the borrowing threads.  But this
has really been optimized in pool2, so you probably can't measure
it.  So basically forget that comment.  The fact that the problem
goes away when you eliminate the contention remains troubling from a
DBCP/Pool standpoint.
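
For what it's worth, offsetting the evictor from the one-minute work
cadence is just a configuration change.  A sketch, assuming a
BasicDataSource (the 61-second value is illustrative, not a
recommendation):

import org.apache.commons.dbcp2.BasicDataSource;

public class EvictorOffsetSketch {
    // Run the evictor every 61 seconds instead of every 60 so its runs
    // drift relative to a job that fires exactly once a minute.
    static BasicDataSource build(String url, String user, String password) {
        BasicDataSource ds = new BasicDataSource();
        ds.setUrl(url);
        ds.setUsername(user);
        ds.setPassword(password);
        ds.setTimeBetweenEvictionRunsMillis(61_000L);
        ds.setMinEvictableIdleTimeMillis(300_000L); // five minutes idle
        return ds;
    }
}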
>
>> One more question on the config.  Why are you running the evictor
>> every minute and using eviction to close idle connections?  The
>> maxIdle setting will keep the connection count down.  If what you
>> are worried about is server-side idle timeouts, you could set
>> testWhileIdle to true and that would use the connections rather than
>> closing them.
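
For illustration, that alternative is a few lines on a BasicDataSource
(a sketch; the values are illustrative):

import org.apache.commons.dbcp2.BasicDataSource;

public class TestWhileIdleSketch {
    // Sketch: have the evictor validate idle connections instead of
    // closing them, so they stay active from the server's point of view.
    static void configure(BasicDataSource ds) {
        ds.setTestWhileIdle(true);
        ds.setValidationQuery("SELECT 1");            // cheap probe query
        ds.setTimeBetweenEvictionRunsMillis(60_000L);
        ds.setNumTestsPerEvictionRun(-1);  // -1: test every idle connection
    }
}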

Re: [DBCP] Connection just obtained from datasource is invalid

2017-11-21 Thread Phil Steitz
On 11/21/17 11:34 AM, Shawn Heisey wrote:
> On 11/16/2017 5:21 PM, Phil Steitz wrote:
>> That should not be possible, as once a connection has been checked
>> out, it is not eligible for eviction.  And even if it were evicted
>> by the pool, the pool would set the DelegatingConnection's closed
>> property to true so isValid would return false rather than
>> throwing.  It looks like the situation here is that the pool thinks
>> the connection is open, but the underlying physical connection has
>> been closed.
> I am very glad to hear you describe the intended operation this way --
> the gist of which seems to say that when getConnection is called, it
> *should* immediately mark that connection as ineligible for eviction.
>
> My current suspicion is that there is a bug, where getConnection isn't
> marking the connection as ineligible like it should, or maybe where
> there is insufficient synchronization to prevent the eviction thread
> from closing/evicting the connection before getConnection *does* mark
> the connection as ineligible.  This is just a theory, but it does fit
> what I've seen.  I have tried to investigate the code, but the project
> has used object inheritance and abstraction very effectively, which
> makes it difficult for me to follow exactly what the code is doing,
> because I am not familiar with all of that inheritance.

It's always possible that there is a bug. The relevant code is in
commons pool.  The code to look at is in
o.a.c.pool2.impl.GenericObjectPool#evict.  The guard is here:

if (!underTest.startEvictionTest()) {
    // Object was borrowed in another thread
    // Don't count this as an eviction test so reduce i;
    i--;
    continue;
}

The objects that the pool works with are PooledObjects.  DBCP's
PoolableConnectionFactory creates DefaultPooledObjects, which
implement the method above by returning false unless the object is
idle in the pool.  If it is idle in the pool, its state is changed
to indicate that it is under test.  This method is synchronized (on
the PooledObject's monitor), as are the other methods that read or
mutate its PooledObjectState.

Before returning an object from the pool to a client,
GenericObjectPool's borrowObject calls the PooledObject's allocate
method.  That method, again synchronized on the object, checks to
make sure that the object's state is idle in the pool.  If so, it
changes its state to allocated before returning it to the client.
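
To make the state handling concrete, here is a simplified,
self-contained sketch of those two guards (modeled on
DefaultPooledObject, but not the actual source):

public class PooledObjectStateSketch {
    enum State { IDLE, ALLOCATED, EVICTION }

    private State state = State.IDLE;

    // Evictor thread: only an object idle in the pool may enter testing.
    public synchronized boolean startEvictionTest() {
        if (state == State.IDLE) {
            state = State.EVICTION;
            return true;
        }
        return false; // borrowed, or already under test
    }

    // Borrower thread: only an object idle in the pool may be allocated.
    public synchronized boolean allocate() {
        if (state == State.IDLE) {
            state = State.ALLOCATED;
            return true;
        }
        return false; // under eviction test, or already checked out
    }
}

Because both methods synchronize on the same monitor, the evictor and
a borrower cannot both win the race for the same connection.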

As I said above, it's always possible that there is a bug somewhere,
but the setup is simple and it has been hammered pretty hard for a
few years now.
>
> I am not aware of any way for one thread in my code to close a
> connection obtained by another thread.  My design intent is to ensure
> that this is completely impossible, by handling the full lifecycle of a
> connection in one thread and never passing connections between threads. 
> I am not aware of any places in the program that violate this design
> intent ... but I've written plenty of bugs in my time, so it's always
> possible that I *have* made a mistake like that.

As I said in my first response, the most common explanation for this
kind of exception when using DBCP is that the underlying physical
connection is closed on the server (or network) side without DBCP
knowing about it.  And the most common cause of that is server-side
idle connection timeout.
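
If you want to rule that out, the server-side setting is easy to
check.  On MySQL, which your Workbench screenshot suggests you are
running, something like this would show it (illustrative):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.sql.DataSource;

public class WaitTimeoutCheck {
    // Print MySQL's server-side idle timeout, the usual culprit for
    // physical connections dying underneath a pool.
    static void printWaitTimeout(DataSource ds) throws Exception {
        try (Connection conn = ds.getConnection();
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SHOW VARIABLES LIKE 'wait_timeout'")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " = " + rs.getString(2));
            }
        }
    }
}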

It just occurred to me that since you do not have any of the testXxx
flags set to true, DBCP is never actually exercising the
connections.  Is it possible that sometimes your code checks out a
connection from the pool but does not use it?  "Idle" to DBCP/Pool
means idle in the pool - i.e., not checked out to a client.  So if
connections are checked out and checked back in without being used,
they will not be evicted, but could be timed out by the server.
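
In other words, a code path like this would show exactly that
behavior (a hypothetical sketch, not your code):

import java.sql.Connection;
import javax.sql.DataSource;

public class UnusedCheckoutSketch {
    // Hypothetical anti-pattern: each checkout/checkin resets the
    // pool's idle clock, so the evictor never touches this connection,
    // yet no bytes cross the wire, so the server's own idle timeout
    // keeps counting and can close the physical socket unnoticed.
    static void pollWithoutUsing(DataSource ds) throws Exception {
        try (Connection conn = ds.getConnection()) {
            // a branch decides there is no work, so the connection goes
            // back to the pool without ever executing a statement
        }
    }
}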

Phil
>
> Thanks,
> Shawn


Re: [DBCP] Connection just obtained from datasource is invalid

2017-11-21 Thread Shawn Heisey
On 11/16/2017 5:21 PM, Phil Steitz wrote:
> That should not be possible, as once a connection has been checked
> out, it is not eligible for eviction.  And even if it were evicted
> by the pool, the pool would set the DelegatingConnection's closed
> property to true so isValid would return false rather than
> throwing.  It looks like the situation here is that the pool thinks
> the connection is open, but the underlying physical connection has
> been closed.

I am very glad to hear you describe the intended operation this way --
the gist of which seems to say that when getConnection is called, it
*should* immediately mark that connection as ineligible for eviction.

My current suspicion is that there is a bug, where getConnection isn't
marking the connection as ineligible like it should, or maybe where
there is insufficient synchronization to prevent the eviction thread
from closing/evicting the connection before getConnection *does* mark
the connection as ineligible.  This is just a theory, but it does fit
what I've seen.  I have tried to investigate the code, but the project
has used object inheritance and abstraction very effectively, which
makes it difficult for me to follow exactly what the code is doing,
because I am not familiar with all of that inheritance.

I am not aware of any way for one thread in my code to close a
connection obtained by another thread.  My design intent is to ensure
that this is completely impossible, by handling the full lifecycle of a
connection in one thread and never passing connections between threads. 
I am not aware of any places in the program that violate this design
intent ... but I've written plenty of bugs in my time, so it's always
possible that I *have* made a mistake like that.

Thanks,
Shawn


Re: [DBCP] Connection just obtained from datasource is invalid

2017-11-21 Thread Shawn Heisey
On 11/17/2017 3:27 PM, Phil Steitz wrote:
> OK, sorry I misread your initial post.  That means either the pool
> agrees that the connection is closed (which would point to a pool or
> DBCP bug) or the driver's isValid() on the underlying physical
> connection is returning true (without throwing).  If it's not too
> hard to do, it would be good to also log what comes back from
> isClosed() on the DelegatingConnection (what dsMaster returns).  If
> that returns true, then either this is a DBCP or pool bug or you
> have somehow closed the DelegatingConnection via another reference
> or something.  If it returns false, that means the pool thinks the
> connection is open so it is unlikely that the pool closed it.

I'm pretty sure (will have to check) that when my code uses the
connections it gets from the pool, it does do a close on it when
complete, but it's my understanding that this is how to return
connections to the pool, that it doesn't *truly* close the connection as
it would if I were using JDBC directly instead of via DBCP.

I have changed my program to check isClosed instead of making the
"SELECT 1" query on the invalid connection, and to log something if that
returns true.  So far the problem hasn't happened again, even though I
did set the eviction interval back to exactly one minute.

> What you have below is fine.  I just wanted to see the pool
> configuration settings.  Looks OK, but might end up creating a lot
> of connection churn having maxIdle set to 6 with maxTotal at 30.  I
> guess the idea is that when it runs, you want to have an idle
> connection available for each shard with the ability to add more up
> to a total of 30, but you don't want those idle connections to stay
> in the pool when the work isn't being done.  Do you know how many
> connections actually get opened during the work period?  If that is
> much more than 6, you are kind of defeating the purpose of the pool
> by closing them all when they get returned (as the maxIdle setting
> will force).

Here's a screenshot showing the open connections on the DB server at a
particular moment.  The connections in this case are coming from the
server named "idxa1".  There are two copies of the program running --
one where numShards is 2, and one where numShards is 6.  As you can see
in the screenshot, there are twelve connections from idxa1.

https://www.dropbox.com/s/g9swe4hrq19jbcc/mysql-dbcp-connections-workbench.png?dl=0

As the program does its work, it uses one connection to the main server
to gather some information and then it can open up numShards+1
connections to the main server to actually handle updating the Solr
index from the database.  (numShards refers to the number of cold
shards, but there is one more shard in each index -- the hot shard).  On
each cycle, position information is written to the master server.

Because we had to do some work on a DB server a while back, the main
pool and the master pool are both set to the same host -- the master
server.  This is slightly unusual, but that configuration pre-dates the
recent problems with closed connections by quite a bit of time. 
Eventually I will be switching it back so the main server is one of the
slaves, but I would like to figure this problem out first.

> The fact that changing the eviction interval to not exactly coincide
> with when the work happens makes the problem go away is troubling. 
> I can see how that might improve performance, but it does make the
> pool bug scenario more likely.

Do you think I should be doing things differently than I currently do? 
If so, please feel free to make recommendations.  I'm not really sure
what your statement about performance is saying.  I'm not doing eviction
for performance reasons -- in fact, it might actually *reduce*
performance, because it's more likely that "getConnection" is going to
have to actually spin up a new TCP connection to the database.  The time
required for that doesn't worry me.

> One more question on the config.  Why are you running the evictor
> every minute and using eviction to close idle connections?  The
> maxIdle setting will keep the connection count down.  If what you
> are worried about is server-side idle timeouts, you could set
> testWhileIdle to true and that would use the connections rather than
> closing them.

Because we keep having problems where we have too many connections open
on the database server, and when that happens, everything breaks.  I
want to be absolutely sure that the software I am responsible for is not
contributing to the problem.  At the moment, the massive numbers of
connections are being kept open by our webapps, code that I am not
responsible for.  The idle connection eviction in my programs ensures
that if my software has a connection open, it's because it has
actually *needed* that connection in the last five minutes.  Our
application developers need to implement something similar in the code
running under Tomcat, so that there are not