> On Mar 23, 2021, at 10:45 AM, Emmanuel Lécharny <elecha...@gmail.com> wrote:
> 
> On 23/03/2021 14:24, Shawn McKinney wrote:
>>> On Mar 23, 2021, at 3:00 AM, Emmanuel Lécharny <elecha...@gmail.com> wrote:
>>> 
>>> 
>>> On 22/03/2021 19:41, Shawn McKinney wrote:
>>>>> On Mar 22, 2021, at 11:05 AM, Emmanuel Lécharny <elecha...@gmail.com> 
>>>>> wrote:
>>>>> 
>>>>> LDAP connections are quite stable. Again, this check is there to respect 
>>>>> the commons-pool API contract. If the connection is dead, then doing this 
>>>>> check will let the pool fetch another connection, which is good. OTOH, 
>>>>> if you don't want to check connections while fetching one, then you are 
>>>>> on your own (ie, deal with the consequences of a bad connection).
>>>> Again, I must disagree. Connections aren’t stable; they are subject to env 
>>>> conditions, and we can never assume a connection is good.
>>> 
>>> You *can* assume it's good, and react exceptionally if it's not. That's the 
>>> fastest way to deal with potential bad connections, if you want to avoid a 
>>> check every time you pull a connection from the pool. (but see later for 
>>> another option)
>> There are 150 locations in fortress core where an ldap connection is being 
>> pulled from the pool. No way I want to revisit all of that code.
>>> 
>>> Something in the API chain must do the job of testing and reconnecting, every 
>>> time it’s pulled from the pool.
>>> 
>>> This is exactly what the testOnBorrow does ;-) But it's costly... (see 
>>> later for another option)
>>> 
>>>> Now, having said that, that’s exactly what I’m observing currently, with 
>>>> the test on borrow flag turned off.
>>>> Let me explain the scenario:
>>>> 1. Start server
>>>> 2. Start client
>>>> This initializes a connection pool which creates and stores exactly 1 
>>>> connection (min 1, max 1).
>>>> 3. Set breakpoint in client on pool.getConnection method
>>>> 4. Restart the server.
>>>> 5. Client performs ldap op which triggers the breakpoint on getConnection
>>>> 6. The server at this point no longer has any connections with the client.  The 
>>>> client ‘thinks’ it has connections in the pool, but these were broken in 
>>>> step 4.
>>>> 7. Step over getConnection, which then triggers the commons pool to 
>>>> execute:
>>>> ```                
>>>> GenericObjectPool._factory.activateObject(latch.getPair().value)
>>>> ```
>>>> 8. A connection is made with the server, along with a bind:
>>>> ```
>>>> 6058e163 conn=1000 fd=14 ACCEPT from IP=127.0.0.1:35246 (IP=127.0.0.1:389)
>>>>      [exec] 6058e163 conn=1000 op=0 BIND dn="cn=manager,dc=example,dc=com" 
>>>> method=128
>>>>      [exec] 6058e163 conn=1000 op=0 BIND dn="cn=manager,dc=example,dc=com" 
>>>> mech=SIMPLE ssf=0
>>>>      [exec] 6058e163 conn=1000 op=0 RESULT tag=97 err=0 duration=1.667ms
>>>> ```
>>>> 9. The client continues with its ldap op successfully and is none the wiser 
>>>> that the connections were all forcibly closed on the server restart.
>>>> This is EXACTLY what I want to see, all of it done without the test on 
>>>> borrow, eliminating the extraneous search on every connection retrieval.
>>> 
>>> So that would mean we have some kind of 'retry' on a connection operation: if 
>>> it fails, then let's assume the connection is broken, and redo it with a 
>>> fresh connection.
>>> 
>> Yes, that’s what’s happening.
>> In the Commons pool lib, this is the block that gets executed:
>> ```
>> this._factory.activateObject(latch.getPair().value);
>> 
>> if (this._testOnBorrow
>>         && !this._factory.validateObject(latch.getPair().value)) {
>>     throw new Exception("ValidateObject failed");
>> }
>> ```
>> In the first line above, activateObject, this code from our 
>> AbstractPoolableLdapConnectionFactory class gets called:
>> ```
>> public void activateObject(LdapConnection connection) throws LdapException {
>>     LOG.debug("Activating {}", connection);
>> 
>>     if (!connection.isConnected() || !connection.isAuthenticated()) {
>>         LOG.debug("rebind due to connection dropped on {}", connection);
>>         this.connectionFactory.bindConnection(connection);
>>     }
>> }
>> ```
>> The code performs a rebind which renews the connection.
>> All of this with testOnBorrow being false!
> 
> 
> The second bind is pretty brutal, as it will create a brand new connection.
> 
>> So, I’m still scratching my head figuring out why we need this secondary 
>> level that wastes a round trip to the server.
> 
> Well oh well, not sure we need it at all, considering the piece of code you 
> exhumed ;-)
> 

Yes, that’s pretty much where I’m at. I’ve changed fortress to turn it off by 
default. It would also be good to think about changing the default behavior in 
the API as well.
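
For the archives, here’s roughly what turning it off looks like on the client side. This is only a sketch, not the actual fortress code: it assumes the plain LdapConnectionPool / DefaultPoolableLdapConnectionFactory wiring from the API, and the host, port, and credentials below are placeholders.

```
import org.apache.directory.ldap.client.api.DefaultPoolableLdapConnectionFactory;
import org.apache.directory.ldap.client.api.LdapConnectionConfig;
import org.apache.directory.ldap.client.api.LdapConnectionPool;

// Placeholder connection settings -- substitute real host/port/credentials.
LdapConnectionConfig config = new LdapConnectionConfig();
config.setLdapHost("localhost");
config.setLdapPort(389);
config.setName("cn=manager,dc=example,dc=com");
config.setCredentials("secret");

// LdapConnectionPool extends commons-pool's GenericObjectPool, so the usual
// pool knobs are available directly on it.
LdapConnectionPool pool = new LdapConnectionPool(
        new DefaultPoolableLdapConnectionFactory(config));
pool.setTestOnBorrow(false); // skip the validateObject() search; rely on activateObject()'s rebind
pool.setMinIdle(1);          // mirrors the min 1 / max 1 setup from the scenario above
pool.setMaxActive(1);
```

Borrowing is unchanged everywhere (pool.getConnection() / pool.releaseConnection()), which is why I don’t want to touch the ~150 call sites in fortress core.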

> 
>>> The problem is that the pool is passive: it does not react to any 
>>> connection event, like a connection closure. OTOH, when a connection is 
>>> properly closed by the server (ie a NoticeOfDisconnect - aka NoD - is 
>>> generated by the server), the connection should process it and die. Now we 
>>> have an issue: the connection is just lying in the pool, not being used by 
>>> any client, so there is a missing step: removing the connection from the 
>>> pool in this very case. That is something we could add to the current 
>>> LDAP pool.
>>> 
>>> Note that if the server does not send a NoD, you are screwed. There is no 
>>> way to be sure that the connection is ok until you use it. OTOH, leveraging 
>>> the setTestWhileIdle() could be a solution to partially work around the 
>>> issue.
>> Here you’ve lost me.  I’m running the server in debug mode, inside a bash 
>> shell, running in the foreground.
>> In my test I stop the server with ‘ctrl-c’, killing the shell.
>> The server does not react to this by sending a NoD.
> 
> Indeed, the worst case scenario. It's pretty much as if you simply pull out 
> the ethernet cable from the socket (oh, well, put your wifi box in a Faraday 
> cage).
> 

Exactly!  A pretty good test.

> Yes, in this very case, there will be no NoD. TCP is a best-effort protocol: 
> the only way to be sure that there is 'somebody' on the other side is to send 
> some data to it and expect some kind of response, in the absence of which you 
> would consider the connection dead in the water. Add to that the existence of 
> a timeout mechanism on the OS level (generally speaking set to 7200 s), which 
> makes it critical to set up a keepalive system to ensure that the connection 
> never dies because of a timeout...
> 
> In the LDAP API, we could prevent - sort of - the connection from being 
> tested on demand (ie when you need it), by setting a check on idling (if the 
> connection has idled for more than a given period of time, then we would 
> consider it dead). It's not leveraged in the API atm.
> 

I’m actually thinking about turning off the check on idle.  It’s not needed, 
right?
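
For reference, here is the commons-pool machinery that would drive an idle check; leaving it untouched effectively keeps it off. This is only a sketch against the pool from the earlier snippet, and the 60 second interval is just an example value. If I read the commons-pool 1.x defaults correctly, the evictor thread never runs unless timeBetweenEvictionRunsMillis is set to a positive value, so the default already means no idle checking.

```
// Only relevant if we *wanted* idle checking: the evictor thread wakes up on the
// configured interval and, with testWhileIdle set, calls validateObject() on idle
// connections, dropping any that fail.
pool.setTestWhileIdle(true);                    // validate idle connections during eviction runs
pool.setTimeBetweenEvictionRunsMillis(60000L);  // run the evictor every 60 s (example value)
pool.setNumTestsPerEvictionRun(1);              // examine one idle connection per run
pool.setMinEvictableIdleTimeMillis(-1L);        // never evict purely on idle age
```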


> 
> All that being said, with the current 'rebind' mechanism, you should be safe, 
> I think.

Thanks Emmanuel. This is where I was hoping the discussion would end up.

It’s a pretty big decision to make in the vacuum of a test env, and I wanted to 
get some more eyeballs on it.

I’ll point it out specifically in the next release, so it’s not overlooked.

—
Shawn

