> On Mar 23, 2021, at 10:45 AM, Emmanuel Lécharny <elecha...@gmail.com> wrote:
>
> On 23/03/2021 14:24, Shawn McKinney wrote:
>>> On Mar 23, 2021, at 3:00 AM, Emmanuel Lécharny <elecha...@gmail.com> wrote:
>>>
>>>
>>> On 22/03/2021 19:41, Shawn McKinney wrote:
>>>>> On Mar 22, 2021, at 11:05 AM, Emmanuel Lécharny <elecha...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> LDAP connections are quite stable. Again, this check is there to respect
>>>>> the commons-pool API contract. If the connection is dead, then doing this
>>>>> check will let the pool fetching another connection, which is good. OTOH,
>>>>> if you don't want to check connections while fetching one, then you are
>>>>> on your own (ie, deal with the consequences of a bad connection).
>>>> Again, I must disagree. Connections aren’t stable; they’re subject to
>>>> environmental conditions, and we can never assume a connection is good.
>>>
>>> You *can* assume it's good, and react exceptionally if it's not. That's
>>> the fastest way to deal with a potentially bad connection, if you want to
>>> avoid a check every time you pull a connection from the pool (but see
>>> later for another option).
>> There are 150 locations in fortress core where an ldap connection is being
>> pulled from the pool. No way I want to revisit all of that code.
>>>
>>> Something in the API chain must do the job of testing and reconnecting,
>>> every time a connection is pulled from the pool.
>>>
>>> This is exactly what the testOnBorrow does ;-) But it's costly... (see
>>> later for another option)
>>>
>>>> Now, having said that, that’s exactly what I’m observing currently, with
>>>> the test on borrow flag turned off.
>>>> Let me explain the scenario:
>>>> 1. Start server
>>>> 2. Start client
>>>> This initializes a connection pool which creates and stores exactly 1
>>>> connection (min 1, max 1)
>>>> 3. Set breakpoint in client on pool.getConnection method
>>>> 4. Restart the server.
>>>> 5. Client performs ldap op which triggers the breakpoint on getConnection
>>>> 6. Server at this point doesn’t have any connections with the client. The
>>>> client ‘thinks’ it has connections in the pool, but those were broken in
>>>> step 4.
>>>> 7. Step over the getConnection which then triggers the commons pool to
>>>> execute:
>>>> ```
>>>> GenericObjectPool._factory.activateObject(latch.getPair().value)
>>>> ```
>>>> 8. A connection is made with the server, along with bind
>>>> ```
>>>> 6058e163 conn=1000 fd=14 ACCEPT from IP=127.0.0.1:35246 (IP=127.0.0.1:389)
>>>> [exec] 6058e163 conn=1000 op=0 BIND dn="cn=manager,dc=example,dc=com"
>>>> method=128
>>>> [exec] 6058e163 conn=1000 op=0 BIND dn="cn=manager,dc=example,dc=com"
>>>> mech=SIMPLE ssf=0
>>>> [exec] 6058e163 conn=1000 op=0 RESULT tag=97 err=0 duration=1.667ms
>>>> ```
>>>> 9. Client continues with its ldap op successfully and is never the wiser
>>>> that its connections were all forcibly closed by the server restart.
>>>> This is EXACTLY what I want to see, all done without testOnBorrow,
>>>> eliminating the extraneous search on every connection retrieval.
>>>
>>> So that would mean we have some kind of 'retry' on connection operation: if
>>> it fails, then let's assume the connection is broken, and redo it with a
>>> fresh connection.
>>>
>> Yes, that’s what’s happening.
>> In the Commons pool lib, this is the block that gets executed:
>> ```
>> this._factory.activateObject(latch.getPair().value);
>> if (this._testOnBorrow
>>         && !this._factory.validateObject(latch.getPair().value)) {
>>     throw new Exception("ValidateObject failed");
>> }
>> ```
>> In the first line above, activateObject, this code gets called, from our
>> AbstractPoolableLdapConnectionFactory class:
>> ```
>> public void activateObject(LdapConnection connection) throws LdapException {
>>     LOG.debug("Activating {}", connection);
>>     if (!connection.isConnected() || !connection.isAuthenticated()) {
>>         LOG.debug("rebind due to connection dropped on {}", connection);
>>         this.connectionFactory.bindConnection(connection);
>>     }
>> }
>> ```
>> The code performs a rebind which renews the connection.
>> All of this with testOnBorrow being false!
>
>
> The second bind is pretty brutal, as it will create a brand new connection.
>
>> So, I’m still scratching my head figuring why we need this secondary level
>> that is wasting a round trip to the server.
>
> Well oh well, not sure we need it at all, considering the piece of code you
> exhumed ;-)
>
Yes, that’s pretty much where I’m at. I’ve changed fortress to turn it off by
default. It would also be good to think about changing the default behavior in
the API as well.
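To make sure we’re talking about the same mechanism, here’s a tiny plain-Java
model of the borrow path with testOnBorrow off. StubConnection and StubPool
are invented names for this sketch, not the real LdapConnection or
commons-pool types; the point is just that the activate step alone repairs a
dropped connection by rebinding, with no extra validation round trip:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal model of the activate-on-borrow pattern. All types here are
// stubs invented for illustration, not the real LDAP API classes.
public class ActivateOnBorrowSketch {

    static class StubConnection {
        private boolean connected;

        boolean isConnected() {
            return connected;
        }

        void bind() { // stands in for connectionFactory.bindConnection()
            connected = true;
        }

        void drop() { // simulates the server restart killing the socket
            connected = false;
        }
    }

    static class StubPool {
        private final Deque<StubConnection> idle = new ArrayDeque<>();

        StubPool(int size) {
            for (int i = 0; i < size; i++) {
                StubConnection c = new StubConnection();
                c.bind();
                idle.push(c);
            }
        }

        // Borrow without testOnBorrow: the activate step repairs a dead
        // connection by rebinding, so no per-borrow validation search.
        StubConnection borrow() {
            StubConnection c = idle.pop();
            if (!c.isConnected()) {
                c.bind(); // the "rebind due to connection dropped" branch
            }
            return c;
        }

        void release(StubConnection c) {
            idle.push(c);
        }
    }

    public static void main(String[] args) {
        StubPool pool = new StubPool(1); // min 1, max 1, like the scenario
        StubConnection c = pool.borrow();
        pool.release(c);
        c.drop(); // step 4: server restart breaks the pooled connection
        StubConnection again = pool.borrow(); // step 7: activate rebinds
        System.out.println(again.isConnected()); // true
    }
}
```

The client code never sees the breakage, which matches the scenario in steps
1-9 above.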
>
>>> The problem is that the pool is passive: it does not react to any
>>> connection event, like a connection closure. OTOH, when a connection is
>>> properly closed by the server (i.e. a NoticeOfDisconnect - aka NoD - is
>>> generated by the server), the connection should process it and die. Now we
>>> have an issue: the connection is just lying in the pool, not being used by
>>> any client, so there is a missing step: removing the connection from the
>>> pool in this very case. That is something we could add to the current
>>> LDAP pool.
>>>
>>> Note that if the server does not send a NoD, you are screwed. There is no
>>> way to be sure the connection is ok until you use it. OTOH, leveraging
>>> setTestWhileIdle() could be a solution to partially work around the
>>> issue.
>> Here you’ve lost me. I’m running the server in debug mode, inside a bash
>> shell, running in the foreground.
>> In my test I stop the server by ‘ctrl-c’ killing the shell.
>> The server is not reacting to this and sending a NoD.
>
> Indeed, the worst case scenario. It's pretty much as if you simply pull out
> the ethernet cable from the socket (oh, well, put your wifi box in a Faraday
> cage).
>
Exactly! A pretty good test.
> Yes, in this very case, there will be no NoD. TCP is a best-effort protocol;
> the only way to be sure there is 'somebody' on the other side is to send
> some data and expect some kind of response, in the absence of which you
> would consider the connection dead in the water. Add to that the existence
> of a timeout mechanism at the OS level (typically set to 7200 s), which
> makes it critical to set up a keepalive system to ensure the connection
> never dies because of a timeout...
>
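At the socket level that keepalive is just SO_KEEPALIVE. A self-contained
sketch in plain Java over a loopback socket (note the probe timing itself,
e.g. the 7200 s default, lives in OS settings such as Linux's
net.ipv4.tcp_keepalive_time, not in the Java socket API):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch: enabling TCP keepalive on a client socket, so the kernel
// probes the peer instead of letting an idle connection silently die.
public class KeepAliveSketch {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) { // loopback listener
            try (Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
                client.setKeepAlive(true); // SO_KEEPALIVE on this socket
                System.out.println(client.getKeepAlive()); // true
            }
        }
    }
}
```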
> In the LDAP API, we could prevent - sort of - the connection from being
> tested on demand (ie when you need it) by setting a check on idling (if the
> connection has idled for more than a given period of time, then we would
> consider it dead). It's not leveraged in the API atm.
>
I’m actually thinking about turning off the check on idle. It’s not needed,
right?
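For reference, if we do keep the idle check, the knobs on the commons-pool
side look roughly like this. The setter names are from the GenericObjectPool
API, but treat the values as illustrative only; this fragment assumes a pool
instance already exists and isn’t meant to compile on its own:

```java
// Sketch: tuning the pool so validation happens in the background
// evictor rather than on every borrow. Values are illustrative.
GenericObjectPool pool = /* e.g. the underlying LdapConnectionPool */ null;
pool.setTestOnBorrow(false);            // skip the per-borrow check
pool.setTestWhileIdle(true);            // evictor validates idle connections
pool.setTimeBetweenEvictionRunsMillis(30000L); // run evictor every 30 s
pool.setMinEvictableIdleTimeMillis(-1L);       // never evict on age alone
```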
>
> All that being said, with the current 'rebind' mechanism, you should be safe,
> I think.
Thanks Emmanuel. This is where I was hoping we’d end up with the discussion.
It’s a pretty big decision to make in the vacuum of a test env, and I wanted
to get some more eyeballs on it.
I’ll point it out specifically on the next release, so it’s not overlooked.
—
Shawn
---------------------------------------------------------------------
To unsubscribe, e-mail: api-unsubscr...@directory.apache.org
For additional commands, e-mail: api-h...@directory.apache.org