On Thu, Oct 15, 2020 at 2:31 PM Michael Osipov <1983-01...@gmx.net> wrote:
> > On Wed, Oct 14, 2020 at 6:32 PM Michael Osipov <micha...@apache.org> wrote:
> >
> > > Am 2020-10-14 um 12:32 schrieb Rémy Maucherat:
> > > > On Tue, Oct 13, 2020 at 8:27 PM Michael Osipov <micha...@apache.org> wrote:
> > > >
> > > >> Am 2020-10-13 um 16:05 schrieb r...@apache.org:
> > > >>> This is an automated email from the ASF dual-hosted git repository.
> > > >>>
> > > >>> remm pushed a commit to branch 8.5.x
> > > >>> in repository https://gitbox.apache.org/repos/asf/tomcat.git
> > > >>>
> > > >>> The following commit(s) were added to refs/heads/8.5.x by this push:
> > > >>>      new 883600b  Always retry on a new connection, even when pooling
> > > >>> 883600b is described below
> > > >>>
> > > >>> commit 883600b8a77c0be93b3a8dc2404e8d1110970ee7
> > > >>> Author: remm <r...@apache.org>
> > > >>> AuthorDate: Tue Oct 13 14:19:54 2020 +0200
> > > >>>
> > > >>>     Always retry on a new connection, even when pooling
> > > >>>
> > > >>>     This keeps the same very simple design as for the single connection
> > > >>>     scenario, for now.
> > > >>> ---
> > > >>>  java/org/apache/catalina/realm/JNDIRealm.java | 22 +++++++++++++++++++---
> > > >>>  webapps/docs/changelog.xml                    |  5 +++++
> > > >>>  2 files changed, 24 insertions(+), 3 deletions(-)
> > > >>>
> > > >>> diff --git a/java/org/apache/catalina/realm/JNDIRealm.java b/java/org/apache/catalina/realm/JNDIRealm.java
> > > >>> index 72087ab..98007f7 100644
> > > >>> --- a/java/org/apache/catalina/realm/JNDIRealm.java
> > > >>> +++ b/java/org/apache/catalina/realm/JNDIRealm.java
> > > >>> @@ -1311,7 +1311,7 @@ public class JNDIRealm extends RealmBase {
> > > >>>                  close(connection);
> > > >>>
> > > >>>                  // open a new directory context.
> > > >>> -                connection = get();
> > > >>> +                connection = get(true);
> > > >>>
> > > >>>                  // Try the authentication again.
> > > >>>                  principal = authenticate(connection, username, credentials);
> > > >>> @@ -2389,12 +2389,28 @@ public class JNDIRealm extends RealmBase {
> > > >>>       * @exception NamingException if a directory server error occurs
> > > >>>       */
> > > >>>      protected JNDIConnection get() throws NamingException {
> > > >>> +        return get(false);
> > > >>> +    }
> > > >>> +
> > > >>> +    /**
> > > >>> +     * Open (if necessary) and return a connection to the configured
> > > >>> +     * directory server for this Realm.
> > > >>> +     * @param create when pooling, this forces creation of a new connection,
> > > >>> +     *  for example after an error
> > > >>> +     * @return the connection
> > > >>> +     * @exception NamingException if a directory server error occurs
> > > >>> +     */
> > > >>> +    protected JNDIConnection get(boolean create) throws NamingException {
> > > >>>          JNDIConnection connection = null;
> > > >>>          // Use the pool if available, otherwise use the single connection
> > > >>>          if (connectionPool != null) {
> > > >>> -            connection = connectionPool.pop();
> > > >>> -            if (connection == null) {
> > > >>> +            if (create) {
> > > >>>                  connection = create();
> > > >>> +            } else {
> > > >>> +                connection = connectionPool.pop();
> > > >>> +                if (connection == null) {
> > > >>> +                    connection = create();
> > > >>> +                }
> > > >>>              }
> > > >>>          } else {
> > > >>>              singleConnectionLock.lock();
> > > >>
> > > >> That's a suitable and simple approach.
> > > >>
> > > > If you have the code for adding a max idle check on hand and tested, you
> > > > should add it IMO, it will be more efficient.
> > >
> > > I will need to give this a couple more weeks of testing. This is what I
> > > have observed today:
> > >
> > > > 2020-10-14T16:01:47.039 147.54.155.198 w99sezx0...
> > > > "GET /x2tc-proxy-bln/rest/info/targetSystem HTTP/1.1" 200 8 92132
> > >
> > > > 20609 2020-10-14T16:00:14 FEIN [https-openssl-apr-8444-exec-166] net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.acquire Acquiring directory server connection from pool
> > > > 20610 2020-10-14T16:00:14 FEIN [https-openssl-apr-8444-exec-166] net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.acquire Directory server connection from pool exceeds max idle time, closing it
> > > > 20611 2020-10-14T16:00:14 FEIN [https-openssl-apr-8444-exec-166] net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.close Closing directory server connection
> > > > 20612 2020-10-14T16:00:14 FEIN [https-openssl-apr-8444-exec-166] net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.open Opening new directory server connection
> > > > 20613 2020-10-14T16:01:47 FEIN [https-openssl-apr-8444-exec-166] net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.getUser Searching for username 'w99sezx0' in base ...
> > >
> > > As you can see, it took 90 seconds to serve the request because the
> > > connection had expired and closing it took way too long. On average the
> > > request takes:
> > >
> > > > 2020-10-14T13:57:06.730 10.81.50.232 osipovmi@... "GET /x2tc-proxy-bln/rest/info/targetSystem HTTP/1.1" 200 8 70
> > >
> > > when the connection is healthy.
> >
> > Ok, so there's some real incentive to avoid reusing a connection that was
> > idle for too long.
>
> I made further analysis. I was partially wrong in my statement. The cause
> for the 90 s was a faulty KDC which did not respond in time for a service
> ticket. Java Kerberos does 3 retries with a 30 s timeout. I have set it to
> 1 retry and a 1000 ms wait until the next KDC is tried.
>
> Anyway, I am still convinced that some idle timeout is the right choice to
> avoid resource depletion on both sides.
> If hundreds of clients keep tens of connections open to a directory server
> for no reason, sooner or later everything will stall. I will let you know
> in a couple of weeks.

Yes, no problem, it sounds like a good idea (with a good default).

> But related to this, I have made a very interesting observation when the
> webapp is shut down:
>
> 2020-10-15T14:24:15.120 WARNUNG [deblndw024v...-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [x2tc-proxy-dsv] appears to have started a thread named [Thread-7] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
>  java.lang.Object.wait(Native Method)
>  java.lang.Object.wait(Object.java:502)
>  com.sun.jndi.ldap.Connection.pauseReader(Connection.java:804)
>  com.sun.jndi.ldap.Connection.run(Connection.java:944)
>  java.lang.Thread.run(Thread.java:748)
>
> When you look at how LdapCtx#close() is implemented, you'll see that it
> holds a reference count to all naming enumerations. As long as they aren't
> really closed, the connection is not closed immediately. I need to analyze
> why this is happening. The result could be that stopping and starting an
> application drowns the entire VM sooner or later.

That's not nice at all. I suppose it means the context classloader must be
switched when opening the LDAP connection or it could leak. This is of course
going to be less of an issue with a single connection. I'll need to fix it.

Rémy
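[Editorial note: the max-idle check Michael describes ("connection from pool exceeds max idle time, closing it") could be sketched roughly as below. All names here (IdleAwarePool, PooledEntry, maxIdleTimeMillis) are made up for illustration; this is not the actual JNDIRealm or ActiveDirectoryRealm code, and a real version would hold a DirContext rather than a plain Object.]

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: a pool whose pop() discards entries that sat idle
// longer than a configured maximum, instead of handing them back out.
class IdleAwarePool {

    /** Wrapper recording when a connection was returned to the pool. */
    static final class PooledEntry {
        final Object context;  // stand-in for a directory context
        final long lastUsed;   // wall-clock millis when pushed back

        PooledEntry(Object context) {
            this.context = context;
            this.lastUsed = System.currentTimeMillis();
        }
    }

    private final ConcurrentLinkedQueue<PooledEntry> pool = new ConcurrentLinkedQueue<>();
    private final long maxIdleTimeMillis;

    IdleAwarePool(long maxIdleTimeMillis) {
        this.maxIdleTimeMillis = maxIdleTimeMillis;
    }

    /** Pop a pooled connection, closing any that exceeded the max idle time. */
    Object get() {
        long now = System.currentTimeMillis();
        PooledEntry entry;
        while ((entry = pool.poll()) != null) {
            if (now - entry.lastUsed <= maxIdleTimeMillis) {
                return entry.context;   // still fresh, reuse it
            }
            close(entry.context);       // expired: close rather than reuse
        }
        return create();                // pool empty or all entries expired
    }

    /** Return a connection to the pool, stamping it with the current time. */
    void release(Object context) {
        pool.offer(new PooledEntry(context));
    }

    Object create() { return new Object(); }  // would open a new directory context
    void close(Object context) { }            // would call DirContext#close()
}
```

The trade-off this illustrates: expired entries cost an extra open on the request path (which is why a slow close or a faulty KDC, as in the 90 s case above, hurts), but idle connections no longer pile up on either side.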