On Fri, Feb 9, 2024 at 12:46 PM Mark Thomas <ma...@apache.org> wrote:
>
> On 08/02/2024 17:07, Mark Thomas wrote:
> > Hi all,
> >
> > TL;DR tagging likely delayed while APR/native stability issue is addressed
> >
> > We have had a couple of issues with test stability in the last few days.
> >
> > The issues with 11.0.x and 10.1.x were caused by the incomplete
> > convenience binary for Tomcat Native 2.0.7. That should be resolved now.
> > The 11.0.x tests are already green and I am expecting 10.1.x to be green
> > for the next run.
> >
> > 9.0.x and 8.5.x are a little more interesting. The instability was
> > triggered by the change to allow users to provide an SSLContext directly
> > to SSLHostConfigCertificate. This changed the timing of endpoint
> > destruction enough to make the intermittent APR crashes much more
> > frequent - almost on every run.
> >
> > The good news is that the more frequent crashes made it easier to
> > investigate. My current theory is related to the cleanup of
> > OpenSSLContext. In 9.0.x and 8.5.x clean-up of this object is performed
> > by a finalizer. This is to support runtime replacement of the
> > SSLHostContext.
> >
> > What I think happens is:
> > - Tomcat starts shutdown
> > - Endpoint is destroyed
> > - AprLifecycleListener shuts down Native library
> > - finalizer runs and tries to reference native code leading to a crash
> >
> > I have some initial ideas on how we might handle this better. The quick
> > and dirty fix was to force GC and add a delay in
> > AprLifecycleListener.terminateAPR() but that was just a hack to test the
> > theory.
> >
> > Back to working out a more robust fix...
>
> While the fix worked well locally, it hasn't fixed the problem on the
> Buildbot CI worker.
>
> I'm going to take another look.

I had a look at the test output, and the issue is exclusively with the
APR connector. The tests are set up a bit oddly: the same test is run
against every connector, APR included. That's why it would only affect
8.5 and 9.0, the only branches that still ship the APR connector.
Overall it's not even certain that OpenSSL + NIO really needed a fix;
the OpenSSLContext cleanup is most likely good enough (at least in that
case). Build 845 for 9.0.x was crashing when using the APR connector,
not NIO.
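
For illustration only, the shutdown ordering described in the quoted
message (library terminated first, cleanup touching native code
afterwards) can be sketched in plain Java. This is a rough sketch under
assumptions, not the fix that was applied: NativeLibraryGuard,
runIfAvailable() and terminate() are hypothetical names and are not
part of Tomcat or Tomcat Native. The idea is that late cleanup is
skipped once the library has been terminated, rather than calling into
unloaded native code.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical guard around a native library's lifecycle (not a Tomcat API).
final class NativeLibraryGuard {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean available = true;

    // Runs the native call only while the library is still available.
    // Returns true if the call ran, false if it was skipped.
    boolean runIfAvailable(Runnable nativeCall) {
        lock.readLock().lock();
        try {
            if (!available) {
                return false; // library already terminated: skip the call
            }
            nativeCall.run();
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Marks the library as terminated. Taking the write lock waits for
    // any native call that is currently in flight to finish first.
    void terminate() {
        lock.writeLock().lock();
        try {
            available = false;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

In this sketch a finalizer-style cleanup would wrap its native free
call in runIfAvailable(), and the shutdown path would call terminate()
before unloading the library. Whether that fits the real APR/native
code is an open question.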

Rémy
