No problem.  Good luck to you Bram!  Looks like a fun mystery.

On Tue, Mar 24, 2026 at 11:34 AM Bram Luyten <[email protected]> wrote:
>
> Hi David,
>
> Thank you for the detailed response. I owe you an apology: after
> re-examining our data based on your feedback, the wall-clock profiling
> led us to an incorrect attribution. Sorry for the noise.
>
> You were right to question the wall-clock numbers for background
> threads. When we re-checked with CPU profiling (async-profiler -e cpu),
> AdaptiveExecutionStrategy.produce() shows exactly 0 CPU samples on the
> Solr 10 branch. The selector thread is idle, not busy-polling.
> Wall-clock profiling inflated it because it samples all threads
> regardless of state. Total CPU samples are nearly identical between
> branches (17,519 vs 17,119), with the same distribution.
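> For anyone curious why the wall profile misled us: a thread that is
> parked or blocked still accrues wall-clock samples, but essentially no
> CPU time. A minimal plain-JDK sketch (illustrative only, not Solr code;
> the class and thread names are made up) that shows the gap:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.locks.LockSupport;

public class WallVsCpu {
    // Returns {wallMillis, cpuMillis} for a thread that just sits parked,
    // the way an idle NIO selector thread sits waiting for events.
    static long[] measure() throws InterruptedException {
        Thread idle = new Thread(LockSupport::park, "idle-like-a-selector");
        idle.setDaemon(true);
        idle.start();

        long wallStart = System.nanoTime();
        Thread.sleep(500); // wall-clock time passes; the thread stays parked

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // getThreadCpuTime returns -1 if unsupported; clamp to 0 for the demo
        long cpuNanos = Math.max(0, mx.getThreadCpuTime(idle.getId()));
        long wallNanos = System.nanoTime() - wallStart;
        LockSupport.unpark(idle);
        return new long[] {wallNanos / 1_000_000, cpuNanos / 1_000_000};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = measure();
        // A wall-clock profiler samples the parked thread for the full
        // ~500 ms; a CPU profiler sees almost nothing.
        System.out.println("wall ms = " + r[0] + ", cpu ms = " + r[1]);
    }
}
```

> Wall time comes out around 500 ms while the parked thread's CPU time
> stays near zero, which is exactly the shape of our selector-thread
> numbers.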
>
> To answer your question: we create exactly one CoreContainer for the
> entire test suite, held as a static singleton with 6 cores. Between
> tests we clear data via deleteByQuery + commit, but the container stays
> alive for the full JVM lifetime. So the "lots of CoreContainers"
> scenario does not apply here.
>
> Given identical CPU profiles and zero Jetty CPU samples, the Solr path
> is almost certainly not our bottleneck. We will look elsewhere. I don't
> think the SolrStartup benchmark would be productive at this point.
>
> Again, apologies for the false alarm, and thank you for steering us in
> the right direction.
>
> Best regards,
> Bram Luyten
>
> On Tue, Mar 24, 2026 at 2:43 PM David Smiley <[email protected]> wrote:
>
> > Hello Bram,
> >
> > Some of what you are sharing confuses me.  I don't think sharing the
> > wall-clock time is pertinent for background threads -- and I assume
> > those Jetty HttpClients are in the background doing nothing. Yes,
> > CoreContainer creates a Jetty HttpClient that is unused in an embedded
> > mode.  Curious; are you creating lots of CoreContainers (perhaps
> > indirectly via creating EmbeddedSolrServer)?  Maybe we have a
> > regression there.  I suspect a test environment would be doing this,
> > creating a CoreContainer for each test, basically.  Solr's tests do
> > this too!  And a slowdown as big as you show sounds like something
> > we'd notice... most likely.  On the other hand, if your CI/tests
> > create very few CoreContainers and there's all this slowdown you
> > report, then CoreContainer startup is mostly irrelevant.
> >
> > We do have a benchmark that should capture a slowdown in this area --
> >
> > https://github.com/apache/solr/blob/9c911e7337cd1026accc1a825e26906039982328/solr/benchmark/src/java/org/apache/solr/bench/lifecycle/SolrStartup.java
> > (scope is a bit larger but good enough) but we don't have continuous
> > benchmarking over releases to make relative comparisons.  We've been
> > talking about that, but the recent discussions are unlikely to support
> > a way to do this for embedded Solr.  I've been working on this
> > benchmark code lately as well.  *Anyway*, I recommend that you try
> > this benchmark, starting with its great README, mostly documenting JMH
> > itself.  If you do that and find some curious/suspicious things, I'd
> > love to hear more!
> >
> > On Tue, Mar 24, 2026 at 3:51 AM Bram Luyten <[email protected]>
> > wrote:
> > >
> > > Hi all,
> > >
> > > Disclaimer: I am a DSpace developer, not a Solr/Jetty internals
> > > expert. Much of the profiling and analysis below was done with heavy
> > > assistance from Claude. I'm sharing this because the data seems
> > > significant, but I may be misinterpreting some of it. Corrections
> > > and guidance are very welcome.
> > >
> > >
> > > CONTEXT
> > > ---------------
> > >
> > > We are upgrading DSpace (open-source repository software) from
> > > Spring Boot 3 / Solr 8 to Spring Boot 4 / Solr 10. Our integration
> > > test suite uses embedded Solr via solr-core as a test dependency
> > > (EmbeddedSolrServer style, no HTTP traffic -- everything is
> > > in-process in a single JVM).
> > >
> > > After the upgrade, our IT suite went from ~31 minutes to ~2 hours
> > > in CI. We spent considerable time profiling and eliminating other
> > > causes (Hibernate 7, Spring 7, H2 database, GC, lock contention,
> > > caching). Wall-clock profiling with async-profiler ultimately
> > > pointed to embedded Solr as the primary bottleneck.
> > >
> > > Note: we previously reported the Solr 10 POM issue with missing
> > > Jackson 2 dependency versions (solr-core, solr-solrj, solr-api).
> > > We have the workaround in place (explicit dependency declarations),
> > > so the embedded Solr 10 has a complete classpath.
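> > > For reference, the workaround is just declaring the Jackson 2
> > > artifacts with explicit versions in our own POM. A sketch of the
> > > shape (the artifact list and version property are illustrative;
> > > match them to what solr-core actually pulls in):

```xml
<!-- Sketch of the workaround: pin the Jackson 2 artifacts explicitly,
     since the Solr 10 POMs reference them without versions. Artifact
     list and version property are illustrative. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-core</artifactId>
      <version>${jackson.version}</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>${jackson.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```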
> > >
> > >
> > > THE PROBLEM
> > > ----------------------
> > >
> > > Wall-clock profiling (async-profiler -e wall) of the same test class
> > > (DiscoveryRestControllerIT, 83 tests) on both branches shows:
> > >
> > >   Component        Main (Solr 8)    SB4 (Solr 10)    Difference
> > >   ----------------------------------------------------------------
> > >   Solr total          3.6s            11.5s            +7.9s
> > >   Hibernate           0.2s             0.2s             0.0s
> > >   H2 Database         0.1s             0.1s             0.0s
> > >   Spring              0.1s             0.1s             0.0s
> > >   Test total         68.4s            84.3s           +15.9s
> > >
> > > Solr accounts for 50% of the total wall-clock difference (7.9s out
> > > of 15.9s). Hibernate, H2, and Spring are essentially unchanged.
> > >
> > >
> > > THE ROOT CAUSE
> > > ---------------------------
> > >
> > > Breaking down the Solr wall-clock time by operation:
> > >
> > >   Operation                                    Main        SB4
> > >   ---------------------------------------------------------------
> > >   Jetty EatWhatYouKill.produce()              2558 (58%)     --
> > >   Jetty AdaptiveExecutionStrategy.produce()     --        12786 (91%)
> > >   DirectUpdateHandler2.commit()                522 (12%)    707  (5%)
> > >   SpellChecker.newSearcher()                   119  (3%)    261  (2%)
> > >
> > >   (Numbers are async-profiler wall-clock samples)
> > >
> > > The dominant operation is Jetty's NIO selector execution strategy:
> > >
> > >   - Solr 8 / Jetty 9: EatWhatYouKill.produce(): 2558 samples (58%)
> > >   - Solr 10 / Jetty 12: AdaptiveExecutionStrategy.produce():
> > >     12786 samples (91%)
> > >   - That is a 5x increase in wall-clock time
> > >
> > > The full stack trace shows:
> > >
> > >   ThreadPoolExecutor
> > >     -> MDCAwareThreadPoolExecutor
> > >       -> ManagedSelector (Jetty NIO selector)
> > >         -> AdaptiveExecutionStrategy.produce()
> > >           -> AdaptiveExecutionStrategy.tryProduce()
> > >             -> AdaptiveExecutionStrategy.produceTask()
> > >               -> ... -> KQueue.poll (macOS NIO)
> > >
> > > This is the Jetty HTTP client's NIO event loop. Even though we use
> > > EmbeddedSolrServer (no HTTP traffic), Solr 10's CoreContainer
> > > appears to create an internal Jetty HTTP client (likely for
> > > inter-shard communication via HttpJettySolrClient). In embedded
> > > single-node mode, this client has no work to do, but its NIO
> > > selector thread still runs, and AdaptiveExecutionStrategy.produce()
> > > idles much less efficiently than Jetty 9's EatWhatYouKill did.
> > >
> > > On macOS this manifests as busy-polling in KQueue.poll. The impact
> > > may differ on Linux (epoll).
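> > > A cheap way to distinguish busy-polling from idle blocking without
> > > a profiler is to measure per-thread CPU time around a thread blocked
> > > in Selector.select(), which sits in KQueue on macOS and epoll on
> > > Linux. A plain-JDK sketch (illustrative only; class and thread names
> > > are made up):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.nio.channels.Selector;

public class SelectorIdleCheck {
    // Returns the CPU nanos consumed by a thread blocked in
    // Selector.select() over roughly 'wallMs' of wall-clock time.
    static long cpuWhileSelecting(long wallMs) throws Exception {
        Selector selector = Selector.open();
        Thread t = new Thread(() -> {
            try {
                selector.select(); // blocks in kqueue/epoll until wakeup
            } catch (Exception ignored) {
            }
        }, "test-selector");
        t.setDaemon(true);
        t.start();

        Thread.sleep(wallMs); // wall-clock time passes while it waits

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long cpuNanos = Math.max(0, mx.getThreadCpuTime(t.getId()));
        selector.wakeup();
        t.join(1000);
        selector.close();
        return cpuNanos;
    }

    public static void main(String[] args) throws Exception {
        long cpuNanos = cpuWhileSelecting(500);
        System.out.println("cpu ms while selecting = " + cpuNanos / 1_000_000);
    }
}
```

> > > A selector thread that is truly blocked accrues near-zero CPU time;
> > > a busy-polling one would track wall time. This is essentially the
> > > check that cleared AdaptiveExecutionStrategy for us.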
> > >
> > >
> > > PROFILING METHODOLOGY
> > > -----------------------------------------
> > >
> > >   - Tool: async-profiler 4.3 (wall-clock mode, safepoint-free)
> > >   - JDK: OpenJDK 21.0.9
> > >   - Both branches use the same H2 2.4.240 test database
> > >   - Both branches use the same test code and Solr schema/config
> > >   - The only Solr-related difference is the Solr version
> > >     (8.11.4 vs 10.0.0)
> > >   - Profiling was done on macOS (Apple Silicon), but the CI slowdown
> > >     (GitHub Actions, Ubuntu) shows the same pattern at larger scale
> > >
> > >
> > > WHAT WE RULED OUT
> > > ---------------------------------
> > >
> > > Before identifying the Solr/Jetty issue, we investigated and ruled
> > > out many other causes:
> > >
> > >   - Hibernate 7 overhead: SQL query count is similar (fewer on SB4),
> > >     query execution time is <40ms total for 1400+ queries
> > >   - H2 database: same version (2.4.240) on both branches, negligible
> > >     wall-clock difference
> > >   - GC pauses: only +0.7s extra on SB4 (1.4% of total difference)
> > >   - Lock contention: main actually has MORE lock contention than SB4
> > >   - Hibernate session.clear(): tested with/without, no effect
> > >   - JaCoCo coverage: tested with/without, no effect
> > >   - Hibernate caching (L2, query cache): disabled both, no effect
> > >   - Hibernate batch fetch size: tested, no effect
> > >
> > >
> > > QUESTIONS FOR THE SOLR TEAM
> > > --------------------------------------------------
> > >
> > > 1. Does embedded mode (EmbeddedSolrServer / CoreContainer without
> > >    an HTTP listener) need to create a Jetty HTTP client at all?
> > >    If the client is only for shard-to-shard communication, it
> > >    seems unnecessary in single-node embedded testing.
> > >
> > > 2. If the HTTP client is required, can its NIO selector / thread
> > >    pool be configured with minimal resources for embedded mode?
> > >    (e.g., fewer selector threads, smaller thread pool, or an
> > >    idle-friendly execution strategy)
> > >
> > > 3. Is there a Solr configuration (solr.xml property, system
> > >    property, or CoreContainer API) that we can use from the
> > >    consuming application to reduce this overhead?
> > >
> > > 4. Is this specific to macOS (KQueue) or does it also affect
> > >    Linux (epoll)? Our CI runs on Ubuntu and shows a larger
> > >    slowdown (3.8x) than local macOS (1.28x), which could be
> > >    related.
> > >
> > > ENVIRONMENT
> > > -----------------------
> > >
> > >   Solr: 10.0.0 (solr-core as test dependency for embedded server)
> > >   Jetty: 12.0.x (pulled in transitively by Solr 10)
> > >   JDK: 21
> > >   OS: macOS (profiled), Ubuntu (CI where the 4x slowdown manifests)
> > >   Project: DSpace (https://github.com/DSpace/DSpace)
> > >   PR: https://github.com/DSpace/DSpace/pull/11810
> > >
> > > Happy to provide the full async-profiler flame graph files or
> > > additional profiling data if useful.
> > >
> > > Thanks,
> > > Bram Luyten, Atmire
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
> >
