Hi David,

Thank you for the detailed response. I owe you an apology: after
re-examining our data based on your feedback, it's clear that
wall-clock profiling led us to an incorrect attribution. Sorry for
the noise.

You were right to question the wall-clock numbers for background
threads. When we re-checked with CPU profiling (async-profiler -e cpu),
AdaptiveExecutionStrategy.produce() shows exactly 0 CPU samples on the
Solr 10 branch. The selector thread is idle, not busy-polling.
Wall-clock profiling inflated its apparent cost because it samples all
threads regardless of state. Total CPU samples are nearly identical
between branches (17,519 vs 17,119), with the same distribution.

To answer your question: we create exactly one CoreContainer for the
entire test suite, held as a static singleton with 6 cores. Between
tests we clear data via deleteByQuery + commit, but the container stays
alive for the full JVM lifetime. So the "lots of CoreContainers"
scenario does not apply here.
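For concreteness, our harness follows roughly the pattern below. This is
a simplified sketch, not our exact DSpace code: the class name
SolrTestFixture, the solr home path, and the helper method names are
illustrative stand-ins; the Solr API calls (CoreContainer.createAndLoad,
EmbeddedSolrServer, deleteByQuery, commit) are the ones we actually use.

```java
import java.nio.file.Path;

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

// Hypothetical sketch of the harness described above: one CoreContainer
// per JVM, cleared (not recreated) between tests.
public final class SolrTestFixture {

    // Static singleton; stays alive for the full JVM lifetime.
    private static CoreContainer container;

    public static synchronized EmbeddedSolrServer clientFor(String coreName) {
        if (container == null) {
            // Loads all six cores from the test solr home, exactly once.
            container = CoreContainer.createAndLoad(Path.of("target/solr-home"));
        }
        return new EmbeddedSolrServer(container, coreName);
    }

    // Between-test cleanup: wipe documents, keep the container running.
    public static void clearCore(EmbeddedSolrServer solr) throws Exception {
        solr.deleteByQuery("*:*");
        solr.commit();
    }
}
```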

Given identical CPU profiles and zero Jetty CPU samples, the Solr path
is almost certainly not our bottleneck. We will look elsewhere. I don't
think the SolrStartup benchmark would be productive at this point.

Again, apologies for the false alarm, and thank you for steering us in
the right direction.

Best regards,
Bram Luyten

On Tue, Mar 24, 2026 at 2:43 PM David Smiley <[email protected]> wrote:

> Hello Bram,
>
> Some of what you are sharing confuses me.  I don't think sharing the
> wall-clock-time is pertinent for background threads -- and I assume
> those Jetty HttpClients are in the background doing nothing. Yes,
> CoreContainer creates a Jetty HttpClient that is unused in an embedded
> mode.  Curious; are you creating lots of CoreContainers (perhaps
> indirectly via creating EmbeddedSolrServer)?  Maybe we have a
> regression there.  I suspect a test environment would be doing this,
> creating a CoreContainer for each test, basically.  Solr's tests do
> this too!  And a slowdown as big as you show sounds like something
> we'd notice... most likely.  On the other hand, if your CI/tests
> creates very few CoreContainers and there's all this slowdown you
> report, then CoreContainer startup is mostly irrelevant.
>
> We do have a benchmark that should capture a slowdown in this area --
>
> https://github.com/apache/solr/blob/9c911e7337cd1026accc1a825e26906039982328/solr/benchmark/src/java/org/apache/solr/bench/lifecycle/SolrStartup.java
> (scope is a bit larger but good enough) but we don't have continuous
> benchmarking over releases to make relative comparisons.  We've been
> talking about that, but the recent discussions are unlikely to support
> a way to do this for embedded Solr.  I've been working on this
> benchmark code lately as well.  *Anyway*, I recommend that you try
> this benchmark, starting with its great README, mostly documenting JMH
> itself.  If you do that and find some curious/suspicious things, I'd
> love to hear more!
>
> On Tue, Mar 24, 2026 at 3:51 AM Bram Luyten <[email protected]>
> wrote:
> >
> > Hi all,
> >
> > Disclaimer: I am a DSpace developer, not a Solr/Jetty internals
> > expert. Much of the profiling and analysis below was done with heavy
> > assistance from Claude. I'm sharing this because the data seems
> > significant, but I may be misinterpreting some of it. Corrections
> > and guidance are very welcome.
> >
> >
> > CONTEXT
> > ---------------
> >
> > We are upgrading DSpace (open-source repository software) from
> > Spring Boot 3 / Solr 8 to Spring Boot 4 / Solr 10. Our integration
> > test suite uses embedded Solr via solr-core as a test dependency
> > (EmbeddedSolrServer style, no HTTP traffic -- everything is
> > in-process in a single JVM).
> >
> > After the upgrade, our IT suite went from ~31 minutes to ~2 hours
> > in CI. We spent considerable time profiling and eliminating other
> > causes (Hibernate 7, Spring 7, H2 database, GC, lock contention,
> > caching). Wall-clock profiling with async-profiler ultimately
> > pointed to embedded Solr as the primary bottleneck.
> >
> > Note: we previously reported the Solr 10 POM issue with missing
> > Jackson 2 dependency versions (solr-core, solr-solrj, solr-api).
> > We have the workaround in place (explicit dependency declarations),
> > so the embedded Solr 10 has a complete classpath.
> >
> >
> > THE PROBLEM
> > ----------------------
> >
> > Wall-clock profiling (async-profiler -e wall) of the same test class
> > (DiscoveryRestControllerIT, 83 tests) on both branches shows:
> >
> >   Component        Main (Solr 8)    SB4 (Solr 10)    Difference
> >   ----------------------------------------------------------------
> >   Solr total          3.6s            11.5s            +7.9s
> >   Hibernate           0.2s             0.2s             0.0s
> >   H2 Database         0.1s             0.1s             0.0s
> >   Spring              0.1s             0.1s             0.0s
> >   Test total         68.4s            84.3s           +15.9s
> >
> > Solr accounts for 50% of the total wall-clock difference (7.9s out
> > of 15.9s). Hibernate, H2, and Spring are essentially unchanged.
> >
> >
> > THE ROOT CAUSE
> > ---------------------------
> >
> > Breaking down the Solr wall-clock time by operation:
> >
> >   Operation                                    Main        SB4
> >   ---------------------------------------------------------------
> >   Jetty EatWhatYouKill.produce()              2558 (58%)     --
> >   Jetty AdaptiveExecutionStrategy.produce()     --        12786 (91%)
> >   DirectUpdateHandler2.commit()                522 (12%)    707  (5%)
> >   SpellChecker.newSearcher()                   119  (3%)    261  (2%)
> >
> >   (Numbers are async-profiler wall-clock samples)
> >
> > The dominant operation is Jetty's NIO selector execution strategy:
> >
> >   - Solr 8 / Jetty 9: EatWhatYouKill.produce(): 2558 samples (58%)
> >   - Solr 10 / Jetty 12: AdaptiveExecutionStrategy.produce(): 12786
> >     samples (91%)
> >   - That is a 5x increase in wall-clock time
> >
> > The full stack trace shows:
> >
> >   ThreadPoolExecutor
> >     -> MDCAwareThreadPoolExecutor
> >       -> ManagedSelector (Jetty NIO selector)
> >         -> AdaptiveExecutionStrategy.produce()
> >           -> AdaptiveExecutionStrategy.tryProduce()
> >             -> AdaptiveExecutionStrategy.produceTask()
> >               -> ... -> KQueue.poll (macOS NIO)
> >
> > This is the Jetty HTTP client's NIO event loop. Even though we use
> > EmbeddedSolrServer (no HTTP traffic), Solr 10's CoreContainer
> > appears to create an internal Jetty HTTP client (likely for
> > inter-shard communication via HttpJettySolrClient). In embedded
> > single-node mode, this client has no work to do, but its NIO
> > selector thread still runs, and AdaptiveExecutionStrategy.produce()
> > idles much less efficiently than Jetty 9's EatWhatYouKill did.
> >
> > On macOS this manifests as busy-polling in KQueue.poll. The impact
> > may differ on Linux (epoll).
> >
> >
> > PROFILING METHODOLOGY
> > -----------------------------------------
> >
> >   - Tool: async-profiler 4.3 (wall-clock mode, safepoint-free)
> >   - JDK: OpenJDK 21.0.9
> >   - Both branches use the same H2 2.4.240 test database
> >   - Both branches use the same test code and Solr schema/config
> >   - The only Solr-related difference is the Solr version (8.11.4 vs
> >     10.0.0)
> >   - Profiling was done on macOS (Apple Silicon), but the CI slowdown
> >     (GitHub Actions, Ubuntu) shows the same pattern at larger scale
> >
> >
> > WHAT WE RULED OUT
> > ---------------------------------
> >
> > Before identifying the Solr/Jetty issue, we investigated and ruled
> > out many other causes:
> >
> >   - Hibernate 7 overhead: SQL query count is similar (fewer on SB4),
> >     query execution time is <40ms total for 1400+ queries
> >   - H2 database: same version (2.4.240) on both branches, negligible
> >     wall-clock difference
> >   - GC pauses: only +0.7s extra on SB4 (1.4% of total difference)
> >   - Lock contention: main actually has MORE lock contention than SB4
> >   - Hibernate session.clear(): tested with/without, no effect
> >   - JaCoCo coverage: tested with/without, no effect
> >   - Hibernate caching (L2, query cache): disabled both, no effect
> >   - Hibernate batch fetch size: tested, no effect
> >
> >
> > QUESTIONS FOR THE SOLR TEAM
> > --------------------------------------------------
> >
> > 1. Does embedded mode (EmbeddedSolrServer / CoreContainer without
> >    an HTTP listener) need to create a Jetty HTTP client at all?
> >    If the client is only for shard-to-shard communication, it
> >    seems unnecessary in single-node embedded testing.
> >
> > 2. If the HTTP client is required, can its NIO selector / thread
> >    pool be configured with minimal resources for embedded mode?
> >    (e.g., fewer selector threads, smaller thread pool, or an
> >    idle-friendly execution strategy)
> >
> > 3. Is there a Solr configuration (solr.xml property, system
> >    property, or CoreContainer API) that we can use from the
> >    consuming application to reduce this overhead?
> >
> > 4. Is this specific to macOS (KQueue) or does it also affect
> >    Linux (epoll)? Our CI runs on Ubuntu and shows a larger
> >    slowdown (3.8x) than local macOS (1.28x), which could be
> >    related.
> >
> > ENVIRONMENT
> > -----------------------
> >
> >   Solr: 10.0.0 (solr-core as test dependency for embedded server)
> >   Jetty: 12.0.x (pulled in transitively by Solr 10)
> >   JDK: 21
> >   OS: macOS (profiled), Ubuntu (CI where the 4x slowdown manifests)
> >   Project: DSpace (https://github.com/DSpace/DSpace)
> >   PR: https://github.com/DSpace/DSpace/pull/11810
> >
> > Happy to provide the full async-profiler flame graph files or
> > additional profiling data if useful.
> >
> > Thanks,
> > Bram Luyten, Atmire
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>