Hello Bram,

Some of what you are sharing confuses me.  I don't think wall-clock
time is pertinent for background threads -- and I assume those Jetty
HttpClient threads are in the background doing nothing. Yes,
CoreContainer creates a Jetty HttpClient that is unused in embedded
mode.  Curious: are you creating lots of CoreContainers (perhaps
indirectly, via creating EmbeddedSolrServer)?  Maybe we have a
regression there.  I suspect a test environment would be doing this --
creating a CoreContainer for each test, basically.  Solr's tests do
this too!  And a slowdown as big as you show sounds like something
we'd notice... most likely.  On the other hand, if your CI/tests
create very few CoreContainers and there's still all this slowdown you
report, then CoreContainer startup is mostly irrelevant.
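
To make the multiplier concrete, here is a toy plain-JDK sketch (the
`Container` class is a stand-in of my own; the real CoreContainer /
EmbeddedSolrServer is far too heavy to construct in an example)
contrasting per-test construction with a shared suite-level fixture
for an 83-test class like the one profiled below:

```java
// Illustrates why per-test CoreContainer creation multiplies startup
// cost. "Container" is a hypothetical stand-in for the expensive
// resource; only the construction counts matter here.
public class FixtureCostDemo {
    static int constructions = 0;

    static class Container {
        Container() { constructions++; }  // stands in for costly startup
    }

    // Pattern A: each of 83 tests builds its own container.
    static void perTestSuite() {
        for (int i = 0; i < 83; i++) {
            Container c = new Container(); // startup cost paid every test
        }
    }

    // Pattern B: one shared container for the whole suite
    // (the JUnit @BeforeAll style).
    static Container shared;
    static void sharedSuite() {
        shared = new Container();          // startup cost paid once
        for (int i = 0; i < 83; i++) {
            Container c = shared;          // tests reuse the same instance
        }
    }

    public static void main(String[] args) {
        perTestSuite();
        int perTest = constructions;
        constructions = 0;
        sharedSuite();
        System.out.println("per-test constructions: " + perTest);
        System.out.println("shared constructions: " + constructions);
    }
}
```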

We do have a benchmark that should capture a slowdown in this area --
https://github.com/apache/solr/blob/9c911e7337cd1026accc1a825e26906039982328/solr/benchmark/src/java/org/apache/solr/bench/lifecycle/SolrStartup.java
(scope is a bit larger but good enough) but we don't have continuous
benchmarking over releases to make relative comparisons.  We've been
talking about that, but the recent discussions are unlikely to support
a way to do this for embedded Solr.  I've been working on this
benchmark code lately as well.  *Anyway*, I recommend that you try
this benchmark, starting with its README -- which is great, and mostly
documents JMH itself.  If you do that and find anything curious or
suspicious, I'd love to hear more!

On Tue, Mar 24, 2026 at 3:51 AM Bram Luyten <[email protected]> wrote:
>
> Hi all,
>
> Disclaimer: I am a DSpace developer, not a Solr/Jetty internals
> expert. Much of the profiling and analysis below was done with heavy
> assistance from Claude. I'm sharing this because the data seems
> significant, but I may be misinterpreting some of it. Corrections
> and guidance are very welcome.
>
>
> CONTEXT
> ---------------
>
> We are upgrading DSpace (open-source repository software) from
> Spring Boot 3 / Solr 8 to Spring Boot 4 / Solr 10. Our integration
> test suite uses embedded Solr via solr-core as a test dependency
> (EmbeddedSolrServer style, no HTTP traffic -- everything is
> in-process in a single JVM).
>
> After the upgrade, our IT suite went from ~31 minutes to ~2 hours
> in CI. We spent considerable time profiling and eliminating other
> causes (Hibernate 7, Spring 7, H2 database, GC, lock contention,
> caching). Wall-clock profiling with async-profiler ultimately
> pointed to embedded Solr as the primary bottleneck.
>
> Note: we previously reported the Solr 10 POM issue with missing
> Jackson 2 dependency versions (solr-core, solr-solrj, solr-api).
> We have the workaround in place (explicit dependency declarations),
> so the embedded Solr 10 has a complete classpath.
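
For context, such an explicit declaration might look roughly like the
following (a sketch only, assuming a Maven build; the artifact list
and the version shown are illustrative, not taken from the DSpace
POM):

```xml
<!-- Illustrative workaround: pin the Jackson 2 artifacts whose
     versions are missing from the Solr 10 POMs. Use whatever
     Jackson 2.x release your build already standardizes on; the
     version below is only an example. -->
<properties>
  <jackson.version>2.18.2</jackson.version>
</properties>
<dependencies>
  <dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>${jackson.version}</version>
  </dependency>
  <dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>${jackson.version}</version>
  </dependency>
  <dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-annotations</artifactId>
    <version>${jackson.version}</version>
  </dependency>
</dependencies>
```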
>
>
> THE PROBLEM
> ----------------------
>
> Wall-clock profiling (async-profiler -e wall) of the same test class
> (DiscoveryRestControllerIT, 83 tests) on both branches shows:
>
>   Component        Main (Solr 8)    SB4 (Solr 10)    Difference
>   ----------------------------------------------------------------
>   Solr total          3.6s            11.5s            +7.9s
>   Hibernate           0.2s             0.2s             0.0s
>   H2 Database         0.1s             0.1s             0.0s
>   Spring              0.1s             0.1s             0.0s
>   Test total         68.4s            84.3s           +15.9s
>
> Solr accounts for 50% of the total wall-clock difference (7.9s out
> of 15.9s). Hibernate, H2, and Spring are essentially unchanged.
>
>
> THE ROOT CAUSE
> ---------------------------
>
> Breaking down the Solr wall-clock time by operation:
>
>   Operation                                    Main        SB4
>   ---------------------------------------------------------------
>   Jetty EatWhatYouKill.produce()              2558 (58%)     --
>   Jetty AdaptiveExecutionStrategy.produce()     --        12786 (91%)
>   DirectUpdateHandler2.commit()                522 (12%)    707  (5%)
>   SpellChecker.newSearcher()                   119  (3%)    261  (2%)
>
>   (Numbers are async-profiler wall-clock samples)
>
> The dominant operation is Jetty's NIO selector execution strategy:
>
>   - Solr 8 / Jetty 9: EatWhatYouKill.produce(): 2558 samples (58%)
>   - Solr 10 / Jetty 12: AdaptiveExecutionStrategy.produce(): 12786
>     samples (91%)
>   - That is a 5x increase in wall-clock time
>
> The full stack trace shows:
>
>   ThreadPoolExecutor
>     -> MDCAwareThreadPoolExecutor
>       -> ManagedSelector (Jetty NIO selector)
>         -> AdaptiveExecutionStrategy.produce()
>           -> AdaptiveExecutionStrategy.tryProduce()
>             -> AdaptiveExecutionStrategy.produceTask()
>               -> ... -> KQueue.poll (macOS NIO)
>
> This is the Jetty HTTP client's NIO event loop. Even though we use
> EmbeddedSolrServer (no HTTP traffic), Solr 10's CoreContainer
> appears to create an internal Jetty HTTP client (likely for
> inter-shard communication via HttpJettySolrClient). In embedded
> single-node mode, this client has no work to do, but its NIO
> selector thread still runs, and AdaptiveExecutionStrategy.produce()
> idles much less efficiently than Jetty 9's EatWhatYouKill did.
>
> On macOS this manifests as busy-polling in KQueue.poll. The impact
> may differ on Linux (epoll).
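
One caveat when reading those sample counts: async-profiler's wall
mode samples threads whether they are parked or spinning, so a
selector blocked in select() also accrues wall-clock samples without
burning CPU. The sketch below is plain-JDK stand-in code (nothing
from Jetty, and not a claim about what AdaptiveExecutionStrategy
actually does internally); it contrasts a selector that parks in
select(timeout) with one that spins on selectNow(), which is the
busy-poll behavior the KQueue.poll frames suggest:

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Contrast an idle selector that parks (few wakeups, negligible CPU)
// with one that busy-polls (orders of magnitude more wakeups).
public class IdleSelectorDemo {
    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open()) {
            // Parking variant: block up to 50 ms per call for ~200 ms.
            long deadline = System.nanoTime() + 200_000_000L;
            int blockingWakeups = 0;
            while (System.nanoTime() < deadline) {
                selector.select(50);   // parks; wakes ~4-5 times total
                blockingWakeups++;
            }

            // Busy-polling variant: non-blocking calls for ~200 ms.
            deadline = System.nanoTime() + 200_000_000L;
            int busyWakeups = 0;
            while (System.nanoTime() < deadline) {
                selector.selectNow();  // returns immediately; spins
                busyWakeups++;
            }

            System.out.println("blocking wakeups <= 10: "
                    + (blockingWakeups <= 10));
            System.out.println("busy wakeups > 1000: "
                    + (busyWakeups > 1000));
        }
    }
}
```

Both variants would show up heavily in a wall-clock profile, but only
the spinning one costs CPU, which is why distinguishing the two
matters for interpreting the 91% figure.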
>
>
> PROFILING METHODOLOGY
> -----------------------------------------
>
>   - Tool: async-profiler 4.3 (wall-clock mode, safepoint-free)
>   - JDK: OpenJDK 21.0.9
>   - Both branches use the same H2 2.4.240 test database
>   - Both branches use the same test code and Solr schema/config
>   - The only Solr-related difference is the Solr version (8.11.4 vs 10.0.0)
>   - Profiling was done on macOS (Apple Silicon), but the CI slowdown
>     (GitHub Actions, Ubuntu) shows the same pattern at larger scale
>
>
> WHAT WE RULED OUT
> ---------------------------------
>
> Before identifying the Solr/Jetty issue, we investigated and ruled
> out many other causes:
>
>   - Hibernate 7 overhead: SQL query count is similar (fewer on SB4),
>     query execution time is <40ms total for 1400+ queries
>   - H2 database: same version (2.4.240) on both branches, negligible
>     wall-clock difference
>   - GC pauses: only +0.7s extra on SB4 (1.4% of total difference)
>   - Lock contention: main actually has MORE lock contention than SB4
>   - Hibernate session.clear(): tested with/without, no effect
>   - JaCoCo coverage: tested with/without, no effect
>   - Hibernate caching (L2, query cache): disabled both, no effect
>   - Hibernate batch fetch size: tested, no effect
>
>
> QUESTIONS FOR THE SOLR TEAM
> --------------------------------------------------
>
> 1. Does embedded mode (EmbeddedSolrServer / CoreContainer without
>    an HTTP listener) need to create a Jetty HTTP client at all?
>    If the client is only for shard-to-shard communication, it
>    seems unnecessary in single-node embedded testing.
>
> 2. If the HTTP client is required, can its NIO selector / thread
>    pool be configured with minimal resources for embedded mode?
>    (e.g., fewer selector threads, smaller thread pool, or an
>    idle-friendly execution strategy)
>
> 3. Is there a Solr configuration (solr.xml property, system
>    property, or CoreContainer API) that we can use from the
>    consuming application to reduce this overhead?
>
> 4. Is this specific to macOS (KQueue) or does it also affect
>    Linux (epoll)? Our CI runs on Ubuntu and shows a larger
>    slowdown (3.8x) than local macOS (1.28x), which could be
>    related.
>
> ENVIRONMENT
> -----------------------
>
>   Solr: 10.0.0 (solr-core as test dependency for embedded server)
>   Jetty: 12.0.x (pulled in transitively by Solr 10)
>   JDK: 21
>   OS: macOS (profiled), Ubuntu (CI where the 4x slowdown manifests)
>   Project: DSpace (https://github.com/DSpace/DSpace)
>   PR: https://github.com/DSpace/DSpace/pull/11810
>
> Happy to provide the full async-profiler flame graph files or
> additional profiling data if useful.
>
> Thanks,
> Bram Luyten, Atmire

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
