Hi all,

Disclaimer: I am a DSpace developer, not a Solr/Jetty internals
expert. Much of the profiling and analysis below was done with heavy
assistance from Claude. I'm sharing this because the data seems
significant, but I may be misinterpreting some of it. Corrections and
guidance are very welcome.
CONTEXT
---------------
We are upgrading DSpace (open-source repository software) from
Spring Boot 3 / Solr 8 to Spring Boot 4 / Solr 10. Our integration
test suite uses embedded Solr via solr-core as a test dependency
(EmbeddedSolrServer style, no HTTP traffic -- everything is
in-process in a single JVM).
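For reference, the embedded setup follows the standard SolrJ pattern,
roughly like the sketch below (the Solr home path and core name here
are placeholders, not our actual test configuration):

```java
// Sketch of the in-process setup (solr home path and core name are
// placeholders). EmbeddedSolrServer wraps a CoreContainer directly:
// requests are executed against SolrCore in the same JVM, and no HTTP
// listener is started by our code.
import java.nio.file.Path;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;

public class EmbeddedSolrSketch {
    public static void main(String[] args) throws Exception {
        Path solrHome = Path.of("target/test-solr-home"); // placeholder
        try (EmbeddedSolrServer solr =
                 new EmbeddedSolrServer(solrHome, "collection1")) {
            // A match-all query, executed entirely in-process.
            solr.query(new SolrQuery("*:*"));
        }
    }
}
```

Requires solr-core/solr-solrj on the test classpath; shown only to
make the "no HTTP traffic" claim concrete.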
After the upgrade, our IT suite went from ~31 minutes to ~2 hours
in CI. We spent considerable time profiling and eliminating other
causes (Hibernate 7, Spring 7, H2 database, GC, lock contention,
caching). Wall-clock profiling with async-profiler ultimately
pointed to embedded Solr as the primary bottleneck.
Note: we previously reported the Solr 10 POM issue with missing
Jackson 2 dependency versions (solr-core, solr-solrj, solr-api).
We have the workaround in place (explicit dependency declarations),
so the embedded Solr 10 has a complete classpath.
THE PROBLEM
----------------------
Wall-clock profiling (async-profiler -e wall) of the same test class
(DiscoveryRestControllerIT, 83 tests) on both branches shows:
Component        Main (Solr 8)   SB4 (Solr 10)   Difference
------------------------------------------------------------
Solr total            3.6s           11.5s          +7.9s
Hibernate             0.2s            0.2s           0.0s
H2 Database           0.1s            0.1s           0.0s
Spring                0.1s            0.1s           0.0s
Test total           68.4s           84.3s         +15.9s
Solr accounts for 50% of the total wall-clock difference (7.9s out
of 15.9s). Hibernate, H2, and Spring are essentially unchanged.
THE ROOT CAUSE
---------------------------
Breaking down the Solr wall-clock time by operation:
Operation                                    Main         SB4
----------------------------------------------------------------
Jetty EatWhatYouKill.produce()               2558 (58%)   --
Jetty AdaptiveExecutionStrategy.produce()    --           12786 (91%)
DirectUpdateHandler2.commit()                522 (12%)    707 (5%)
SpellChecker.newSearcher()                   119 (3%)     261 (2%)
(Numbers are async-profiler wall-clock samples)
The dominant operation is Jetty's NIO selector execution strategy:
- Solr 8 / Jetty 9: EatWhatYouKill.produce(): 2558 samples (58%)
- Solr 10 / Jetty 12: AdaptiveExecutionStrategy.produce(): 12786 samples (91%)
- That is a 5x increase in wall-clock time.
The full stack trace shows:
ThreadPoolExecutor
-> MDCAwareThreadPoolExecutor
-> ManagedSelector (Jetty NIO selector)
-> AdaptiveExecutionStrategy.produce()
-> AdaptiveExecutionStrategy.tryProduce()
-> AdaptiveExecutionStrategy.produceTask()
-> ... -> KQueue.poll (macOS NIO)
This is the Jetty HTTP client's NIO event loop. Even though we use
EmbeddedSolrServer (no HTTP traffic), Solr 10's CoreContainer
appears to create an internal Jetty HTTP client (likely for
inter-shard communication via HttpJettySolrClient). In embedded
single-node mode, this client has no work to do, but its NIO
selector thread still runs, and AdaptiveExecutionStrategy.produce()
idles much less efficiently than Jetty 9's EatWhatYouKill did.
On macOS this manifests as busy-polling in KQueue.poll. The impact
may differ on Linux (epoll).
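For anyone who wants to confirm this in their own JVM, a thread dump
is enough to spot the selector thread. The sketch below uses only the
JDK (no Solr dependency); the "Selector"/"jetty" name filters match
what we observed locally and may differ between Jetty versions:

```java
// List live threads whose names suggest a Jetty NIO selector, with
// their current stack frames. Run inside the test JVM after
// CoreContainer startup; with EmbeddedSolrServer alone (no shard
// traffic) any such thread should be idle, e.g. parked in
// KQueue.poll / EPoll.wait.
import java.util.Map;

public class SelectorThreadCheck {
    public static void main(String[] args) {
        Map<Thread, StackTraceElement[]> stacks = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> e : stacks.entrySet()) {
            String name = e.getKey().getName();
            if (name.contains("Selector") || name.contains("jetty")) {
                System.out.println(name);
                for (StackTraceElement frame : e.getValue()) {
                    System.out.println("    at " + frame);
                }
            }
        }
    }
}
```

Equivalently, `jstack` on the test JVM shows the same threads; the
in-process variant is just easier to drop into a failing test.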
PROFILING METHODOLOGY
-----------------------------------------
- Tool: async-profiler 4.3 (wall-clock mode, safepoint-free)
- JDK: OpenJDK 21.0.9
- Both branches use the same H2 2.4.240 test database
- Both branches use the same test code and Solr schema/config
- The only Solr-related difference is the Solr version (8.11.4 vs 10.0.0)
- Profiling was done on macOS (Apple Silicon), but the CI slowdown
(GitHub Actions, Ubuntu) shows the same pattern at larger scale
WHAT WE RULED OUT
---------------------------------
Before identifying the Solr/Jetty issue, we investigated and ruled
out many other causes:
- Hibernate 7 overhead: SQL query count is similar (fewer on SB4),
query execution time is <40ms total for 1400+ queries
- H2 database: same version (2.4.240) on both branches, negligible
wall-clock difference
- GC pauses: only +0.7s extra on SB4 (1.4% of total difference)
- Lock contention: main actually has MORE lock contention than SB4
- Hibernate session.clear(): tested with/without, no effect
- JaCoCo coverage: tested with/without, no effect
- Hibernate caching (L2, query cache): disabled both, no effect
- Hibernate batch fetch size: tested, no effect
QUESTIONS FOR THE SOLR TEAM
--------------------------------------------------
1. Does embedded mode (EmbeddedSolrServer / CoreContainer without
an HTTP listener) need to create a Jetty HTTP client at all?
If the client is only for shard-to-shard communication, it
seems unnecessary in single-node embedded testing.
2. If the HTTP client is required, can its NIO selector / thread
pool be configured with minimal resources for embedded mode?
(e.g., fewer selector threads, smaller thread pool, or an
idle-friendly execution strategy)
3. Is there a Solr configuration (solr.xml property, system
property, or CoreContainer API) that we can use from the
consuming application to reduce this overhead?
4. Is this specific to macOS (KQueue) or does it also affect
Linux (epoll)? Our CI runs on Ubuntu and shows a larger
slowdown (3.8x) than local macOS (1.28x), which could be
related.
ENVIRONMENT
-----------------------
Solr: 10.0.0 (solr-core as test dependency for embedded server)
Jetty: 12.0.x (pulled in transitively by Solr 10)
JDK: 21
OS: macOS (profiled), Ubuntu (CI, where the ~3.8x slowdown manifests)
Project: DSpace (https://github.com/DSpace/DSpace)
PR: https://github.com/DSpace/DSpace/pull/11810
Happy to provide the full async-profiler flame graph files or
additional profiling data if useful.
Thanks,
Bram Luyten, Atmire