[
https://issues.apache.org/jira/browse/CAMEL-23686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086422#comment-18086422
]
Guillaume Nodet commented on CAMEL-23686:
-----------------------------------------
_Claude Code on behalf of Guillaume Nodet_
h3. Final results: CAMEL-23686 PRs + pre-existing optimization PRs
All measurements: JDK Temurin 21.0.9, Camel 4.21.0-SNAPSHOT, timer+split(1000)
routes, 60s runs, heap histogram at 35s.
h4. PRs in this effort
||PR||Description||Memory||Speed||
|[#23738|https://github.com/apache/camel/pull/23738]|CoW headers (copy-on-write message headers)|saves full map copy per Exchange.copy()|copyFrom() shares a reference instead of copying all entries|
|[#23766|https://github.com/apache/camel/pull/23766]|O(1) CaseInsensitiveMap (hash table replaces TreeMap)|same size|O(1) vs O(log n) for every getHeader/setHeader|
|[#23769|https://github.com/apache/camel/pull/23769]|Fix dev profile overriding user properties|fixes 236 MB leak with pooled exchanges|n/a|
|[#23770|https://github.com/apache/camel/pull/23770]|Fix clock reset on wrong exchange|bug fix|n/a|
|[#23771|https://github.com/apache/camel/pull/23771]|Lighten UoW (field + lazy ArrayDeque)|-72B/exchange|direct field access vs ConcurrentLinkedDeque.peek() volatile traversal|
|[#23779|https://github.com/apache/camel/pull/23779]|Optimized ASCII hash for CaseInsensitiveMap|n/a|c+=32 for A-Z vs Character.toLowerCase(Character.toUpperCase(c)) per char|
|[#23784|https://github.com/apache/camel/pull/23784]|Lazy getHeader() (skip empty map creation)|-200B/exchange when no headers|avoids allocating 4 arrays + isEmpty() just to return null|
|[#23794|https://github.com/apache/camel/pull/23794]|Inline ExtendedExchangeExtension|-80B/exchange|getExchangeExtension() returns this (monomorphic, JIT-inlineable) vs indirection|
|[#23796|https://github.com/apache/camel/pull/23796]|FlatMap for exchange properties|-64B/exchange|linear scan of 4-8 slots beats hash + volatile reads + Node pointer chasing|
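To illustrate the copy-on-write idea behind #23738, here is a minimal sketch. The class {{CowHeaders}} and its methods are hypothetical names chosen for this example, not Camel's actual CopyOnWriteHeadersMap; it only shows how a copy can share the backing map until the first write. It assumes single-threaded access.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of copy-on-write headers; not Camel's implementation. */
final class CowHeaders {
    private Map<String, Object> map; // backing map, possibly shared with another instance
    private boolean shared;          // true while another instance may point at the same map

    CowHeaders() {
        this.map = new HashMap<>();
    }

    private CowHeaders(Map<String, Object> sharedMap) {
        this.map = sharedMap;
        this.shared = true;
    }

    /** O(1) copy: both instances reference the same map until one of them writes. */
    CowHeaders copy() {
        this.shared = true;
        return new CowHeaders(map);
    }

    Object get(String key) {
        return map.get(key);
    }

    /** The full map copy is deferred to the first write after a copy(). */
    void put(String key, Object value) {
        if (shared) {
            map = new HashMap<>(map);
            shared = false;
        }
        map.put(key, value);
    }

    /** Test helper: true if both instances still reference the same backing map. */
    boolean sharesStorageWith(CowHeaders other) {
        return map == other.map;
    }
}
```

With this shape, an exchange copy that only reads headers never pays for a map copy. The trade-off in this sketch is that the original may also copy once on its next write, since it cannot tell whether the other instance has already detached.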
h4. Per-exchange allocation
||Metric||Before||After||
|Bytes per exchange|~600|~470|
|Objects per exchange|12+|8-9|
|Baseline heap (prod+pooled)|73 MB|61 MB (-16%)|
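Part of the drop in objects per exchange comes from allocating lazily. As a rough sketch of the "field + lazy ArrayDeque" idea from #23771: keep the common single entry in a plain field and create the deque only when a second entry is pushed. {{LazyStack}} is a hypothetical name for this example, not the actual UoW code, and unlike ConcurrentLinkedDeque it is not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.Objects;

/** Hypothetical stand-in for a route stack; R stands for the route type. */
final class LazyStack<R> {
    private R top;              // the only element in the common single-route case
    private ArrayDeque<R> rest; // elements below top; allocated on the second push

    void push(R item) {
        Objects.requireNonNull(item, "item");
        if (top != null) {
            if (rest == null) {
                rest = new ArrayDeque<>(4);
            }
            rest.push(top); // move the previous top below the new one
        }
        top = item;
    }

    /** Plain field read; no node traversal. Returns null when empty. */
    R peek() {
        return top;
    }

    /** Removes and returns the top element, or null when empty. */
    R pop() {
        R result = top;
        top = (rest == null || rest.isEmpty()) ? null : rest.pop();
        return result;
    }

    /** Test helper: true once the deque has been allocated. */
    boolean allocatedDeque() {
        return rest != null;
    }
}
```

An exchange that only ever visits one route never allocates the deque in this sketch.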
h4. Objects eliminated from hot path
||Object||Before||After||
|ExtendedExchangeExtension (80B)|925,867|ELIMINATED (#23794)|
|ConcurrentLinkedDeque + Nodes (72B)|1,572,294|ELIMINATED (#23771)|
|ConcurrentHashMap (64B)|325,552|ELIMINATED (#23796)|
|DefaultMessageHistory (dev mode leak)|3,616,340|ELIMINATED (#23769)|
|MonotonicClock (dev mode leak)|3,616,340|ELIMINATED (#23769)|
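For the ConcurrentHashMap row, the replacement described in #23796 is a flat structure scanned linearly. The sketch below shows the general technique with keys and values interleaved in one array; the class name, layout, and initial capacity are assumptions for illustration and may differ from the actual PR. It assumes single-threaded access and a small number of entries.

```java
import java.util.Arrays;

/** Hypothetical flat key/value store for a handful of entries. */
final class FlatMap {
    private Object[] slots = new Object[8]; // layout: [k0, v0, k1, v1, ...], room for 4 entries
    private int size;                       // number of entries, not array slots

    /** Linear scan over the stored keys; returns null if the key is absent. */
    Object get(String key) {
        for (int i = 0; i < size * 2; i += 2) {
            if (key.equals(slots[i])) {
                return slots[i + 1];
            }
        }
        return null;
    }

    /** Overwrites an existing key or appends a new entry, doubling the array when full. */
    void put(String key, Object value) {
        for (int i = 0; i < size * 2; i += 2) {
            if (key.equals(slots[i])) {
                slots[i + 1] = value;
                return;
            }
        }
        if (size * 2 == slots.length) {
            slots = Arrays.copyOf(slots, slots.length * 2);
        }
        slots[size * 2] = key;
        slots[size * 2 + 1] = value;
        size++;
    }

    int size() {
        return size;
    }
}
```

The whole map is one object plus one array, with no per-entry node objects, which is the kind of saving the table above counts.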
h4. Bugs fixed
# The dev profile forced messageHistory=true and ignored user properties, which caused a 236 MB leak with pooled exchanges (#23769)
# PooledProcessorExchangeFactory.createCopy/createCorrelatedCopy reset the original exchange's clock instead of the copy's (#23770)
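The first bug comes down to precedence: a profile default should apply only when the user has not set the property. The sketch below shows that rule in isolation. {{ProfileDefaults}} and its method are hypothetical and do not reproduce the actual change in #23769; only the property key {{camel.main.messageHistory}} and the dev profile name come from the issue.

```java
import java.util.Properties;

/** Hypothetical helper showing "explicit user setting wins over profile default". */
final class ProfileDefaults {
    static final String KEY = "camel.main.messageHistory";

    /**
     * Resolves the effective messageHistory value.
     * An explicit user property is honored; otherwise the dev profile defaults to true.
     */
    static boolean messageHistory(Properties userProps, String profile) {
        String explicit = userProps.getProperty(KEY);
        if (explicit != null) {
            return Boolean.parseBoolean(explicit.trim());
        }
        return "dev".equals(profile);
    }
}
```

With this rule, setting camel.main.messageHistory=false in application.properties stays in effect in dev mode, while an unset property still gets the dev default.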
> Reduce Exchange memory pressure and fix pooled exchange issues
> --------------------------------------------------------------
>
> Key: CAMEL-23686
> URL: https://issues.apache.org/jira/browse/CAMEL-23686
> Project: Camel
> Issue Type: Improvement
> Components: camel-core
> Reporter: Guillaume Nodet
> Assignee: Guillaume Nodet
> Priority: Major
> Labels: memory, performance
>
> h2. Context
> Profiling Camel routes under high throughput (timer + splitter producing ~1M
> msg/s) reveals several memory and allocation inefficiencies in the Exchange
> lifecycle. This issue tracks concrete improvements identified through heap
> histogram analysis and JFR profiling.
> h2. Findings
> h3. 1. Dev profile forces messageHistory=true, overriding user properties (CRITICAL)
> {{ProfileConfigurer.configureCommon()}} (line 102) unconditionally sets
> {{messageHistory=true}} in dev mode. This overrides any user setting of
> {{camel.main.messageHistory=false}} in application.properties, since the
> profile configurer runs after property loading.
> *Impact:* With pooled exchanges in dev mode, {{DefaultMessageHistory}}
> instances accumulate without bound. In our test, 2.95M instances consumed
> 236MB, ballooning heap from 75MB to 696MB. The history list grows because
> pooled exchanges recycle the exchange object, but the debugger re-creates
> message history entries on each reuse.
> h3. 2. Exchange pooling only covers consumer exchanges (~40%)
> The {{PooledExchangeFactory}} only provides pooled exchanges for the
> consumer's initial exchange. Sub-exchanges created by Splitter, Multicast,
> and RecipientList use regular {{DefaultExchange}} instances.
> In a pipeline route with a splitter, 524K out of 1.23M exchanges were pooled
> (42%). The remaining 703K (58%) were regular {{DefaultExchange}} instances,
> bypassing the pool entirely.
> h3. 3. Per-exchange allocation is ~600 bytes across 10-12 objects
> Each exchange allocates:
> ||Object||Bytes||
> |DefaultExchange / DefaultPooledExchange|64-80|
> |ExtendedExchangeExtension|80|
> |EnumMap x2 (properties + internal)|80|
> |DefaultMessage|48|
> |CopyOnWriteHeadersMap|24|
> |CaseInsensitiveMap|48|
> |DefaultUnitOfWork|56|
> |ReentrantLock + NonfairSync|48|
> |ConcurrentLinkedDeque (routes)|24|
> |*Total*|*~552 bytes*|
> At 1M exchanges/s, this generates ~552MB/s allocation rate just for exchange
> infrastructure.
> h3. 4. UnitOfWork is overweight for common single-route exchanges
> * {{ConcurrentLinkedDeque<Route> routes}} — eagerly allocated, but typically
> holds only 1 entry. A simple field with lazy upgrade to deque would save
> allocation.
> * {{ReentrantLock}} — allocated per UoW even though most exchanges are
> single-threaded. Could be lazily created only when threading is detected.
> h3. 5. ExtendedExchangeExtension always allocated
> {{ExtendedExchangeExtension}} (80 bytes) is created for every exchange, even
> though most exchanges never use extended features. Lazy initialization would
> save 80 bytes per exchange.
> h2. Test Environment
> * JDK: Temurin 21.0.9
> * Camel: 4.21.0-SNAPSHOT (with PR #23738 CoW headers + PR #23766 O(1)
> CaseInsensitiveMap)
> * Route: Timer(period=1) -> Split(1000 tokens) -> Direct -> CBR -> Direct ->
> mock
> * Duration: 60s, heap histogram captured at 35s
> h2. Benchmark Results
> ||Route||Profile||Heap Used||Metaspace||Threads||
> |Baseline|prod + pooled|75 MB|42 MB|34|
> |Baseline|dev + pooled|696 MB|43 MB|35|
> |Pipeline|dev default|3,194 MB|45 MB|35|
> |Pipeline|prod + pooled|1,270 MB|45 MB|34|
> |HTTP|prod + pooled|95 MB|56 MB|58|
> h2. Positive findings
> * PR #23738 (CopyOnWriteHeadersMap) is working correctly — header copies are
> avoided
> * PR #23766 (O(1) CaseInsensitiveMap) is active — TreeMap entries seen in
> histograms are from JMX infrastructure, not headers
> * HTTP component adds a fixed 24 threads + 14MB metaspace (per-component, not
> per-exchange)