[ https://issues.apache.org/jira/browse/CAMEL-23686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086422#comment-18086422 ]

Guillaume Nodet commented on CAMEL-23686:
-----------------------------------------

_Claude Code on behalf of Guillaume Nodet_

h3. Final results: CAMEL-23686 PRs + pre-existing optimization PRs

All measurements: JDK Temurin 21.0.9, Camel 4.21.0-SNAPSHOT, timer+split(1000) routes, 60s runs, heap histogram at 35s.

h4. PRs in this effort

||PR||Description||Memory||Speed||
|[#23738|https://github.com/apache/camel/pull/23738]|CoW headers (copy-on-write message headers)|saves a full map copy per {{Exchange.copy()}}|{{copyFrom()}} shares a reference instead of copying all entries|
|[#23766|https://github.com/apache/camel/pull/23766]|O(1) CaseInsensitiveMap (hash table replaces TreeMap)|same size|average O(1) vs O(log n) for every {{getHeader}}/{{setHeader}}|
|[#23769|https://github.com/apache/camel/pull/23769]|Fix dev profile overriding user properties|fixes a 236 MB leak with pooled exchanges in dev mode|n/a|
|[#23770|https://github.com/apache/camel/pull/23770]|Fix clock reset on wrong exchange|bug fix|n/a|
|[#23771|https://github.com/apache/camel/pull/23771]|Lighten UoW (field + lazy ArrayDeque)|-72B/exchange|direct field access vs {{ConcurrentLinkedDeque.peek()}} volatile traversal|
|[#23779|https://github.com/apache/camel/pull/23779]|Optimized ASCII hash for CaseInsensitiveMap|n/a|{{c += 32}} for A-Z vs {{Character.toLowerCase(Character.toUpperCase(c))}} per char|
|[#23784|https://github.com/apache/camel/pull/23784]|Lazy {{getHeader()}} (skips empty map creation)|-200B/exchange when no headers|avoids allocating 4 arrays plus an {{isEmpty()}} call just to return null|
|[#23794|https://github.com/apache/camel/pull/23794]|Inline ExtendedExchangeExtension|-80B/exchange|{{getExchangeExtension()}} returns {{this}} (monomorphic, JIT-inlineable) instead of an indirection|
|[#23796|https://github.com/apache/camel/pull/23796]|FlatMap for exchange properties|-64B/exchange|linear scan of 4-8 slots beats hashing, volatile reads, and Node pointer chasing|
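
Here is a minimal sketch of the hashing idea behind #23766 and #23779. It assumes header keys are hashed by folding each char to lower case. A-Z take the {{c + 32}} fast path and other ASCII chars pass through unchanged. Only non-ASCII chars pay for {{Character.toLowerCase(Character.toUpperCase(c))}}. The hash must agree with case-insensitive equality, and that agreement is what allows an open hash table to replace the {{TreeMap}} with average O(1) lookups. {{AsciiFoldingHash}} and its methods are hypothetical names, not the Camel implementation.

```java
/**
 * Hypothetical sketch (not Camel's CaseInsensitiveMap code): case-insensitive
 * hashing with an ASCII fast path, usable as the hash of an O(1) map.
 */
final class AsciiFoldingHash {

    private AsciiFoldingHash() {
    }

    /** Folds one char to a canonical lower-case form. */
    static char fold(char c) {
        if (c >= 'A' && c <= 'Z') {
            // ASCII fast path: 'A' (65) + 32 == 'a' (97)
            return (char) (c + 32);
        }
        if (c < 128) {
            // other ASCII chars (lower-case letters, digits, '-') are already canonical
            return c;
        }
        // non-ASCII: the slower general-purpose double conversion
        return Character.toLowerCase(Character.toUpperCase(c));
    }

    /** String.hashCode()-style polynomial hash over folded chars. */
    static int hash(String key) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = 31 * h + fold(key.charAt(i));
        }
        return h;
    }

    /** Equality consistent with hash(): same folded char at every position. */
    static boolean equalsIgnoreCase(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (fold(a.charAt(i)) != fold(b.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Under this scheme {{hash("Content-Type")}} and {{hash("content-type")}} are equal. For keys that are already lower-case ASCII, the result also matches {{String.hashCode()}}.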

h4. Per-exchange allocation

||Metric||Before||After||
|Bytes per exchange|~600|~470|
|Objects per exchange|12+|8-9|
|Baseline heap (prod+pooled)|73 MB|61 MB (-16%)|
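
The figures above come from heap histograms. On HotSpot-based JDKs such as Temurin, {{com.sun.management.ThreadMXBean.getThreadAllocatedBytes}} gives a complementary per-operation check, because it reports the bytes a thread has allocated. The sketch below uses a hypothetical helper, {{AllocationProbe}}. It warms up an operation, then averages the allocated bytes over many iterations. It assumes the work runs entirely on the calling thread, so allocations made on other threads (for example by a parallel splitter) are not counted.

```java
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: average bytes allocated per operation on the current
 * thread, using the HotSpot-specific com.sun.management.ThreadMXBean.
 */
final class AllocationProbe {

    // Volatile static sink: results escape, so the JIT cannot scalar-replace them.
    static volatile Object sink;

    static double bytesPerOp(Supplier<?> op, int iterations) {
        com.sun.management.ThreadMXBean mx =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        if (!mx.isThreadAllocatedMemorySupported() || !mx.isThreadAllocatedMemoryEnabled()) {
            throw new IllegalStateException("per-thread allocation accounting unavailable");
        }
        long tid = Thread.currentThread().getId();
        // Warm-up pass so class loading and JIT compilation are not measured.
        for (int i = 0; i < iterations; i++) {
            sink = op.get();
        }
        long before = mx.getThreadAllocatedBytes(tid);
        for (int i = 0; i < iterations; i++) {
            sink = op.get();
        }
        long after = mx.getThreadAllocatedBytes(tid);
        return (after - before) / (double) iterations;
    }
}
```

Passing a lambda that creates a {{DefaultExchange}} would give a rough bytes-per-exchange estimate. JMH with its {{gc}} profiler is the more rigorous option for numbers meant to be published.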

h4. Objects eliminated from hot path

||Object||Before||After||
|ExtendedExchangeExtension (80B)|925,867|ELIMINATED (#23794)|
|ConcurrentLinkedDeque + Nodes (72B)|1,572,294|ELIMINATED (#23771)|
|ConcurrentHashMap (64B)|325,552|ELIMINATED (#23796)|
|DefaultMessageHistory (dev mode leak)|3,616,340|ELIMINATED (#23769)|
|MonotonicClock (dev mode leak)|3,616,340|ELIMINATED (#23769)|
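
#23796 removes the {{ConcurrentHashMap}} row by switching exchange properties to a flat map. Here is a minimal sketch of that technique. It assumes each exchange has only a handful of properties and that the map is never mutated concurrently. Keys and values sit side by side in one {{Object[]}}, so a lookup is a short linear scan with no hashing, no volatile reads, and no node objects. {{FlatPropertyMap}}, its initial capacity of 4 entries, and its doubling growth policy are assumptions for illustration, not the PR's code.

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical sketch: small, single-threaded flat map for exchange properties. */
final class FlatPropertyMap {

    // slots[2*i] holds key i, slots[2*i + 1] holds its value
    private Object[] slots = new Object[8]; // room for 4 entries before growing
    private int size;

    Object get(String key) {
        for (int i = 0; i < size; i++) {
            if (key.equals(slots[2 * i])) {
                return slots[2 * i + 1];
            }
        }
        return null;
    }

    /** Returns the previous value, or null if the key was absent. */
    Object put(String key, Object value) {
        Objects.requireNonNull(key, "key");
        for (int i = 0; i < size; i++) {
            if (key.equals(slots[2 * i])) {
                Object old = slots[2 * i + 1];
                slots[2 * i + 1] = value;
                return old;
            }
        }
        if (2 * size == slots.length) {
            slots = Arrays.copyOf(slots, slots.length * 2); // double the capacity
        }
        slots[2 * size] = key;
        slots[2 * size + 1] = value;
        size++;
        return null;
    }

    /** Removes a key by moving the last entry into its slot, keeping the array dense. */
    Object remove(String key) {
        for (int i = 0; i < size; i++) {
            if (key.equals(slots[2 * i])) {
                Object old = slots[2 * i + 1];
                int last = size - 1;
                slots[2 * i] = slots[2 * last];
                slots[2 * i + 1] = slots[2 * last + 1];
                slots[2 * last] = null;
                slots[2 * last + 1] = null;
                size--;
                return old;
            }
        }
        return null;
    }

    int size() {
        return size;
    }
}
```

The trade-off is that every operation is O(n). The design pays off only while n stays in the single digits, which is the "4-8 slots" case described in the table.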

h4. Bugs fixed

# Dev profile forced {{messageHistory=true}}, ignoring user properties; this caused a 236 MB leak with pooled exchanges (#23769)
# {{PooledProcessorExchangeFactory.createCopy}}/{{createCorrelatedCopy}} reset the original exchange's clock instead of the copy's (#23770)
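
The first bug is a precedence problem: a profile default applied after the user's properties overwrites them. Below is a minimal sketch of the intended precedence, assuming properties are held in a plain {{Map}}. The dev-profile default is applied with {{putIfAbsent}}, so an explicit {{camel.main.messageHistory=false}} survives. {{ProfileDefaults.applyDevDefaults}} is a hypothetical helper. The actual {{ProfileConfigurer}} code and the #23769 fix may be structured differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: profile defaults that never override explicit user settings. */
final class ProfileDefaults {

    static Map<String, String> applyDevDefaults(Map<String, String> userProperties) {
        Map<String, String> effective = new HashMap<>(userProperties);
        // A plain put(...) here would reproduce the bug by overwriting the user's value.
        // putIfAbsent only fills in keys the user left unset.
        effective.putIfAbsent("camel.main.messageHistory", "true");
        return effective;
    }
}
```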

> Reduce Exchange memory pressure and fix pooled exchange issues
> --------------------------------------------------------------
>
>                 Key: CAMEL-23686
>                 URL: https://issues.apache.org/jira/browse/CAMEL-23686
>             Project: Camel
>          Issue Type: Improvement
>          Components: camel-core
>            Reporter: Guillaume Nodet
>            Assignee: Guillaume Nodet
>            Priority: Major
>              Labels: memory, performance
>
> h2. Context
> Profiling Camel routes under high throughput (timer + splitter producing ~1M 
> msg/s) reveals several memory and allocation inefficiencies in the Exchange 
> lifecycle. This issue tracks concrete improvements identified through heap 
> histogram analysis and JFR profiling.
> h2. Findings
> h3. 1. Dev profile forces messageHistory=true, overriding user properties 
> (CRITICAL)
> {{ProfileConfigurer.configureCommon()}} (line 102) unconditionally sets 
> {{messageHistory=true}} in dev mode. This overrides any user setting of 
> {{camel.main.messageHistory=false}} in application.properties, since the 
> profile configurer runs after property loading.
> *Impact:* With pooled exchanges in dev mode, {{DefaultMessageHistory}} 
> instances accumulate without bound: in our test, 2.95M instances consumed 
> 236 MB, ballooning the heap from 75 MB to 696 MB. The history list grows 
> because pooled exchanges recycle the exchange object, but the debugger 
> re-creates message history entries on each reuse.
> h3. 2. Exchange pooling only covers consumer exchanges (~40%)
> The {{PooledExchangeFactory}} only provides pooled exchanges for the 
> consumer's initial exchange. Sub-exchanges created by Splitter, Multicast, 
> and RecipientList use regular {{DefaultExchange}} instances.
> In a pipeline route with a splitter, 524K out of 1.23M exchanges were pooled 
> (42%). The remaining 703K (58%) were regular {{DefaultExchange}} instances, 
> bypassing the pool entirely.
> h3. 3. Per-exchange allocation is ~600 bytes across 10-12 objects
> Each exchange allocates:
> ||Object||Bytes||
> |DefaultExchange / DefaultPooledExchange|64-80|
> |ExtendedExchangeExtension|80|
> |EnumMap x2 (properties + internal)|80|
> |DefaultMessage|48|
> |CopyOnWriteHeadersMap|24|
> |CaseInsensitiveMap|48|
> |DefaultUnitOfWork|56|
> |ReentrantLock + NonfairSync|48|
> |ConcurrentLinkedDeque (routes)|24|
> |*Total*|*~552 bytes*|
> At 1M exchanges/s, this amounts to an allocation rate of ~552 MB/s just for 
> exchange infrastructure.
> h3. 4. UnitOfWork is overweight for common single-route exchanges
> * {{ConcurrentLinkedDeque<Route> routes}} — eagerly allocated, but typically 
> holds only 1 entry. A simple field with lazy upgrade to deque would save 
> allocation.
> * {{ReentrantLock}} — allocated per UoW even though most exchanges are 
> single-threaded. Could be lazily created only when threading is detected.
> h3. 5. ExtendedExchangeExtension always allocated
> {{ExtendedExchangeExtension}} (80 bytes) is created for every exchange, even 
> though most exchanges never use extended features. Lazy initialization would 
> save 80 bytes per exchange.
> h2. Test Environment
> * JDK: Temurin 21.0.9
> * Camel: 4.21.0-SNAPSHOT (with PR #23738 CoW headers + PR #23766 O(1) 
> CaseInsensitiveMap)
> * Route: Timer(period=1) -> Split(1000 tokens) -> Direct -> CBR -> Direct -> 
> mock
> * Duration: 60s, heap histogram captured at 35s
> h2. Benchmark Results
> ||Route||Profile||Heap Used||Metaspace||Threads||
> |Baseline|prod + pooled|75 MB|42 MB|34|
> |Baseline|dev + pooled|696 MB|43 MB|35|
> |Pipeline|dev default|3,194 MB|45 MB|35|
> |Pipeline|prod + pooled|1,270 MB|45 MB|34|
> |HTTP|prod + pooled|95 MB|56 MB|58|
> h2. Positive findings
> * PR #23738 (CopyOnWriteHeadersMap) is working correctly — header copies are 
> avoided
> * PR #23766 (O(1) CaseInsensitiveMap) is active — TreeMap entries seen in 
> histograms are from JMX infrastructure, not headers
> * HTTP component adds a fixed 24 threads + 14MB metaspace (per-component, not 
> per-exchange)
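
Section 4 of the description above proposes replacing the eagerly allocated {{ConcurrentLinkedDeque<Route>}} with a simple field that is lazily upgraded to a deque. #23771 lists this as "field + lazy ArrayDeque". Below is a minimal sketch of that pattern, assuming the route stack is only touched by one thread at a time. {{RouteStack}} is a hypothetical name, and its type parameter stands in for Camel's {{Route}}.

```java
import java.util.ArrayDeque;
import java.util.Objects;

/** Hypothetical sketch: route stack that allocates a deque only when routes nest. */
final class RouteStack<R> {

    private R single;            // common case: exactly one route, no extra object
    private ArrayDeque<R> deque; // created lazily on the first nested push

    void push(R route) {
        Objects.requireNonNull(route, "route");
        if (deque != null) {
            deque.push(route);
        } else if (single == null) {
            single = route;
        } else {
            // second route: upgrade to a deque, keeping the newest route on top
            deque = new ArrayDeque<>();
            deque.push(single);
            deque.push(route);
            single = null;
        }
    }

    R peek() {
        return deque != null ? deque.peek() : single;
    }

    R pop() {
        if (deque != null) {
            return deque.poll(); // null when empty, matching peek()
        }
        R r = single;
        single = null;
        return r;
    }

    boolean usesDeque() {
        return deque != null;
    }
}
```

An exchange that never enters a nested route pays only for one reference field. The {{ArrayDeque}} is allocated only on the second {{push}}.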



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
