wang-jiahua opened a new issue, #10525:
URL: https://github.com/apache/rocketmq/issues/10525

   ### Before Creating the Enhancement Request
   
   - [x] I have confirmed that this should be classified as an enhancement 
rather than a bug/feature.
   
   ### Summary
   
   Reduce allocation in the pull/dispatch path by replacing boxed collections 
with primitive arrays, reusing DispatchRequest via ThreadLocal, merging mapped 
file slices, and eliminating CompletableFuture callback lambdas.
   
   ### Motivation
   
   JFR profiling on the broker pull/dispatch path reveals several per-message 
allocation hotspots:
   
   1. **`GetMessageResult`**: stores message offsets in a `List<Long>`, boxing 
every `long` into a `Long` object. Under high pull QPS, this allocates thousands 
of short-lived `Long` objects per second, plus the array copies from 
`ArrayList` resizing.
   
   2. **`DispatchRequest`**: a new `DispatchRequest` object is created for 
every message dispatched to ConsumeQueue/IndexService/TimerWheel. Its state 
could instead be reset and the instance reused via ThreadLocal.
   
   3. **`DefaultMappedFile.selectMappedBuffer`**: creates two separate 
`ByteBuffer` slices to apply the position and size, then wraps the result. The 
two can be merged into a single slice operation.
   
   4. **`DefaultMessageStore.putMessage/putMessages`**: attaches a `thenAccept` 
lambda callback to the `asyncPutMessage` result for stats logging. The lambda 
captures `this` and `beginTime`, so a closure object is allocated per message.
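
To make the boxing cost in item 1 concrete, here is a minimal Java sketch of a primitive offset buffer. `QueueOffsetBuffer` and its `expectedMessages` parameter are hypothetical names used for illustration; this is not the actual `GetMessageResult` code, and only `addQueueOffset(long)` is a name taken from this issue.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the offset storage inside GetMessageResult. */
final class QueueOffsetBuffer {
    private long[] offsets;
    private int size;

    QueueOffsetBuffer(int expectedMessages) {
        // Right-size up front so the common case never has to grow.
        this.offsets = new long[Math.max(1, expectedMessages)];
    }

    /** Appends a primitive offset; no Long object is allocated. */
    void addQueueOffset(long offset) {
        if (size == offsets.length) {
            // Grow by doubling only when the initial estimate was too small.
            offsets = Arrays.copyOf(offsets, size * 2);
        }
        offsets[size++] = offset;
    }

    /** Returns the offset at the given insertion index. */
    long getQueueOffset(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return offsets[index];
    }

    int size() {
        return size;
    }
}
```

Insertion order is preserved, so callers that iterate by index see offsets in the order they were added, as they would with the list.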
   
   ### Describe the Solution You'd Like
   
   1. `GetMessageResult`: replace `List<Long>` with a `long[]` and add an 
`addQueueOffset(long)` method. Right-size the initial capacity through a 
constructor parameter.
   2. `DispatchRequest`: change `final` fields to mutable and add a `reset()` 
method for ThreadLocal reuse.
   3. `DefaultMappedFile`: merge the dual slice into a single 
`selectMappedBuffer` operation with a cached append slice.
   4. `DefaultMessageStore`: remove the `thenAccept` callback and inline the 
stats logging into `CommitLog` or the caller.
   5. `ConsumeQueue`: make `topicQueueKey` a `final` field to avoid per-call 
computation.
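
Item 2 could look roughly like the sketch below, assuming dispatch runs on a single thread and nothing retains the request after the dispatchers return. `ReusableDispatchRequest`, `DispatchRequestHolder`, `acquire`, and the four fields shown are hypothetical simplifications, not the real `DispatchRequest` class.

```java
/** Hypothetical, trimmed-down request with mutable fields. */
final class ReusableDispatchRequest {
    String topic;
    int queueId;
    long commitLogOffset;
    int msgSize;

    /** Clears every field so no state leaks from the previous message. */
    void reset() {
        topic = null;
        queueId = 0;
        commitLogOffset = 0L;
        msgSize = 0;
    }
}

final class DispatchRequestHolder {
    // One instance per dispatching thread, created lazily on first use.
    private static final ThreadLocal<ReusableDispatchRequest> CACHE =
        ThreadLocal.withInitial(ReusableDispatchRequest::new);

    private DispatchRequestHolder() {
    }

    /** Returns the calling thread's instance, reset and refilled for the next message. */
    static ReusableDispatchRequest acquire(String topic, int queueId,
                                           long commitLogOffset, int msgSize) {
        ReusableDispatchRequest request = CACHE.get();
        request.reset();
        request.topic = topic;
        request.queueId = queueId;
        request.commitLogOffset = commitLogOffset;
        request.msgSize = msgSize;
        return request;
    }
}
```

The returned instance is only valid until the next `acquire` call on the same thread, so a dispatcher that stores the request or hands it to another thread would need to copy it first.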
   
   ### Describe Alternatives You've Considered
   
   - Use `LongAdder` instead of `long[]` for offsets: not applicable, because 
`LongAdder` only accumulates a sum while the offsets must be kept individually 
and in order.
   - Keep the `thenAccept` callback but replace the lambda with a method 
reference: the callback still needs `this` and `beginTime`, so an object is 
still allocated per call.
   - Use an object pool instead of ThreadLocal for DispatchRequest: ThreadLocal 
is simpler and sufficient for single-threaded dispatch.
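
The trade-off behind the `thenAccept` alternative can be sketched as follows. `PutStatsSketch`, `asyncPut`, `record`, and the `"PUT_OK"` status are hypothetical stand-ins, and the future here completes synchronously, which the real store may not do.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the store's put path; not RocketMQ code. */
final class PutStatsSketch {
    private final AtomicLong recorded = new AtomicLong();

    /** Stand-in for asyncPutMessage; completes immediately in this sketch. */
    private CompletableFuture<String> asyncPut(String body) {
        return CompletableFuture.completedFuture("PUT_OK");
    }

    /** Stand-in for the stats logging step; only counts invocations. */
    private void record(long elapsedNanos) {
        recorded.incrementAndGet();
    }

    /** Callback style: the lambda captures `this` and `beginTime` on every call. */
    CompletableFuture<String> putWithCallback(String body) {
        long beginTime = System.nanoTime();
        CompletableFuture<String> future = asyncPut(body);
        // thenAccept also creates a dependent future, which is discarded here.
        future.thenAccept(result -> record(System.nanoTime() - beginTime));
        return future;
    }

    /** Inline style: stats are recorded where the result is produced, with no lambda. */
    CompletableFuture<String> putInline(String body) {
        long beginTime = System.nanoTime();
        String status = "PUT_OK"; // stand-in for the actual append work
        record(System.nanoTime() - beginTime);
        return CompletableFuture.completedFuture(status);
    }

    long recordedCount() {
        return recorded.get();
    }
}
```

If the real future completes on another thread, the inline recording would have to move to wherever that completion happens, which is presumably why the proposal mentions `CommitLog` as one possible home for it.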
   
   ### Additional Context
   
   Part of a larger JFR-driven optimization effort. Related PRs: #10443, 
#10444, #10514, #10524.
   

