wang-jiahua opened a new issue, #10525: URL: https://github.com/apache/rocketmq/issues/10525
### Before Creating the Enhancement Request

- [x] I have confirmed that this should be classified as an enhancement rather than a bug/feature.

### Summary

Reduce allocation in the pull/dispatch path by replacing boxed collections with primitive arrays, reusing `DispatchRequest` via ThreadLocal, merging mapped file slices, and eliminating `CompletableFuture` callback lambdas.

### Motivation

JFR profiling on the broker pull/dispatch path reveals several per-message allocation hotspots:

1. **`GetMessageResult`**: stores message offsets as `List<Long>`, boxing every `long` into a `Long` object. Under high pull QPS, this creates thousands of short-lived `Long` objects per second, plus `ArrayList` resize overhead.
2. **`DispatchRequest`**: a new `DispatchRequest` object is created for every message dispatched to ConsumeQueue/IndexService/TimerWheel. The object has mutable fields that could be reset and reused via ThreadLocal.
3. **`DefaultMappedFile.selectMappedBuffer`**: creates two separate `ByteBuffer` slices for position+size, then wraps them. These can be merged into a single slice operation.
4. **`DefaultMessageStore.putMessage/putMessages`**: wraps the `asyncPutMessage` result in a `thenAccept` lambda callback for stats logging. The lambda captures `this` and `beginTime`, creating a closure object per message.

### Describe the Solution You'd Like

1. `GetMessageResult`: replace `List<Long>` with `long[]` and add an `addQueueOffset(long)` method. Right-size the initial capacity with a constructor parameter.
2. `DispatchRequest`: change `final` fields to mutable and add a `reset()` method for ThreadLocal reuse.
3. `DefaultMappedFile`: merge the dual slice into a single `selectMappedBuffer` operation with a cached append slice.
4. `DefaultMessageStore`: remove the `thenAccept` callback and inline stats logging into `CommitLog` or the caller.
5. `ConsumeQueue`: make `topicQueueKey` a `final` field to avoid per-call computation.

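
To make items 1 and 2 of the proposed solution concrete, here is a minimal Java sketch of the two techniques: a growable `long[]` in place of `List<Long>`, and a mutable request object reused through a `ThreadLocal`. The class names `QueueOffsetBuffer` and `ReusableDispatchRequest`, their fields, and the `acquire(...)` helper are hypothetical stand-ins for illustration; they are not the actual RocketMQ classes or the code in the related PRs.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the offset storage inside GetMessageResult. */
final class QueueOffsetBuffer {
    private long[] offsets;
    private int size;

    /** expectedCount lets the caller right-size the array up front. */
    QueueOffsetBuffer(int expectedCount) {
        this.offsets = new long[Math.max(expectedCount, 1)];
    }

    /** Appends a primitive long; no Long boxing takes place. */
    void addQueueOffset(long offset) {
        if (size == offsets.length) {
            // Grow by roughly 1.5x; only reached when the initial estimate was too small.
            offsets = Arrays.copyOf(offsets, size + (size >> 1) + 1);
        }
        offsets[size++] = offset;
    }

    int size() {
        return size;
    }

    long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return offsets[index];
    }
}

/** Hypothetical, simplified request with mutable fields, reused once per thread. */
final class ReusableDispatchRequest {
    private static final ThreadLocal<ReusableDispatchRequest> CACHE =
        ThreadLocal.withInitial(ReusableDispatchRequest::new);

    String topic;
    int queueId;
    long commitLogOffset;
    int msgSize;

    private ReusableDispatchRequest() {
    }

    /** Returns this thread's single instance, cleared and then refilled. */
    static ReusableDispatchRequest acquire(String topic, int queueId,
                                           long commitLogOffset, int msgSize) {
        ReusableDispatchRequest request = CACHE.get();
        request.reset();
        request.topic = topic;
        request.queueId = queueId;
        request.commitLogOffset = commitLogOffset;
        request.msgSize = msgSize;
        return request;
    }

    /** Clears all fields so no state leaks from the previous message. */
    void reset() {
        topic = null;
        queueId = 0;
        commitLogOffset = 0L;
        msgSize = 0;
    }
}
```

The reuse pattern is only safe if no consumer keeps a reference to the request after the dispatch call returns, because the next `acquire(...)` on the same thread overwrites the same instance. Any dispatcher that hands the request to another thread or stores it would need its own copy.
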
### Describe Alternatives You've Considered

- Use `LongAdder` instead of `long[]` for offsets: not applicable, offsets need ordering.
- Keep the `thenAccept` callback but use a static method reference: still captures `this`, doesn't eliminate allocation.
- Use an object pool instead of ThreadLocal for `DispatchRequest`: ThreadLocal is simpler and sufficient for single-threaded dispatch.

### Additional Context

Part of a larger JFR-driven optimization effort. Related PRs: #10443, #10444, #10514, #10524.
