daguimu opened a new issue, #10214: URL: https://github.com/apache/rocketmq/issues/10214
### Before Creating the Bug Report - [x] I found a bug, not just asking a question, which should be created in [GitHub Discussions](https://github.com/apache/rocketmq/discussions). - [x] I have searched the [GitHub Issues](https://github.com/apache/rocketmq/issues) and [GitHub Discussions](https://github.com/apache/rocketmq/discussions) of this repository and believe that this is not a duplicate. - [x] I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ. ### Runtime platform environment All platforms ### RocketMQ version develop branch (latest) ### JDK Version JDK 8+ ### Describe the Bug In `MQClientInstance.java`, the `brokerVersionTable` (a `ConcurrentHashMap`) is accessed using a non-atomic check-then-act pattern in multiple locations: **Location 1 - `sendHeartbeatToBroker()` (line ~657):** ```java if (!this.brokerVersionTable.containsKey(brokerName)) { this.brokerVersionTable.put(brokerName, new ConcurrentHashMap<>(4)); } this.brokerVersionTable.get(brokerName).put(addr, version); ``` **Location 2 - `sendHeartbeatToAllBrokerV2()` (line ~737):** Same pattern as Location 1. **Location 3 - `findBrokerVersion()` (line ~1304):** ```java if (this.brokerVersionTable.containsKey(brokerName)) { if (this.brokerVersionTable.get(brokerName).containsKey(brokerAddr)) { return this.brokerVersionTable.get(brokerName).get(brokerAddr); } } ``` Between `containsKey()` and `get()`, another thread could remove the entry, causing a NullPointerException when the result of `get()` is dereferenced. Notably, a similar operation in the same class at line ~814 already uses the correct pattern: ```java ConcurrentHashMap<String, Integer> inner = MQClientInstance.this.brokerVersionTable .computeIfAbsent(brokerName, k -> new ConcurrentHashMap<>(4)); inner.put(brokerAddr, version); ``` ### Steps to Reproduce The race condition occurs under concurrent heartbeat sending and broker disconnection. When a broker is removed from the version table concurrently with a heartbeat send or version lookup, a NullPointerException can occur. ### What Did You Expect to See? Thread-safe access to `brokerVersionTable` using atomic operations like `computeIfAbsent` and local variable caching. ### What Did You See Instead? Non-atomic check-then-act pattern that can result in NullPointerException under concurrent access. ### Additional Context Related to previously reported #8743 (closed by stale bot without fix). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
