daguimu opened a new issue, #10214:
URL: https://github.com/apache/rocketmq/issues/10214

   ### Before Creating the Bug Report
   
   - [x] I found a bug, not just asking a question, which should be created in 
[GitHub Discussions](https://github.com/apache/rocketmq/discussions).
   
   - [x] I have searched the [GitHub 
Issues](https://github.com/apache/rocketmq/issues) and [GitHub 
Discussions](https://github.com/apache/rocketmq/discussions)  of this 
repository and believe that this is not a duplicate.
   
   - [x] I have confirmed that this bug belongs to the current repository, not 
other repositories of RocketMQ.
   
   
   ### Runtime platform environment
   
   All platforms
   
   ### RocketMQ version
   
   develop branch (latest)
   
   ### JDK Version
   
   JDK 8+
   
   ### Describe the Bug
   
   In `MQClientInstance.java`, the `brokerVersionTable` (a `ConcurrentHashMap`) 
is accessed using a non-atomic check-then-act pattern in multiple locations:
   
   **Location 1 - `sendHeartbeatToBroker()` (line ~657):**
   ```java
   if (!this.brokerVersionTable.containsKey(brokerName)) {
       this.brokerVersionTable.put(brokerName, new ConcurrentHashMap<>(4));
   }
   this.brokerVersionTable.get(brokerName).put(addr, version);
   ```
   
   **Location 2 - `sendHeartbeatToAllBrokerV2()` (line ~737):**
   Same pattern as Location 1.
   
   **Location 3 - `findBrokerVersion()` (line ~1304):**
   ```java
   if (this.brokerVersionTable.containsKey(brokerName)) {
       if (this.brokerVersionTable.get(brokerName).containsKey(brokerAddr)) {
           return this.brokerVersionTable.get(brokerName).get(brokerAddr);
       }
   }
   ```
   
   Between `containsKey()` and `get()`, another thread could remove the entry, 
causing a NullPointerException when the result of `get()` is dereferenced.
   
   Notably, a similar operation in the same class at line ~814 already uses the 
correct pattern:
   ```java
   ConcurrentHashMap<String, Integer> inner = 
MQClientInstance.this.brokerVersionTable
       .computeIfAbsent(brokerName, k -> new ConcurrentHashMap<>(4));
   inner.put(brokerAddr, version);
   ```
   
   ### Steps to Reproduce
   
   The race condition occurs under concurrent heartbeat sending and broker 
disconnection. When a broker is removed from the version table concurrently 
with a heartbeat send or version lookup, a NullPointerException can occur.
   
   ### What Did You Expect to See?
   
   Thread-safe access to `brokerVersionTable` using atomic operations like 
`computeIfAbsent` and local variable caching.
   
   ### What Did You See Instead?
   
   Non-atomic check-then-act pattern that can result in NullPointerException 
under concurrent access.
   
   ### Additional Context
   
   Related to previously reported #8743 (closed by stale bot without fix).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to