BewareMyPower opened a new issue, #23215:
URL: https://github.com/apache/pulsar/issues/23215

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/pulsar/issues) 
and found nothing similar.
   
   
   ### Motivation
   
   There was a discussion at Sep. 2023 before to replace Customized Map with 
ConcurrentOpenHashMap. In this issue, I'd focus on the `ConcurrentOpenHashMap`.
   
   Here is the only advantage of this map.
   
   > Provides similar methods as a {@code ConcurrentMap<K,V>} but since it's an 
open hash map with linear probing,  no node allocations are required to store 
the values.
   
   Let me count the disadvantages.
   
   ### 1. Bad performance
   
   This map was aded in the initial commit (on 2016). However, this 
implementation was just based on the Java 7 implementation of the 
`ConcurrentHashMap`, which uses a segment based lock. Actually, this solution 
was discarded by the Java team since Java 8.
   
   https://github.com/apache/pulsar/pull/20647/#issuecomment-1607257960 did a 
benchmark and found the performance was much worse than the current 
`ConcurrentHashMap` provided by Java library. We can also search the PROs of 
the Java 8 design in network, or just ask for ChatGPT.
   
   Besides, the frequently used `keys()` and `values()` methods just copy the 
keys and values to a new list. While the `ConcurrentHashMap` just returns a 
thread-safe internal view that users can choose whether to make a copy.
   
   Anyway, to prove the performance is worse than `ConcurrentHashMap`, we need 
to have more tests and research. So it's the least important reason.
   
   ### 2. Lack of the updates
   
   This class was rarely updated. What I can remember is the shrink support two 
years ago. https://github.com/apache/pulsar/pull/14663
   
   From https://github.com/apache/bookkeeper/pull/3061, we can see the 
motivation is the frequently appeared Full GC caused by this implementation. 
However, adding a `shrink` method makes it harder to use. There are already 
many parameters to tune, see it's builder:
   
   ```java
       public static class Builder<K, V> {
           int expectedItems = DefaultExpectedItems;
           int concurrencyLevel = DefaultConcurrencyLevel;
           float mapFillFactor = DefaultMapFillFactor;
           float mapIdleFactor = DefaultMapIdleFactor;
           float expandFactor = DefaultExpandFactor;
           float shrinkFactor = DefaultShrinkFactor;
           boolean autoShrink = DefaultAutoShrink;
   ```
   
   Many `xxxFactor`s and the concurrency level. It's hard to determine a proper 
value by default. However, it makes new developers hard to modify it.
   
   ### 3. Bad debug experience
   
   When I debugged the topics maintained in a `BrokerService`.
   
   <img width="1031" alt="image" 
src="https://github.com/user-attachments/assets/5dfb13d9-7897-429e-ab06-4888595a06c1";>
   
   As you can see. There are 16 sections. And I have to iterate over all these 
sections and expand the `table` array to find the target topic.
   
   Let's compare it with the official `ConcurrentHashMap` (I replaced it 
locally)
   
   <img width="1211" alt="image" 
src="https://github.com/user-attachments/assets/49fbb5ba-7cb0-4225-922f-737af90ef5a7";>
   
   ### 4. Not friendly to new developers
   
   Many places just use it as a concurrent hash map. **What's the reason for 
new developers to not use the official `ConcurrentHashMap`, which is developed 
and consistently improved by a professional team?** Just to reduce the node 
allocation? With the improving JVM GC?
   
   As I've mentioned, this class might be introduced at the Java 7 era. Now the 
minimum required Java version of broker side is 17. We have ZGC. We have 
Shenandoah GC. We have many more JVM developers developing better GC. I'm 
suspecting if the advantage makes sense.
   
   I cannot think of a reason to choose this hard-to-maintain class rather than 
well-maintained official `ConcurrentHashMap`.
   
   For example, when I maintained a KoP, I encountered the deadlock of 
`ConcurrentLongHashMap` (maybe the similar implementation). 
https://github.com/streamnative/kop/pull/620 And it's hard to know if this case 
is fixed. So I have to switch to the official `ConcurrentHashMap`.
   
   ### Solution
   
   Replace `ConcurrentOpenHashMap` with the official Java `ConcurrentHashMap`.
   
   ### Alternatives
   
   N/A
   
   ### Anything else?
   
   N/A
   
   ### Are you willing to submit a PR?
   
   - [X] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to