Re: RFR: 8324573: HashMap::putAll should resize to sum of both map sizes [v4]

Volker Simonis Fri, 09 Feb 2024 06:09:55 -0800

On Fri, 2 Feb 2024 17:38:13 GMT, Joshua Cao <[email protected]> wrote:


>> This change mirrors what we did for ConcurrentHashMap in 
>> https://github.com/openjdk/jdk/pull/17116. When we add all entries from one 
>> map to anther, we should resize that map to the size of the sum of both maps.
>> 
>> I used the command below to run the benchmarks. I set a high heap to reduce 
>> garbage collection noise.
>> 
>> java -Xms25G -jar benchmarks.jar -p size=100000 -p addSize=100000 -gc true 
>> org.openjdk.bench.java.util.HashMapBench
>> 
>> 
>> Before change
>> 
>> 
>> Benchmark            (addSize)        (mapType)  (size)  Mode  Cnt   Score   
>> Error  Units
>> HashMapBench.putAll     100000         HASH_MAP  100000  avgt    4  22.927 ± 
>> 3.170  ms/op
>> HashMapBench.putAll     100000  LINKED_HASH_MAP  100000  avgt    4  25.198 ± 
>> 2.189  ms/op
>> 
>> 
>> After change
>> 
>> 
>> Benchmark            (addSize)        (mapType)  (size)  Mode  Cnt   Score   
>> Error  Units
>> HashMapBench.putAll     100000         HASH_MAP  100000  avgt    4  16.780 ± 
>> 0.526  ms/op
>> HashMapBench.putAll     100000  LINKED_HASH_MAP  100000  avgt    4  19.721 ± 
>> 0.349  ms/op
>> 
>> 
>> We see about average time improvements of 26% in HashMap and 20% in 
>> LinkedHashMap.
>> 
>> ---
>> 
>> In the worse case, we may have two maps with identical keys. In this case, 
>> we would aggressively resize when we do not need to. I'm also adding an 
>> additional `putAllSameKeys` benchmark.
>> 
>> Before change:
>> 
>> 
>> Benchmark                                       (addSize)        (mapType)  
>> (size)  Mode  Cnt        Score   Error   Units
>> HashMapBench.putAllSameKeys                        100000         HASH_MAP  
>> 100000  avgt             6.956           ms/op
>> HashMapBench.putAllSameKeys:gc.alloc.rate          100000         HASH_MAP  
>> 100000  avgt          1091.383          MB/sec
>> HashMapBench.putAllSameKeys:gc.alloc.rate.norm     100000         HASH_MAP  
>> 100000  avgt       7968871.917            B/op
>> HashMapBench.putAllSameKeys:gc.count               100000         HASH_MAP  
>> 100000  avgt               ≈ 0          counts
>> HashMapBench.putAllSameKeys                        100000  LINKED_HASH_MAP  
>> 100000  avgt             8.417           ms/op
>> HashMapBench.putAllSameKeys:gc.alloc.rate          100000  LINKED_HASH_MAP  
>> 100000  avgt           992.543          MB/sec
>> HashMapBench.putAllSameKeys:gc.alloc.rate.norm     100000  LINKED_HASH_MAP  
>> 100000  avgt       8768892.941            B/op
>> HashMapBench.putAllSameKeys:gc.count               100000  LINKED_HASH_MAP  
>> 100000  avgt               ≈ 0          counts
>> 
>> 
>> Af...
>
> Joshua Cao has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   extract msize variable

I agree that a comment would be helpful but I'm afraid that it will not be of 
much help if its buried in the code. So instead maybe better add an `@implNote` 
to the `putAll()` method which describes the current, conservative resizing 
heuristic and give the advice that users can presize the destination map 
accordingly if they want to avoid resizing when merging in other maps.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17544#issuecomment-1935992895

Re: RFR: 8324573: HashMap::putAll should resize to sum of both map sizes [v4]

Reply via email to