On Wed, 24 Jan 2024 19:06:49 GMT, Volker Simonis <simo...@openjdk.org> wrote:

> The current benchmark and the change don't really cover the case where many 
> keys exist in _both_ maps. Could you add a benchmark for that?

I added a benchmark that assumes the worse case. Please see the top post. Yes, 
this change is not always beneficial. I'd like to think the original benchmark, 
which uses random keys, is closer to real life scenarios. There may be some 
duplicate keys, but most keys are unique. But this is just my intuition, I do 
not have data to back this up.

One solution would be to loop through both maps and count number of unique 
keys. My guess is that this would mostly add unnecessary computations. We can 
compromise with something like `size() / 2 + m.size()`. My guess is the current 
aggressive approach is the best most of the time, but I don't have strong 
opinions and am happy to take suggestions.

> Also `int s = size() + m.size();` can overflow now, leading to different 
> behavior. It might make sense to just cap the value?

Is it necessary? We can go with @simonis 's suggestion. I think if we move 
forward with this, it would make sense to change all instances of `size` to 
long type, which would then suggest we should change the `Map::size()` api to 
return a long. The reason I am not concerned about overflow is that the Map API 
implies that we should only care about maps with at most 2^32 items.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17544#issuecomment-1908899013

Reply via email to