Aleksey Shipilev wrote:
Hi again, Tim.

So I spent another day on this issue. I gathered a profile of
SPECjbb2005 and grepped out the HashMap methods (okay, I had to disable
inlining, so the exact numbers differ from an actual performance run):

Thanks for spending time to look into the issue Aleksey, it is much appreciated.

Current implementation:

6.99% HashMap.findNonNullKeyEntry(Ljava/lang/Object;II)Ljava/util/HashMap$Entry;
0.61% HashMap.getEntry(Ljava/lang/Object;)Ljava/util/HashMap$Entry;
0.25% HashMap.get(Ljava/lang/Object;)Ljava/lang/Object; 
---------------
7.86% Total
                        
H5374:                  

6.01% HashMap.findNonNullKeyEntryInteger(Ljava/lang/Object;II)Ljava/util/HashMap$Entry;
0.67% HashMap.findNonNullKeyEntryLegacy(Ljava/lang/Object;II)Ljava/util/HashMap$Entry;
0.61% HashMap.getEntry(Ljava/lang/Object;)Ljava/util/HashMap$Entry;
0.42% HashMap.get(Ljava/lang/Object;)Ljava/lang/Object; 
0.39% HashMap.findNonNullKeyEntry(Ljava/lang/Object;II)Ljava/util/HashMap$Entry;
---------------
7.05% Total

Percentages are clock-tick percentages of the entire workload.
So the profile shows that the H5374 code is actually faster.

Then, after a talk with Sergey Kuksenko (that's a credit to him :)), I
tried to compare these two implementations without allocPrefetch,
which prefetches the memory for newly created objects and thus
induces high cache pressure. allocPrefetch itself gives hu-u-uge
boosts, but can expose cache limitations for other optimizations. So,
with allocPrefetch disabled:

Windows x86
100.0% Harmony-clean
101.1% Harmony + H5374

Windows x86_64
100.0% Harmony-clean
100.5% Harmony + H5374

That's the boost I'm looking for! I wonder why such a positive change as
manual unboxing changes the L2 cache access patterns so that it gives a
boost in normal mode and a degradation in the presence of a heavy L2
cache user.

I also remeasured all modes carefully, so let's draw a
conclusion on this issue:

Windows x86:
 100.0% [base]  Harmony-clean
 100.2% [+0.2%] Harmony-clean + H5374
  88.6% [base]  Harmony-clean - allocPrefetch
  89.6% [+1%]   Harmony-clean - allocPrefetch + H5374

Windows x86_64:
 100.0% [base]  Harmony-clean
 100.1% [+0.1%] Harmony-clean + H5374
  88.9% [base]  Harmony-clean - allocPrefetch
  89.3% [+0.5%] Harmony-clean - allocPrefetch + H5374

...measurement uncertainty is about 0.4%.
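
For anyone who wants to redo the arithmetic on the scores above, here is a minimal sketch that computes the relative gain of a patched score over its baseline and compares it with the quoted uncertainty. The class and method names are made up for illustration, and treating the 0.4% figure as a relative percentage is an assumption; the thread does not say whether it is relative or absolute.

```java
// Hypothetical helper, not part of Harmony or of any benchmark tooling.
final class GainCheck {
    private GainCheck() {}

    // Relative gain of the patched score over the baseline score, in percent.
    static double relativeGainPercent(double base, double patched) {
        return (patched - base) / base * 100.0;
    }

    // True when the relative gain is larger than the stated uncertainty
    // (assumed here to be a relative percentage as well).
    static boolean exceedsUncertainty(double base, double patched,
                                      double uncertaintyPercent) {
        return relativeGainPercent(base, patched) > uncertaintyPercent;
    }
}
```

For example, `GainCheck.exceedsUncertainty(88.6, 89.6, 0.4)` checks the x86 run with allocPrefetch disabled, and `GainCheck.exceedsUncertainty(100.0, 100.2, 0.4)` checks the default x86 run.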

Based on this data I would say this patch couldn't get much of a boost on
DRLVM, since DRLVM's optimizations do their job of scalarization just
fine. The patch should also increase cache locality, and that seems to be
the case in the absence of another L2 cache contributor. Let's add that
such specialization bloats the code a little, and jump to the conclusion
that from the DRLVM side it would be better to keep the patch out of trunk.

Fair enough (though it looks like a minor improvement, right?).
I'm happy to leave the patch out.

Can I go back a moment to hear about the scalar replacement technique in Jitrino? Feel free to point me to some doc or code if that is easier.

As you know, my goal was to avoid the key dereferencing when searching the hashmap by, as you say, unboxing the Integer and encoding the value in the hashcode int field. The key field is still an object ptr to the original Integer object which is required for answering the keySet etc.
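
To make the idea concrete, here is a rough sketch of that kind of lookup. It is not the H5374 patch: the class `IntKeySketchMap`, the `integerKey` flag, and the method names are all assumptions made up for illustration, and null keys and resizing are left out. It only shows how an Integer key can be matched against the int stored in the entry without touching the key object, while the original key reference is kept for key views.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: names and layout are assumptions, not the H5374 code.
final class IntKeySketchMap<V> {
    private static final class Entry<T> {
        final Object key;          // original key object, still needed for key views
        final int storedHash;      // for Integer keys this is the unboxed int value
        final boolean integerKey;  // assumed flag so lookups need not inspect the key
        T value;
        Entry<T> next;

        Entry(Object key, int storedHash, T value, Entry<T> next) {
            this.key = key;
            this.storedHash = storedHash;
            this.integerKey = key instanceof Integer;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry<V>[] table;

    @SuppressWarnings("unchecked")
    IntKeySketchMap(int buckets) {
        table = (Entry<V>[]) new Entry[buckets];
    }

    // Integer path: compare ints only, never dereference e.key.
    private Entry<V> findInteger(int intValue, int index) {
        for (Entry<V> e = table[index]; e != null; e = e.next) {
            if (e.integerKey && e.storedHash == intValue) return e;
        }
        return null;
    }

    // General path: the usual hash check followed by equals().
    private Entry<V> findLegacy(Object key, int hash, int index) {
        for (Entry<V> e = table[index]; e != null; e = e.next) {
            if (e.storedHash == hash && (e.key == key || key.equals(e.key))) return e;
        }
        return null;
    }

    private Entry<V> find(Object key, int hash, int index) {
        return (key instanceof Integer) ? findInteger(hash, index)
                                        : findLegacy(key, hash, index);
    }

    V get(Object key) {
        int hash = key.hashCode();  // Integer.hashCode() is the int value itself
        Entry<V> e = find(key, hash, Math.floorMod(hash, table.length));
        return e == null ? null : e.value;
    }

    V put(Object key, V value) {
        int hash = key.hashCode();
        int index = Math.floorMod(hash, table.length);
        Entry<V> e = find(key, hash, index);
        if (e != null) {
            V old = e.value;
            e.value = value;
            return old;
        }
        table[index] = new Entry<>(key, hash, value, table[index]);
        return null;
    }

    // The original key objects are returned, which is why the key field stays.
    List<Object> keys() {
        List<Object> result = new ArrayList<>();
        for (Entry<V> head : table) {
            for (Entry<V> e = head; e != null; e = e.next) result.add(e.key);
        }
        return result;
    }
}
```

The extra boolean is one way to keep an Integer key apart from some other key type that happens to have the same hash code; whether the real patch does it that way is not something this sketch claims.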

So how does Jitrino both unbox the primitive and preserve the 'box' for when it must be returned? [If you see what I mean, otherwise I'll try and rephrase it]

There is one more possible opportunity - to tune up prefetch distance
in allocPrefetch, but that's a fragile thing to optimize.

Yeah, but no need to perform unnatural acts. We can leave it out if there is no benefit to Harmony.

Regards,
Tim
