On 2020-04-27 17:16, Lance Andersen wrote:
Hi Claes,
The changes and the performance bump all look good and with the minor
change below helps the readability.
Thanks! Pushed.
/Claes
Thank you for using your performance expertise to improve this area.
Best
Lance
On Apr 27, 2020, at 6:11 AM, Claes Redestad <claes.redes...@oracle.com
<mailto:claes.redes...@oracle.com>> wrote:
On 2020-04-27 11:49, Volker Simonis wrote:
On Sun, Apr 26, 2020 at 11:34 PM Claes Redestad
<claes.redes...@oracle.com <mailto:claes.redes...@oracle.com>> wrote:
Hi again,
On 2020-04-24 21:22, Claes Redestad wrote:
It seems that 'getEntryHitUncached' is getting slightly slower with
your change while all the other variants get significantly faster. I
don't think that's a problem, but do you have an explanation why
that's the case?
I've noticed it swing a bit either way, and have been asking myself the
same thing. After a little analysis I think it's actually a bug in my
microbenchmark: I'm always looking up the same entry, and thus hitting
the same bucket in the hash table. If that one has a collision,
we'll do
a few extra passes. If not, we won't. This might be reflected as a
significant swing in either direction.
I'm going to try rewriting it to consider more (if not all) entries in
the zip file. That should mean the cost averages out a bit.
after I improved my micro to root out sources of variance, the
performance issue for hits persisted.
Luckily Eirik had a brilliant idea: Why not decode the bytes in the
cen to a String and compare that, rather than the other way around?
To some surprise it turns out this gives us about a ~1.2x speedup for
getEntryHit and getEntryHitUncached over open.00 - and comfortably
just ahead of the baseline on getEntryHitUncached[1]. It also leads to
slightly cleaner code[2].
Webrev: http://cr.openjdk.java.net/~redestad/8243469/open.01/
The speed-up appears to come from String.equals, which is intrinsified
and significantly faster than the replaced loop. I profiled allocation
per operation and it stays the same (EA removes the String).
Great! Another nice improvement. The changes look good to me.
Thanks!
Following just two minor remarks:
In ZipCoder.normalizedHashDecode() you've changed the line:
if (limit > 0 && decoded[limit - 1] != '/') {
to:
if (limit > pos && decoded[limit - 1] != '/') {
which was first a little confusing to me. But in the end it turns out
that this is semantically the same, because the
CharsetDecoder.decode() method called before is guaranteed to return a
"newly-allocated character buffer" and its "position will be zero and
its limit will follow the last character written". This also explains
why you don't have to take the CharBuffer's "arrayOffset()" into
account if you use the CharBuffer's backing array (because it will
always be 0 for newly created buffers). So maybe you can put in some
comments to make it less confusing for the ingenuous reader:
CharBuffer cb = decoder().decode(ByteBuffer.wrap(a, off, end - off));
// 'cb' is a newly allocated CharBuffer with 'pos == 0'
int pos = cb.position();
int limit = cb.limit();
char[] decoded;
if (cb.hasArray()) {
// 'cb.arrayOffset()' is zero for newly allocated CharBuffers
decoded = cb.array();
} else {
decoded = new char[limit - pos];
cb.get(decoded);
}
I think you can also remove the "else" branch (and maybe replace it
with an assertion) because newly allocated CharBuffers are guaranteed
to be backed by an array with array offset zero (see
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/nio/CharBuffer.html#allocate(int)
).
Yes, it does seem the specification is pretty strong here and we can
assume that pos == 0, arrayOffset == 0 and cb.hasArray() == true.
I'll simplify to:
// 'cb' is a newly allocated CharBuffer with 'pos == 0',
// 'arrayOffset == 0', backed by an array.
CharBuffer cb = decoder().decode(ByteBuffer.wrap(a, off,
end - off));
int limit = cb.limit();
char[] decoded = cb.array();
for (int i = 0; i < limit; i++) {
h = 31 * h + decoded[i];
}
if (limit > 0 && decoded[limit - 1] != '/') {
h = 31 * h + '/';
}
An assert seems like overkill.
Zipcoder.get() seems to be the only remaining if block without braces.
Maybe you'll wnat to fix that once your on it?
public static ZipCoder get(Charset charset) {
if (charset == UTF_8.INSTANCE)
return UTF8;
return new ZipCoder(charset);
Sure.
/Claes
Thumbs up from my side. There's no need for a new webrev from my side.
Best regards,
Volker
Testing: tier1-4
Thanks!
/Claes
[1]
Baseline:
Benchmark (size) Mode Cnt Score Error
Units
ZipFileGetEntry.getEntryHit 512 avgt 15 126.264 ± 5.297
ns/op
ZipFileGetEntry.getEntryHit 1024 avgt 15 130.823 ± 7.212
ns/op
ZipFileGetEntry.getEntryHitUncached 512 avgt 15 152.149 ± 4.978
ns/op
ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 151.527 ± 4.054
ns/op
open.01:
Benchmark (size) Mode Cnt Score Error
Units
ZipFileGetEntry.getEntryHit 512 avgt 15 84.450 ± 5.474
ns/op
ZipFileGetEntry.getEntryHit 1024 avgt 15 85.224 ± 3.776
ns/op
ZipFileGetEntry.getEntryHitUncached 512 avgt 15 140.448 ± 4.667
ns/op
ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 145.046 ± 7.363
[2] I stopped short of taking the cleanup a step further by decoding to
String even in initCEN, which sadly isn't performance neutral:
http://cr.openjdk.java.net/~redestad/8243469/open.01.init_decode/
Something for the future to consider, maybe.
<http://oracle.com/us/design/oracle-email-sig-198324.gif>
<http://oracle.com/us/design/oracle-email-sig-198324.gif><http://oracle.com/us/design/oracle-email-sig-198324.gif>
<http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance Andersen|
Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
lance.ander...@oracle.com <mailto:lance.ander...@oracle.com>