Re: RFR: 8243469: Lazily encode name in ZipFile.getEntryPos

Claes Redestad Mon, 27 Apr 2020 08:31:44 -0700

On 2020-04-27 17:16, Lance Andersen wrote:

Hi Claes,
The changes and the performance bump all look good, and the minor change below helps readability.


Thanks! Pushed.

/Claes


Thank you for using your performance expertise to improve this area.

Best
Lance

On Apr 27, 2020, at 6:11 AM, Claes Redestad <[email protected]<mailto:[email protected]>> wrote:




On 2020-04-27 11:49, Volker Simonis wrote:

On Sun, Apr 26, 2020 at 11:34 PM Claes Redestad
<[email protected] <mailto:[email protected]>> wrote:


Hi again,

On 2020-04-24 21:22, Claes Redestad wrote:

It seems that 'getEntryHitUncached' is getting slightly slower with
your change while all the other variants get significantly faster. I
don't think that's a problem, but do you have an explanation why
that's the case?


I've noticed it swing a bit either way, and have been asking myself the
same thing. After a little analysis I think it's actually a bug in my
microbenchmark: I'm always looking up the same entry, and thus hitting
the same bucket in the hash table. If that one has a collision, we'll do
a few extra passes. If not, we won't. This might be reflected as a
significant swing in either direction.

I'm going to try rewriting it to consider more (if not all) entries in
the zip file. That should mean the cost averages out a bit.
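
A minimal sketch of the idea, assuming a plain loop rather than the actual JMH microbenchmark (the class RoundRobinLookup, the method hitsFor and the "entry-N" names are hypothetical, invented for illustration): cycling through every entry name spreads the lookups over many hash buckets, so one unlucky collision no longer dominates the result.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration, not the benchmark discussed in this thread.
final class RoundRobinLookup {

    // Creates a temporary zip with 'size' empty entries, performs 'lookups'
    // calls to ZipFile.getEntry while cycling through all entry names, and
    // returns how many of those lookups found an entry.
    static int hitsFor(int size, int lookups) {
        String[] names = new String[size];
        try {
            Path zip = Files.createTempFile("round-robin", ".zip");
            try {
                try (OutputStream out = Files.newOutputStream(zip);
                     ZipOutputStream zos = new ZipOutputStream(out)) {
                    for (int i = 0; i < size; i++) {
                        names[i] = "entry-" + i;
                        zos.putNextEntry(new ZipEntry(names[i]));
                        zos.closeEntry();
                    }
                }
                int hits = 0;
                int index = 0;
                try (ZipFile zf = new ZipFile(zip.toFile())) {
                    for (int i = 0; i < lookups; i++) {
                        // A different name on each call, so different buckets are visited.
                        if (zf.getEntry(names[index]) != null) {
                            hits++;
                        }
                        index = (index + 1) % size;
                    }
                }
                return hits;
            } finally {
                Files.deleteIfExists(zip);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real microbenchmark the zip file would be created once in a setup phase, and only the getEntry call would sit in the measured method.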


After I improved my micro to root out sources of variance, the
performance issue for hits persisted.

Luckily Eirik had a brilliant idea: Why not decode the bytes in the
cen to a String and compare that, rather than the other way around?
Somewhat surprisingly, this gives us roughly a 1.2x speedup for
getEntryHit and getEntryHitUncached over open.00 - and comfortably
just ahead of the baseline on getEntryHitUncached[1]. It also leads to
slightly cleaner code[2].

Webrev: http://cr.openjdk.java.net/~redestad/8243469/open.01/

The speed-up appears to come from String.equals, which is intrinsified
and significantly faster than the replaced loop. I profiled allocation
per operation and it stays the same (EA removes the String).
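
To illustrate the two directions of comparison, here is a simplified sketch that assumes UTF-8 names. It is not the ZipFile code from the webrev: the class NameMatch and both method names are hypothetical. The first method encodes the lookup name and compares bytes in a hand-written loop, the second decodes the stored bytes and relies on String.equals.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the two comparison strategies.
final class NameMatch {

    // Encode the lookup name and compare it byte by byte with the stored name.
    static boolean matchByEncoding(String name, byte[] cen, int off, int len) {
        byte[] encoded = name.getBytes(StandardCharsets.UTF_8);
        if (encoded.length != len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (encoded[i] != cen[off + i]) {
                return false;
            }
        }
        return true;
    }

    // Decode the stored bytes to a String and compare with String.equals.
    static boolean matchByDecoding(String name, byte[] cen, int off, int len) {
        String stored = new String(cen, off, len, StandardCharsets.UTF_8);
        return stored.equals(name);
    }
}
```

Both methods agree for well-formed UTF-8 input. Whether the temporary String is actually eliminated by escape analysis depends on the JIT and the surrounding code, so any speed difference should be measured rather than assumed.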

Great! Another nice improvement. The changes look good to me.


Thanks!

Following just two minor remarks:
In ZipCoder.normalizedHashDecode() you've changed the line:
if (limit > 0 && decoded[limit - 1] != '/') {
to:
if (limit > pos && decoded[limit - 1] != '/') {
which was first a little confusing to me. But in the end it turns out
that this is semantically the same, because the
CharsetDecoder.decode() method called before is guaranteed to return a
"newly-allocated character buffer" and its "position will be zero and
its limit will follow the last character written". This also explains
why you don't have to take the CharBuffer's "arrayOffset()" into
account if you use the CharBuffer's backing array (because it will
always be 0 for newly created buffers). So maybe you can put in some
comments to make it less confusing for the ingenuous reader:
CharBuffer cb = decoder().decode(ByteBuffer.wrap(a, off, end - off));
// 'cb' is a newly allocated CharBuffer with 'pos == 0'
int pos = cb.position();
int limit = cb.limit();
char[] decoded;
if (cb.hasArray()) {
    // 'cb.arrayOffset()' is zero for newly allocated CharBuffers
    decoded = cb.array();
} else {
    decoded = new char[limit - pos];
    cb.get(decoded);
}
I think you can also remove the "else" branch (and maybe replace it
with an assertion) because newly allocated CharBuffers are guaranteed
to be backed by an array with array offset zero (see
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/nio/CharBuffer.html#allocate(int)
).


Yes, it does seem the specification is pretty strong here and we can
assume that pos == 0, arrayOffset == 0 and cb.hasArray() == true.

I'll simplify to:

           // 'cb' is a newly allocated CharBuffer with 'pos == 0',
           // 'arrayOffset == 0', backed by an array.
           CharBuffer cb = decoder().decode(ByteBuffer.wrap(a, off, end - off));
           int limit = cb.limit();
           char[] decoded = cb.array();
           for (int i = 0; i < limit; i++) {
               h = 31 * h + decoded[i];
           }
           if (limit > 0 && decoded[limit - 1] != '/') {
               h = 31 * h + '/';
           }

An assert seems like overkill.
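
For readers who want to try the simplified loop outside the JDK sources, here is a standalone sketch. It assumes a fresh UTF-8 decoder per call, whereas the real method uses ZipCoder's own decoder(). The class name NormalizedHash and the exception handling are hypothetical choices for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone version of the hashing loop shown above.
final class NormalizedHash {

    // Hashes the decoded name as if it always ended with '/', so that
    // "dir" and "dir/" produce the same hash value.
    static int normalizedHashDecode(int h, byte[] a, int off, int end) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        try {
            // decode() returns a newly allocated CharBuffer: position 0,
            // array offset 0, backed by an accessible array.
            CharBuffer cb = decoder.decode(ByteBuffer.wrap(a, off, end - off));
            int limit = cb.limit();
            char[] decoded = cb.array();
            for (int i = 0; i < limit; i++) {
                h = 31 * h + decoded[i];
            }
            if (limit > 0 && decoded[limit - 1] != '/') {
                h = 31 * h + '/';
            }
            return h;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("malformed input", e);
        }
    }
}
```

Starting from h == 0, the loop follows the same recurrence as String.hashCode(), so the result for the bytes of "foo" equals "foo/".hashCode().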

ZipCoder.get() seems to be the only remaining if block without braces.
Maybe you'll want to fix that while you're at it?
    public static ZipCoder get(Charset charset) {
        if (charset == UTF_8.INSTANCE)
            return UTF8;
        return new ZipCoder(charset);
    }


Sure.

/Claes

Thumbs up from my side. There's no need for a new webrev from my side.
Best regards,
Volker

Testing: tier1-4

Thanks!

/Claes

[1]
Baseline:

Benchmark                             (size)  Mode  Cnt    Score   Error  Units
ZipFileGetEntry.getEntryHit              512  avgt   15  126.264 ± 5.297  ns/op
ZipFileGetEntry.getEntryHit             1024  avgt   15  130.823 ± 7.212  ns/op
ZipFileGetEntry.getEntryHitUncached      512  avgt   15  152.149 ± 4.978  ns/op
ZipFileGetEntry.getEntryHitUncached     1024  avgt   15  151.527 ± 4.054  ns/op

open.01:
Benchmark                             (size)  Mode  Cnt    Score   Error  Units
ZipFileGetEntry.getEntryHit              512  avgt   15   84.450 ± 5.474  ns/op
ZipFileGetEntry.getEntryHit             1024  avgt   15   85.224 ± 3.776  ns/op
ZipFileGetEntry.getEntryHitUncached      512  avgt   15  140.448 ± 4.667  ns/op
ZipFileGetEntry.getEntryHitUncached     1024  avgt   15  145.046 ± 7.363  ns/op

[2] I stopped short of taking the cleanup a step further by decoding to
String even in initCEN, which sadly isn't performance neutral:

http://cr.openjdk.java.net/~redestad/8243469/open.01.init_decode/

Something for the future to consider, maybe.


Lance Andersen | Principal Member of Technical Staff | +1.781.442.2037

Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
[email protected] <mailto:[email protected]>
