On 2020-04-14 14:06, Eirik Bjørsnøs wrote:
    I wonder if the overhead you remove here is mainly avoiding the need to
    call byte[] bname = zc.getBytes(name); during lookup.


If I got my numbers right, 67% of the saved time was due to less executions of this method.

The remaining 33% should be explained by the bloom filter providing a faster negative lookup than the hash table.

    If we refactored ZipCoder to have a zc.getHash(name) method and
    getEntryPos to decode to a byte[] only lazily, we ought to be able to
    get most of the speed-up for the miss case, and none (or very little)
    added cost on hits. And no added footprint for a bloom filter.

    What do you think?


Looks like a nice way to get 67% of the savings without the extra footprint (which is ~10 bits per entry IIRC).

Right, not sure the extra footprint (and complexity) would be worth that additional saving.


The read and write sides would need to agree on their hash functions though. The read side is looking on strings and the write side on byte arrays. So you have the same problem I have in my current patch. How would you calculate the hash value?

Yeah, the String encoder APIs doesn't lend themselves well to do hash
calculations over a String without taking the overhead of allocating
the byte[] (and ByteBuffer, and ...), and we obviously need to ensure
we use effectively the same hash function on both ends.

One trick is to exploit that the standard UTF-8 encoding is ASCII
compatible (all chars 0-127 will encode unchanged), so we can
speculatively assume the String is ASCII and calculate the hash code
directly using a charAt loop - and return -1 to mark that the String
wasn't ASCII and needs to be encoded. The false positive case when hash
actually is -1 will be handled gracefully:

http://cr.openjdk.java.net/~redestad/scratch/getEntry_ascii_fastpath.00/

Almost all entries in common libraries are likely to have names which
are ASCII-compat, so this should enable the speed-up for most cases with
little complexity.

/Claes


Eirik.

Reply via email to