Re: RFR: 8282664: Unroll by hand StringUTF16, StringLatin1, and Arrays polynomial hash loops [v12]

Ludovic Henry Thu, 12 May 2022 04:12:23 -0700

On Wed, 11 May 2022 19:13:55 GMT, Paul Sandoz <psan...@openjdk.org> wrote:


>> Ludovic Henry has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now contains 18 commits:
>> 
>>  - Fix overlapping registers
>>  - Actually fix h when hashcode is vectorized
>>  - Merge branch 'master' of https://github.com/openjdk/jdk into 
>> vectorized-stringlatin1-hashcode
>>  - Fix h when vectorized for Arrays.hashCode
>>  - Add missing check for AryHashCode node
>>  - Fix some merge conflicts
>>  - Disable Arrays.hashCode intrinsic by default for CI
>>  - Merge branch 'master' of https://github.com/openjdk/jdk into 
>> vectorized-stringlatin1-hashcode
>>  - Some small refactoring: store power_of_31_backwards in the code directly, 
>> compact code, and more
>>  - {wip} Generalize string hashcode to Arrays.hashCode
>>  - ... and 8 more: 
>> https://git.openjdk.java.net/jdk/compare/3fa1c404...29dab16b
>
> Looks like you are making great progress.
> 
> Have you thought about ways the intrinsic implementation might be simplified 
> if some code is retained in Java and passed as constant arguments? e.g. table 
> of constants, scalar loop, bounds checks etc, such that the intrinsic 
> primarily focuses on the vectorized code. To some extent that's related to 
> John's point on generalization, and through simplification there may be some 
> generalization.
> 
> For example if there was a general intrinsic that returned a long value (e.g. 
> first 32 bits are the offset in the array to continue processing, the second 
> 32 bits are the current hashcode value) then we could call that from the Java 
> implementations that then proceed with the scalar loop up to the array 
> length. The Java implementation intrinsic would return `(0L | 1L << 32)`.
> 
> Separately it would be nice to consider computing the hash code from the 
> contents of a memory segment, similar to how we added `mismatch` support, but 
> the trick of returning a value that merges the offset and hash code would not 
> work, unless we return the remaining elements to process and that guaranteed 
> to be less than a certain value (or alternatively a relative offset value 
> given an upper bound argument, and performing the intrinsic call in a loop 
> until no progress can be made, which works better for safepoints).
> 
> The `long[]` hashcode is annoying given `(element ^ (element >>> 32))`, but 
> if we simplify the intrinsic maybe we can add back that complexity?

@PaulSandoz yes, keeping the "short" string part in pure Java and switching to 
an intrinsified/vectorized version for "long" strings is on the next avenue of 
exploration. I would also put the intrinsic as a runtime stub to avoid 
unnecessarily increase the size of the calling method unnecessarily. The 
overhead of the call would be amortised because it would only be called for 
longer strings anyway.

I haven't given much thoughts to how we could split up the different elements 
of the algorithm to generalise the approach just yet. I'll give it a try, see 
how far I can get with it, and keep you updated on my findings.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7700

Re: RFR: 8282664: Unroll by hand StringUTF16, StringLatin1, and Arrays polynomial hash loops [v12]

Reply via email to