On Mon, 14 Nov 2022 18:28:53 GMT, Vladimir Ivanov <vliva...@openjdk.org> wrote:

>>> Also, I'd like to note that C2 auto-vectorization support is not too far
>>> away from being able to optimize hash code computations. At some point, I
>>> was able to achieve some promising results with modest tweaking of
>>> SuperWord pass: https://github.com/iwanowww/jdk/blob/superword/notes.txt
>>> http://cr.openjdk.java.net/~vlivanov/superword.reduction/webrev.00/
>>
>> Intriguing. How far off is this - and do you think it'll be able to match
>> the efficiency we see here with a memoized coefficient table etc?
>>
>> If we turn this intrinsic into a stub we might also be able to reuse the
>> optimization in other places, including from within the VM (calculating
>> String hashCodes happen in a couple of places, including String
>> deduplication). So I think there are still a few compelling reasons to go
>> the manual route and continue on this path.
>
>> How far off is this ...?
>
> Back then it looked way too constrained (tight constraints on code shapes).
> But I considered it as a generally applicable optimization.
>
>> ... do you think it'll be able to match the efficiency we see here with a
>> memoized coefficient table etc?
>
> Yes, it is able to build the constant table at runtime when folding
> multiplications of constant coefficients produced during loop unrolling and
> then packing scalars into a constant vector.
>
> Moreover, briefly looking at the code shape, the vectorizer would produce a
> more optimal loop shape (pre-loop would align vector accesses and would use
> 512-bit vectors when available; vector post-loop could help as well).

Passing the constant node through as an input as suggested by @iwanowww and @sviswa7 meant we could eliminate most of the `instruct` blocks, removing a significant chunk of code and a little bit of complexity from the proposed patch.

-------------

PR: https://git.openjdk.org/jdk/pull/10847
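
For readers following the "constant coefficients produced during loop unrolling" point, here is a minimal scalar sketch of the idea, assuming a 4-way unroll of the usual `h = 31 * h + a[i]` recurrence over Latin-1 bytes. The class and method names (`HashUnrollSketch`, `scalarHash`, `unrolledHash`) are hypothetical, and this is neither the code in the PR nor what C2 actually emits. It only shows where the powers of 31 that would populate a coefficient table come from.

```java
public class HashUnrollSketch {
    // Reference recurrence: h = 31 * h + a[i], treating bytes as unsigned.
    static int scalarHash(byte[] a) {
        int h = 0;
        for (byte b : a) {
            h = 31 * h + (b & 0xff);
        }
        return h;
    }

    // Same result, unrolled by 4. Expanding the recurrence four times gives
    // constant coefficients 31^4, 31^3, 31^2, 31^1, 31^0; a vectorizer could
    // pack the last four into a constant vector.
    static int unrolledHash(byte[] a) {
        int h = 0;
        int i = 0;
        for (; i + 4 <= a.length; i += 4) {
            h = 923521 * h                   // 31^4
              + 29791 * (a[i] & 0xff)        // 31^3
              + 961 * (a[i + 1] & 0xff)      // 31^2
              + 31 * (a[i + 2] & 0xff)       // 31^1
              + (a[i + 3] & 0xff);           // 31^0
        }
        // Scalar tail for the remaining 0..3 elements.
        for (; i < a.length; i++) {
            h = 31 * h + (a[i] & 0xff);
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] data = "hash code computations".getBytes(
                java.nio.charset.StandardCharsets.ISO_8859_1);
        System.out.println(scalarHash(data) == unrolledHash(data));
    }
}
```

Because `int` arithmetic wraps modulo 2^32, the unrolled form agrees with the scalar recurrence even when the intermediate products overflow.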