On Tue, 20 May 2025 02:22:13 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:
>> Ping again~ could any one please take a look at this PR? Thanks a lot! > >> Hi @XiaohongGong , Very nice work!, Looks good to me, will do some testing >> and get back. >> >> Do you have any idea about following regression? >> >> ``` >> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms >> 64 55844.814 48311.847 0.86 >> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms >> 256 15139.459 13009.848 0.85 >> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms >> 1024 3861.834 3284.944 0.85 >> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms >> 4096 938.665 817.673 0.87 >> ``` >> >> Best Regards > > Yes, I also observed such regression. After analyzing, I found it was caused > by the java side changes, which influences the range check elimination inside > `IntVector.fromArray()` and `ByteVector.intoArray()` in the benchmark. The > root cause is the counted loop in following benchmark case is not recognized > by compiler as expected: > > public void microByteGather256() { > for (int i = 0; i < SIZE; i += B256.length()) { > ByteVector.fromArray(B256, barr, 0, index, i) > .intoArray(bres, i); > } > } > ``` > The loop iv phi node is not recognized successfully when C2 recognize the > counted loop pattern, because it was casted twice with `CastII` in this case. > The ideal graph looks like: > > Loop > \ > \ / ----------------------------- > \ / | > Phi | > | | > CastII | > | | > CastII | > | | > \ ConI | > \ | | > AddVI | > |-------------------------| > > > Relative code is > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopnode.cpp#L1667. > > > Befor... @XiaohongGong Thanks for splitting this one out, and for investigating the regressions here. Putting the permalink here, fixed to the current change (the link you pasted will always refer to the newest, which may later on point to the wrong line when lines above are inserted / deleted): https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661 I wonder if we should just use `Node::uncast` there? But I'm quite unsure about that. > Yes, I also observed such regression. It would be nice if you proactively mentioned regressions, so it does not have to be pointed out by reviewers. For me, it could be ok to fix it in a follow-up patch. I think we are too close to RDP1 for JDK25 now anyway, and so we could push this patch here into JDK26, and then we have enough time in JDK26 to investigate the regression. Even better would be if we could do the other patch first, so we never even encounter a regression. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2893017948