On Tue, 20 May 2025 02:22:13 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:
>> Ping again~ could any one please take a look at this PR? Thanks a lot! > >> Hi @XiaohongGong , Very nice work!, Looks good to me, will do some testing >> and get back. >> >> Do you have any idea about following regression? >> >> ``` >> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms >> 64 55844.814 48311.847 0.86 >> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms >> 256 15139.459 13009.848 0.85 >> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms >> 1024 3861.834 3284.944 0.85 >> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms >> 4096 938.665 817.673 0.87 >> ``` >> >> Best Regards > > Yes, I also observed such regression. After analyzing, I found it was caused > by the java side changes, which influences the range check elimination inside > `IntVector.fromArray()` and `ByteVector.intoArray()` in the benchmark. The > root cause is the counted loop in following benchmark case is not recognized > by compiler as expected: > > public void microByteGather256() { > for (int i = 0; i < SIZE; i += B256.length()) { > ByteVector.fromArray(B256, barr, 0, index, i) > .intoArray(bres, i); > } > } > ``` > The loop iv phi node is not recognized successfully when C2 recognize the > counted loop pattern, because it was casted twice with `CastII` in this case. > The ideal graph looks like: > > Loop > \ > \ / ----------------------------- > \ / | > Phi | > | | > CastII | > | | > CastII | > | | > \ ConI | > \ | | > AddVI | > |-------------------------| > > > Relative code is > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopnode.cpp#L1667. > > > Befor... > @XiaohongGong Thanks for splitting this one out, and for investigating the > regressions here. > > Putting the permalink here, fixed to the current change (the link you pasted > will always refer to the newest, which may later on point to the wrong line > when lines above are inserted / deleted): > > https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661 > > I wonder if we should just use `Node::uncast` there? But I'm quite unsure > about that. Sounds good to me. I will have a deep investigation for it. Thanks! > > Yes, I also observed such regression. > > It would be nice if you proactively mentioned regressions, so it does not > > have to be pointed out by reviewers. > > For me, it could be ok to fix it in a follow-up patch. I think we are too > close to RDP1 for JDK25 now anyway, and so we could push this patch here into > JDK26, and then we have enough time in JDK26 to investigate the regression. > Even better would be if we could do the other patch first, so we never even > encounter a regression. Sounds good to me. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2893026228