On Tue, 20 May 2025 02:22:13 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:

>> Ping again~  could any one please take a look at this PR? Thanks a lot!
>
>> Hi @XiaohongGong , Very nice work!, Looks good to me, will do some testing 
>> and get back.
>> 
>> Do you have any idea about following regression?
>> 
>> ```
>> GatherOperationsBenchmark.microByteGather256              thrpt  30  ops/ms  
>> 64    55844.814   48311.847  0.86
>> GatherOperationsBenchmark.microByteGather256              thrpt  30  ops/ms  
>> 256   15139.459   13009.848  0.85
>> GatherOperationsBenchmark.microByteGather256              thrpt  30  ops/ms  
>> 1024   3861.834    3284.944  0.85
>> GatherOperationsBenchmark.microByteGather256              thrpt  30  ops/ms  
>> 4096    938.665     817.673  0.87
>> ```
>> 
>> Best Regards
> 
> Yes, I also observed such regression. After analyzing, I found it was caused 
> by the java side changes, which influences the range check elimination inside 
> `IntVector.fromArray()` and `ByteVector.intoArray()` in the benchmark. The 
> root cause is the counted loop in following benchmark case is not recognized 
> by compiler as expected:
> 
> public void microByteGather256() {
>         for (int i = 0; i < SIZE; i += B256.length()) {
>             ByteVector.fromArray(B256, barr, 0, index, i)
>                             .intoArray(bres, i);
>         }
>  }
> ``` 
> The loop iv phi node is not recognized successfully when C2 recognize the 
> counted loop pattern, because it was casted twice with `CastII` in this case. 
> The ideal graph looks like:
> 
>                            Loop
>                                \
>                                     \        / -----------------------------
>                                        \  /                                 |
>                                        Phi                                  |
>                                          |                                  | 
>                                       CastII                                |
>                                           |                                 |
>                                       CastII                                |
>                                           |                                 |
>                                              \   ConI                       |
>                                                 \  |                        |
>                                                 AddVI                       |
>                                                   |-------------------------| 
>                           
> 
> Relative code is 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopnode.cpp#L1667.
>  
> 
> Befor...

@XiaohongGong Thanks for splitting this one out, and for investigating the 
regressions here.

Putting the permalink here, fixed to the current change (the link you pasted 
will always refer to the newest, which may later on point to the wrong line 
when lines above are inserted / deleted):
https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661

I wonder if we should just use `Node::uncast` there? But I'm quite unsure about 
that.

> Yes, I also observed such regression.
It would be nice if you proactively mentioned regressions, so it does not have 
to be pointed out by reviewers.

For me, it could be ok to fix it in a follow-up patch. I think we are too close 
to RDP1 for JDK25 now anyway, and so we could push this patch here into JDK26, 
and then we have enough time in JDK26 to investigate the regression. Even 
better would be if we could do the other patch first, so we never even 
encounter a regression.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2893017948

Reply via email to