On Tue, 20 May 2025 02:22:13 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:

>> Ping again~  could any one please take a look at this PR? Thanks a lot!
>
>> Hi @XiaohongGong , Very nice work!, Looks good to me, will do some testing 
>> and get back.
>> 
>> Do you have any idea about following regression?
>> 
>> ```
>> GatherOperationsBenchmark.microByteGather256              thrpt  30  ops/ms  
>> 64    55844.814   48311.847  0.86
>> GatherOperationsBenchmark.microByteGather256              thrpt  30  ops/ms  
>> 256   15139.459   13009.848  0.85
>> GatherOperationsBenchmark.microByteGather256              thrpt  30  ops/ms  
>> 1024   3861.834    3284.944  0.85
>> GatherOperationsBenchmark.microByteGather256              thrpt  30  ops/ms  
>> 4096    938.665     817.673  0.87
>> ```
>> 
>> Best Regards
> 
> Yes, I also observed such regression. After analyzing, I found it was caused 
> by the java side changes, which influences the range check elimination inside 
> `IntVector.fromArray()` and `ByteVector.intoArray()` in the benchmark. The 
> root cause is the counted loop in following benchmark case is not recognized 
> by compiler as expected:
> 
> public void microByteGather256() {
>         for (int i = 0; i < SIZE; i += B256.length()) {
>             ByteVector.fromArray(B256, barr, 0, index, i)
>                             .intoArray(bres, i);
>         }
>  }
> ``` 
> The loop iv phi node is not recognized successfully when C2 recognize the 
> counted loop pattern, because it was casted twice with `CastII` in this case. 
> The ideal graph looks like:
> 
>                            Loop
>                                \
>                                     \        / -----------------------------
>                                        \  /                                 |
>                                        Phi                                  |
>                                          |                                  | 
>                                       CastII                                |
>                                           |                                 |
>                                       CastII                                |
>                                           |                                 |
>                                              \   ConI                       |
>                                                 \  |                        |
>                                                 AddVI                       |
>                                                   |-------------------------| 
>                           
> 
> Relative code is 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopnode.cpp#L1667.
>  
> 
> Befor...

> @XiaohongGong Thanks for splitting this one out, and for investigating the 
> regressions here.
> 
> Putting the permalink here, fixed to the current change (the link you pasted 
> will always refer to the newest, which may later on point to the wrong line 
> when lines above are inserted / deleted):
> 
> https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661
> 
> I wonder if we should just use `Node::uncast` there? But I'm quite unsure 
> about that.

Sounds good to me. I will have a deep investigation for it. Thanks!



> > Yes, I also observed such regression.
> > It would be nice if you proactively mentioned regressions, so it does not 
> > have to be pointed out by reviewers.
> 
> For me, it could be ok to fix it in a follow-up patch. I think we are too 
> close to RDP1 for JDK25 now anyway, and so we could push this patch here into 
> JDK26, and then we have enough time in JDK26 to investigate the regression. 
> Even better would be if we could do the other patch first, so we never even 
> encounter a regression.

Sounds good to me. Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2893026228

Reply via email to