On Mon, 15 Sep 2025 09:33:47 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

>> erifan has updated the pull request incrementally with one additional commit 
>> since the last revision:
>> 
>>   Add an IR rule for vector mask cast operation
>
> Your benchmark and code changes look good to me. Thanks for addressing my 
> comments.

Thanks @jatin-bhateja. The updated benchmark results are as follows; there is 
not much change from the previous run.

On Nvidia Grace machine with 128-bit SVE2:
With option `-XX:UseSVE=2`:

Benchmark                 COMPARISON_OP  Unit   Before         Before Error   After          After Error    Uplift
testCompareMaskNotDouble  EQ             ops/s  908008.7644    827.699314     1175289.515    240.548861     1.294359
testCompareMaskNotDouble  NE             ops/s  872199.2489    131.090115     1175667.777    129.741515     1.347934
testCompareMaskNotDouble  LT             ops/s  880166.7559    1570.41653     882160.6889    4723.507639    1.002265
testCompareMaskNotDouble  LE             ops/s  878115.3293    2919.637497    879033.7895    5404.617017    1.001045
testCompareMaskNotDouble  GT             ops/s  877068.5325    9595.275981    865832.864     5054.26002     0.987189
testCompareMaskNotDouble  GE             ops/s  895695.0228    3276.687933    871153.7117    7714.572967    0.9726
testCompareMaskNotFloat   EQ             ops/s  1811841.295    278.140948     2350971.83     606.667654     1.297559
testCompareMaskNotFloat   NE             ops/s  1727124.634    1755.717051    2351789.019    269.531198     1.361678
testCompareMaskNotFloat   LT             ops/s  1735243.319    4912.343726    1726257.01     823.746765     0.994821
testCompareMaskNotFloat   LE             ops/s  1726151.367    1071.383328    1727029.339    960.336314     1.000508
testCompareMaskNotFloat   GT             ops/s  1729704.897    1646.026351    1726069.02     440.981281     0.997897
testCompareMaskNotFloat   GE             ops/s  1726515.227    2171.61643     1728365.682    1404.298156    1.001071
testCompareMaskNotByte    EQ             ops/s  8480574.694    1254.415788    10200329.86    8560.199493    1.202787
testCompareMaskNotByte    NE             ops/s  8480141.263    1437.762594    10207424.91    3664.106923    1.203685
testCompareMaskNotByte    LT             ops/s  8471471.384    7699.585554    10203300.19    4675.047416    1.20443
testCompareMaskNotByte    LE             ops/s  8476165.519    6045.944392    10204956.23    2174.866199    1.203959
testCompareMaskNotByte    GT             ops/s  8479397.377    1290.560961    10207032.3     5414.789178    1.203745
testCompareMaskNotByte    GE             ops/s  8479979.908    1094.823175    10203115.77    2909.433184    1.2032
testCompareMaskNotByte    ULT            ops/s  8480915.515    1420.30856     10213140.54    19628.56888    1.204249
testCompareMaskNotByte    ULE            ops/s  8481768.961    1806.086454    10191601.05    9537.089409    1.201589
testCompareMaskNotByte    UGT            ops/s  8477948.807    3652.437106    10208439.79    8335.226416    1.204116
testCompareMaskNotByte    UGE            ops/s  8477320.065    2191.753237    10198589.9     5748.761942    1.203044
testCompareMaskNotInt     EQ             ops/s  1906386.393    208.045573     2346741.129    383.461819     1.230989
testCompareMaskNotInt     NE             ops/s  1674206.146    169.967081     2346609.602    652.964692     1.401625
testCompareMaskNotInt     LT             ops/s  1684755.085    4939.806653    2345939.728    738.842445     1.392451
testCompareMaskNotInt     LE             ops/s  1659985.83     2408.542766    2346929.8      192.550397     1.413825
testCompareMaskNotInt     GT             ops/s  1674460.437    447.120589     2347037.155    342.433085     1.401667
testCompareMaskNotInt     GE             ops/s  1658699.073    884.268891     2347411.827    281.885914     1.415212
testCompareMaskNotInt     ULT            ops/s  1677043.66     6215.834359    2347155.384    425.141786     1.399579
testCompareMaskNotInt     ULE            ops/s  1667049.76     9521.094204    2346815.213    316.03901      1.407765
testCompareMaskNotInt     UGT            ops/s  1661045.828    3669.548525    2346711.365    2808.608132    1.412791
testCompareMaskNotInt     UGE            ops/s  1663715.691    4570.73053     2347096.847    191.804359     1.410755
testCompareMaskNotLong    EQ             ops/s  885668.5947    203.053456     1174274.006    113.51354      1.325861
testCompareMaskNotLong    NE             ops/s  837449.9353    198.611966     1174330.269    106.514374     1.402269
testCompareMaskNotLong    LT             ops/s  846790.2128    7005.585657    1174290.879    93.56413       1.386755
testCompareMaskNotLong    LE             ops/s  851253.2346    7624.045467    1174162.355    179.854316     1.379333
testCompareMaskNotLong    GT             ops/s  837715.7563    4272.558281    1173797.819    289.311518     1.401188
testCompareMaskNotLong    GE             ops/s  883137.593     14804.63746    1174216.909    86.404559      1.329596
testCompareMaskNotLong    ULT            ops/s  872478.9017    4955.722542    1174341.995    124.656933     1.345983
testCompareMaskNotLong    ULE            ops/s  866570.738     12541.58528    1174185.197    594.850706     1.354979
testCompareMaskNotLong    UGT            ops/s  866389.0927    3971.492766    1174210.803    153.960084     1.355292
testCompareMaskNotLong    UGE            ops/s  848339.3876    4555.514721    1174060.638    240.326562     1.383951
testCompareMaskNotShort   EQ             ops/s  3336170.783    2286.717236    4684904.156    2134.72575     1.404275
testCompareMaskNotShort   NE             ops/s  3334775.472    717.588615     4690264.12     3017.756867    1.40647
testCompareMaskNotShort   LT             ops/s  3334619.058    1138.901707    4685883.864    3808.321694    1.405223
testCompareMaskNotShort   LE             ops/s  3335538.353    538.676789     4688238.934    1029.406266    1.405541
testCompareMaskNotShort   GT             ops/s  3301425.217    694.060525     4689167.049    2845.363801    1.420346
testCompareMaskNotShort   GE             ops/s  3301580.972    317.042851     4688970.211    1292.83929     1.420219
testCompareMaskNotShort   ULT            ops/s  3336318.051    892.515034     4687549.384    1403.281648    1.405006
testCompareMaskNotShort   ULE            ops/s  3335188.292    972.230191     4684723.63     3937.599084    1.404635
testCompareMaskNotShort   UGT            ops/s  3334490.656    930.409628     4688058.378    1166.776081    1.405929
testCompareMaskNotShort   UGE            ops/s  3333050.033    3146.019596    4689197.9      456.439188     1.406878


With option `-XX:UseSVE=0`:

Benchmark                 COMPARISON_OP  Unit   Before         Before Error   After          After Error    Uplift
testCompareMaskNotDouble  EQ             ops/s  788505.9464    579.254839     769969.5798    138.792325     0.976491
testCompareMaskNotDouble  NE             ops/s  655499.7935    471.970429     915086.3257    183.495964     1.396013
testCompareMaskNotDouble  LT             ops/s  788418.7889    574.263314     789271.7448    51.838991      1.001081
testCompareMaskNotDouble  LE             ops/s  789144.8431    45.334181      789326.1963    84.148011      1.000229
testCompareMaskNotDouble  GT             ops/s  788690.8485    662.950083     789246.9812    99.060588      1.000705
testCompareMaskNotDouble  GE             ops/s  789421.2387    94.012868      789166.4717    111.772533     0.999677
testCompareMaskNotFloat   EQ             ops/s  1816132.864    1298.2187      1816461.601    311.706275     1.000181
testCompareMaskNotFloat   NE             ops/s  1550767.697    1142.987761    2301429.148    159.71525      1.484057
testCompareMaskNotFloat   LT             ops/s  1815531.685    1370.868745    1817187.121    761.68401      1.000911
testCompareMaskNotFloat   LE             ops/s  1817937.722    484.638134     1817703.209    625.275639     0.999871
testCompareMaskNotFloat   GT             ops/s  1818618.89     724.324392     1817977.851    481.152488     0.999647
testCompareMaskNotFloat   GE             ops/s  1815118.411    1327.945736    1817476.414    510.712942     1.001299
testCompareMaskNotByte    EQ             ops/s  6489599.571    5127.815254    6535895.286    17029.15534    1.007133
testCompareMaskNotByte    NE             ops/s  9089974.523    4069.346579    15945662.17    22867.48282    1.754203
testCompareMaskNotByte    LT             ops/s  6499040.898    1250.085336    15939338.57    17451.05939    2.452567
testCompareMaskNotByte    LE             ops/s  6493612.339    4928.466061    15926355.01    27249.57103    2.452618
testCompareMaskNotByte    GT             ops/s  6494486.565    5229.4598      15957497.14    6893.237334    2.457083
testCompareMaskNotByte    GE             ops/s  6499295.661    1030.044749    15903755.01    46454.70992    2.446996
testCompareMaskNotByte    ULT            ops/s  6494212.684    5194.712704    15944816.71    3467.818892    2.455234
testCompareMaskNotByte    ULE            ops/s  6493882.576    5092.839387    15936419.25    22755.34523    2.454066
testCompareMaskNotByte    UGT            ops/s  6493479.899    4678.096391    15958133.18    3483.353667    2.457562
testCompareMaskNotByte    UGE            ops/s  6500338.419    709.344957     15968155.27    14020.47085    2.456511
testCompareMaskNotInt     EQ             ops/s  1830787.273    237.597163     1878452.588    142.728192     1.026035
testCompareMaskNotInt     NE             ops/s  1615081.395    1219.871461    2360913.712    199.556675     1.461792
testCompareMaskNotInt     LT             ops/s  1827819.867    1360.728526    2360561.422    248.025925     1.291462
testCompareMaskNotInt     LE             ops/s  1830975.648    416.987529     2360703.924    194.958346     1.289314
testCompareMaskNotInt     GT             ops/s  1830633.964    301.849017     2360552.203    224.908655     1.289472
testCompareMaskNotInt     GE             ops/s  1829476.495    1348.361278    2360673.736    137.538696     1.290354
testCompareMaskNotInt     ULT            ops/s  1829137.773    1285.55232     2360615.95     162.876291     1.290562
testCompareMaskNotInt     ULE            ops/s  1828107.468    1360.867847    2360790.337    297.267481     1.291384
testCompareMaskNotInt     UGT            ops/s  1829659.222    1459.098806    2361025.107    266.158075     1.290417
testCompareMaskNotInt     UGE            ops/s  1829548.187    1427.266787    2360941.943    242.380469     1.29045
testCompareMaskNotLong    EQ             ops/s  810439.9121    82.577412      802287.4993    73.462086      0.98994
testCompareMaskNotLong    NE             ops/s  681643.6089    485.657471     932324.6973    158.28799      1.367759
testCompareMaskNotLong    LT             ops/s  809850.546     680.71673      931404.3219    685.591444     1.150094
testCompareMaskNotLong    LE             ops/s  810584.5191    115.234753     932234.2412    105.451172     1.150076
testCompareMaskNotLong    GT             ops/s  810593.5376    117.947863     931879.1829    553.397713     1.149625
testCompareMaskNotLong    GE             ops/s  810435.8405    81.88737       931833.0348    177.765694     1.149792
testCompareMaskNotLong    ULT            ops/s  810429.8459    90.005329      932127.5278    74.443387      1.150164
testCompareMaskNotLong    ULE            ops/s  809740.842     411.655134     932231.6607    76.044104      1.151271
testCompareMaskNotLong    UGT            ops/s  810493.4369    52.024062      932239.1709    143.915229     1.150211
testCompareMaskNotLong    UGE            ops/s  810442.0661    64.064396      932361.567     119.570287     1.150435
testCompareMaskNotShort   EQ             ops/s  4786426.182    299.050738     4694123.013    482.608634     0.980715
testCompareMaskNotShort   NE             ops/s  3808932.807    2993.590606    5672255.469    6262.526335    1.489198
testCompareMaskNotShort   LT             ops/s  4782535.485    3699.104322    5668474.071    11101.86452    1.185244
testCompareMaskNotShort   LE             ops/s  4782896.891    3338.57484     5669188.434    6309.723399    1.185304
testCompareMaskNotShort   GT             ops/s  4778532.318    3571.547653    5680482.703    10427.66734    1.18875
testCompareMaskNotShort   GE             ops/s  4786150.851    794.769881     5664644.919    6542.434538    1.183549
testCompareMaskNotShort   ULT            ops/s  4783623.78     3582.962421    5668267.123    17841.44773    1.184931
testCompareMaskNotShort   ULE            ops/s  4782752.125    3610.296618    5666231.302    6964.505363    1.184721
testCompareMaskNotShort   UGT            ops/s  4782469.332    2913.37576     5655837.96     6494.608864    1.182618
testCompareMaskNotShort   UGE            ops/s  4782606.35     3491.774067    5667295.182    14176.96543    1.18498


On AMD EPYC 9124 16-Core Processor:
With option `-XX:UseAVX=3`:

Benchmark                 COMPARISON_OP  Unit   Before         Before Error   After          After Error    Uplift
testCompareMaskNotDouble  EQ             ops/s  2166357.886    27577.51358    2920183.192    38491.49083    1.347968
testCompareMaskNotDouble  NE             ops/s  2177325.341    32771.27023    2965747.932    39271.62615    1.362106
testCompareMaskNotDouble  LT             ops/s  2123834.711    22890.39919    2197099.169    29107.41329    1.034496
testCompareMaskNotDouble  LE             ops/s  2172931.681    32912.05647    2121686.057    34927.37781    0.976416
testCompareMaskNotDouble  GT             ops/s  2164924.662    30925.91899    2124062.892    37135.0458     0.981125
testCompareMaskNotDouble  GE             ops/s  2150619.038    35515.09022    2192636.533    38672.85716    1.019537
testCompareMaskNotFloat   EQ             ops/s  4518378.764    74733.72389    6724589.409    50424.63568    1.488274
testCompareMaskNotFloat   NE             ops/s  4522823.224    78138.66727    6907565.257    203953.3299    1.527268
testCompareMaskNotFloat   LT             ops/s  4587473.545    62621.25938    4431658.918    52760.23989    0.966034
testCompareMaskNotFloat   LE             ops/s  4472078.986    79338.23304    4472390.043    66247.285      1.000069
testCompareMaskNotFloat   GT             ops/s  4451744.39     220787.9755    4440866.486    58674.19154    0.997556
testCompareMaskNotFloat   GE             ops/s  4459601.349    57873.05167    4481398.426    76819.69285    1.004887
testCompareMaskNotByte    EQ             ops/s  19415317.92    356367.4937    20649319.86    240515.9459    1.063558
testCompareMaskNotByte    NE             ops/s  19401162.58    362571.8103    21010358.2     71221.35255    1.082943
testCompareMaskNotByte    LT             ops/s  19175612.37    273080.6175    20235838.72    396190.6101    1.05529
testCompareMaskNotByte    LE             ops/s  19036831.33    121135.0491    20674528.84    248839.9471    1.086027
testCompareMaskNotByte    GT             ops/s  19008302.3     124633.9182    20671390.89    271644.5576    1.087492
testCompareMaskNotByte    GE             ops/s  19590753.42    429156.452     20491615.07    332912.82      1.045984
testCompareMaskNotByte    ULT            ops/s  19431604.06    421396.5487    20575805.9     248466.2368    1.058883
testCompareMaskNotByte    ULE            ops/s  19060425.47    98309.75469    20774930.43    206596.0422    1.089951
testCompareMaskNotByte    UGT            ops/s  19266788.04    362893.3051    20861521.87    106977.3707    1.082771
testCompareMaskNotByte    UGE            ops/s  19127964.33    447774.3747    20791221.56    254458.0132    1.086954
testCompareMaskNotInt     EQ             ops/s  4473402.48     84902.77154    7191777.028    94315.13878    1.607674
testCompareMaskNotInt     NE             ops/s  4583165.363    73491.79073    7249884.988    80028.31191    1.581851
testCompareMaskNotInt     LT             ops/s  4618634.192    81869.82512    7242567.732    71211.3697     1.568118
testCompareMaskNotInt     LE             ops/s  4650524.195    72302.56692    7154948.491    83057.90635    1.538525
testCompareMaskNotInt     GT             ops/s  4534752.486    94449.20198    7004428.251    38365.18576    1.54461
testCompareMaskNotInt     GE             ops/s  4540777.389    86331.11847    7129527.341    74343.06996    1.570111
testCompareMaskNotInt     ULT            ops/s  4528175.644    114213.6504    7220013.98     82850.22587    1.594464
testCompareMaskNotInt     ULE            ops/s  4619335.448    74203.98889    7118543.128    54457.43284    1.541031
testCompareMaskNotInt     UGT            ops/s  4572521.254    122912.75      7154797.741    98858.3477     1.564737
testCompareMaskNotInt     UGE            ops/s  4579627.842    80558.04554    7179020.593    99239.23499    1.567599
testCompareMaskNotLong    EQ             ops/s  2103965.347    17059.28178    2997338.009    32388.42725    1.424613
testCompareMaskNotLong    NE             ops/s  2174434.633    36011.24708    2984460.593    29074.42994    1.372522
testCompareMaskNotLong    LT             ops/s  2110937.378    56642.0052     3020690.893    31167.62537    1.430971
testCompareMaskNotLong    LE             ops/s  2153414.166    31280.20562    2971696.162    31176.24605    1.379992
testCompareMaskNotLong    GT             ops/s  2166028.207    49432.18925    3008018.282    26534.78551    1.388725
testCompareMaskNotLong    GE             ops/s  2178206.136    35757.6799     2933186.687    19824.26727    1.346606
testCompareMaskNotLong    ULT            ops/s  2104344.728    31405.7728     2964354.007    26871.18289    1.408682
testCompareMaskNotLong    ULE            ops/s  2210232.578    21993.95777    3032635.261    25545.43656    1.372088
testCompareMaskNotLong    UGT            ops/s  2167177.931    44896.90807    2996245.236    34153.68941    1.382556
testCompareMaskNotLong    UGE            ops/s  2117175.328    26131.1893     2977492.164    23227.65519    1.406351
testCompareMaskNotShort   EQ             ops/s  8131234.179    185997.1777    12414378.38    122648.1579    1.526752
testCompareMaskNotShort   NE             ops/s  8506016.656    236481.383     12720442.64    322747.8776    1.495464
testCompareMaskNotShort   LT             ops/s  8487868.819    244943.6097    12150479.62    244300.5456    1.431511
testCompareMaskNotShort   LE             ops/s  8549184.557    286833.466     12358019.06    136683.2112    1.44552
testCompareMaskNotShort   GT             ops/s  8375447.45     221237.073     12602058.97    385690.3318    1.504643
testCompareMaskNotShort   GE             ops/s  8123474.548    127727.1461    12799747.64    197940.1001    1.575649
testCompareMaskNotShort   ULT            ops/s  8491650.422    313124.2425    12751186.59    255845.1653    1.501614
testCompareMaskNotShort   ULE            ops/s  8363009.676    203670.1995    12675908.7     279496.9925    1.515711
testCompareMaskNotShort   UGT            ops/s  8332268.933    279787.2503    12279451.4     436971.6582    1.473722
testCompareMaskNotShort   UGE            ops/s  8931588.505    203962.9257    12324437.67    330723.3066    1.37987
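
For anyone reading the tables, the Uplift column appears to be the After score divided by the Before score, so values above 1.0 mean higher throughput after the change. A minimal sketch of that arithmetic follows; the class and method names (`UpliftCheck`, `uplift`) are hypothetical and only for illustration, and the ratio definition is inferred from the numbers rather than stated in the comment.

```java
// Hypothetical helper, not part of the PR: recomputes the Uplift column.
final class UpliftCheck {
    // Assumption: Uplift = After score / Before score (both in ops/s).
    static double uplift(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        // testCompareMaskNotDouble, EQ, with -XX:UseSVE=2 (first table).
        double before = 908008.7644;
        double after = 1175289.515;
        // Should agree with the reported uplift for that row up to rounding.
        System.out.printf("uplift = %.6f%n", uplift(before, after));
    }
}
```

Rows whose uplift is close to 1.0 (for example the floating-point LT/LE/GT/GE cases) are the ones where the Before and After scores are roughly equal.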

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3291304777
