On Tue, 27 Apr 2021 17:56:04 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
> The current VectorAPI Java-side implementation expresses the rotateLeft and
> rotateRight operations using the following operations:
>
>     vec1 = lanewise(VectorOperators.LSHL, n)
>     vec2 = lanewise(VectorOperators.LSHR, n)
>     res = lanewise(VectorOperators.OR, vec1, vec2)
>
> This patch moves the above handling from the Java side to the C2 compiler,
> which facilitates dismantling the rotate operation if the target ISA does not
> support a direct rotate instruction.
>
> AVX512 added vector rotate instructions vpro[rl][v][dq], which operate over
> long and integer type vectors. For other cases (i.e. sub-word type vectors, or
> targets which do not support direct rotate operations), an instruction
> sequence comprising vector SHIFT (LEFT/RIGHT) and vector OR is emitted.
>
> Please find below the performance data for the included JMH benchmark.
> Machine: Cascade Lake Server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>
> Benchmark | (bits) | (shift) | (size) | Baseline Score (ops/ms) | With Opts (ops/ms) | Gain
> -- | -- | -- | -- | -- | -- | --
> RotateBenchmark.testRotateLeftB | 128 | 7 | 256 | 3939.136 | 3836.133 | 0.973851372
> RotateBenchmark.testRotateLeftB | 128 | 7 | 512 | 1984.231 | 1918.27 | 0.966757399
> RotateBenchmark.testRotateLeftB | 128 | 15 | 256 | 3925.165 | 4043.842 | 1.030234907
> RotateBenchmark.testRotateLeftB | 128 | 15 | 512 | 1962.723 | 1936.551 | 0.986665464
> RotateBenchmark.testRotateLeftB | 128 | 31 | 256 | 3945.6 | 3817.883 | 0.967630525
> RotateBenchmark.testRotateLeftB | 128 | 31 | 512 | 1944.458 | 1914.229 | 0.984453766
> RotateBenchmark.testRotateLeftB | 256 | 7 | 256 | 4612.149 | 4514.874 | 0.978908964
> RotateBenchmark.testRotateLeftB | 256 | 7 | 512 | 2296.252 | 2270.237 | 0.988670669
> RotateBenchmark.testRotateLeftB | 256 | 15 | 256 | 4576.628 | 4515.53 | 0.986649996
> RotateBenchmark.testRotateLeftB | 256 | 15 | 512 | 2288.278 | 2270.923 | 0.992415694
> RotateBenchmark.testRotateLeftB | 256 | 31 | 256 | 4624.243 | 4511.46 | 0.975610495
> RotateBenchmark.testRotateLeftB | 256 | 31 | 512 | 2305.459 | 2273.788 | 0.986262605
> RotateBenchmark.testRotateLeftB | 512 | 7 | 256 | 7748.283 | 7777.105 | 1.003719792
> RotateBenchmark.testRotateLeftB | 512 | 7 | 512 | 3906.214 | 3912.647 | 1.001646863
> RotateBenchmark.testRotateLeftB | 512 | 15 | 256 | 7764.653 | 7763.482 | 0.999849188
> RotateBenchmark.testRotateLeftB | 512 | 15 | 512 | 3916.061 | 3919.363 | 1.000843194
> RotateBenchmark.testRotateLeftB | 512 | 31 | 256 | 7779.754 | 7770.239 | 0.998776954
> RotateBenchmark.testRotateLeftB | 512 | 31 | 512 | 3916.471 | 3912.718 | 0.999041739
> RotateBenchmark.testRotateLeftI | 128 | 7 | 256 | 4043.39 | 13461.814 | 3.329338501
> RotateBenchmark.testRotateLeftI | 128 | 7 | 512 | 1996.217 | 6455.425 | 3.233829288
> RotateBenchmark.testRotateLeftI | 128 | 15 | 256 | 4028.614 | 13077.277 | 3.246098286
> RotateBenchmark.testRotateLeftI | 128 | 15 | 512 | 1997.612 | 6452.918 | 3.230315997
> RotateBenchmark.testRotateLeftI | 128 | 31 | 256 | 4123.357 | 13079.045 | 3.171940969
> RotateBenchmark.testRotateLeftI | 128 | 31 | 512 | 2003.356 | 6452.716 | 3.22095324
> RotateBenchmark.testRotateLeftI | 256 | 7 | 256 | 7666.949 | 25658.625 | 3.34665393
> RotateBenchmark.testRotateLeftI | 256 | 7 | 512 | 3855.826 | 12278.106 | 3.18429981
> RotateBenchmark.testRotateLeftI | 256 | 15 | 256 | 7670.901 | 24625.466 | 3.210244272
> RotateBenchmark.testRotateLeftI | 256 | 15 | 512 | 3765.786 | 12272.771 | 3.259019764
> RotateBenchmark.testRotateLeftI | 256 | 31 | 256 | 7660.599 | 25678.864 | 3.352069988
> RotateBenchmark.testRotateLeftI | 256 | 31 | 512 | 3773.401 | 12006.469 | 3.181869353
> RotateBenchmark.testRotateLeftI | 512 | 7 | 256 | 11900.948 | 31242.989 | 2.625252123
> RotateBenchmark.testRotateLeftI | 512 | 7 | 512 | 5830.878 | 15727.149 | 2.697217983
> RotateBenchmark.testRotateLeftI | 512 | 15 | 256 | 12171.847 | 33180.067 | 2.72596813
> RotateBenchmark.testRotateLeftI | 512 | 15 | 512 | 5830.544 | 16740.182 | 2.871118372
> RotateBenchmark.testRotateLeftI | 512 | 31 | 256 | 11909.553 | 31250.882 | 2.624018047
> RotateBenchmark.testRotateLeftI | 512 | 31 | 512 | 5846.747 | 15738.831 | 2.691895339
> RotateBenchmark.testRotateLeftL | 128 | 7 | 256 | 2047.243 | 6888.484 | 3.364761291
> RotateBenchmark.testRotateLeftL | 128 | 7 | 512 | 1005.029 | 3245.931 | 3.229688895
> RotateBenchmark.testRotateLeftL | 128 | 15 | 256 | 1996.921 | 6985.256 | 3.498013191
> RotateBenchmark.testRotateLeftL | 128 | 15 | 512 | 986.906 | 3217.778 | 3.260470602
> RotateBenchmark.testRotateLeftL | 128 | 31 | 256 | 1999.06 | 6977.672 | 3.490476524
> RotateBenchmark.testRotateLeftL | 128 | 31 | 512 | 987.258 | 3236.63 | 3.278403416
> RotateBenchmark.testRotateLeftL | 256 | 7 | 256 | 3752.412 | 12995.954 | 3.4633601
> RotateBenchmark.testRotateLeftL | 256 | 7 | 512 | 1824.093 | 5809.576 | 3.184912173
> RotateBenchmark.testRotateLeftL | 256 | 15 | 256 | 3759.99 | 13262.631 | 3.52730486
> RotateBenchmark.testRotateLeftL | 256 | 15 | 512 | 1823.393 | 5803.872 | 3.183006626
> RotateBenchmark.testRotateLeftL | 256 | 31 | 256 | 3757.134 | 13284.633 | 3.535842214
> RotateBenchmark.testRotateLeftL | 256 | 31 | 512 | 1822.192 | 5824.178 | 3.196248255
> RotateBenchmark.testRotateLeftL | 512 | 7 | 256 | 5794.005 | 15567.753 | 2.686872552
> RotateBenchmark.testRotateLeftL | 512 | 7 | 512 | 2969.393 | 7694.79 | 2.591368
> RotateBenchmark.testRotateLeftL | 512 | 15 | 256 | 5817.292 | 15726.597 | 2.703422314
> RotateBenchmark.testRotateLeftL | 512 | 15 | 512 | 2944.655 | 7664.954 | 2.603005785
> RotateBenchmark.testRotateLeftL | 512 | 31 | 256 | 5822.131 | 16718.64 | 2.871567129
> RotateBenchmark.testRotateLeftL | 512 | 31 | 512 | 2944.763 | 7657.814 | 2.600485676
> RotateBenchmark.testRotateLeftS | 128 | 7 | 256 | 8006.155 | 7976.701 | 0.99632108
> RotateBenchmark.testRotateLeftS | 128 | 7 | 512 | 4031.753 | 4003.43 | 0.992975016
> RotateBenchmark.testRotateLeftS | 128 | 15 | 256 | 8003.879 | 7952.752 | 0.993612222
> RotateBenchmark.testRotateLeftS | 128 | 15 | 512 | 4026.359 | 4014.757 | 0.997118488
> RotateBenchmark.testRotateLeftS | 128 | 31 | 256 | 8000.842 | 7995.733 | 0.999361442
> RotateBenchmark.testRotateLeftS | 128 | 31 | 512 | 4044.421 | 4007.426 | 0.990852832
> RotateBenchmark.testRotateLeftS | 256 | 7 | 256 | 15078.471 | 15034.395 | 0.997076892
> RotateBenchmark.testRotateLeftS | 256 | 7 | 512 | 7236.509 | 7620.334 | 1.053040078
> RotateBenchmark.testRotateLeftS | 256 | 15 | 256 | 15093.661 | 15024.17 | 0.995396014
> RotateBenchmark.testRotateLeftS | 256 | 15 | 512 | 7308.568 | 7724.381 | 1.056893909
> RotateBenchmark.testRotateLeftS | 256 | 31 | 256 | 15332.233 | 15432.113 | 1.006514381
> RotateBenchmark.testRotateLeftS | 256 | 31 | 512 | 7317.18 | 7626.679 | 1.042297579
> RotateBenchmark.testRotateLeftS | 512 | 7 | 256 | 24079.012 | 23939.263 | 0.994196232
> RotateBenchmark.testRotateLeftS | 512 | 7 | 512 | 11441.41 | 11921.21 | 1.041935391
> RotateBenchmark.testRotateLeftS | 512 | 15 | 256 | 23563.675 | 23590.959 | 1.001157884
> RotateBenchmark.testRotateLeftS | 512 | 15 | 512 | 11418.634 | 11949.391 | 1.046481654
> RotateBenchmark.testRotateLeftS | 512 | 31 | 256 | 24035.69 | 23595.385 | 0.9816812
> RotateBenchmark.testRotateLeftS | 512 | 31 | 512 | 11668.091 | 11899.536 | 1.019835721
> RotateBenchmark.testRotateRightB | 128 | 7 | 256 | 3852.421 | 3816.521 | 0.990681185
> RotateBenchmark.testRotateRightB | 128 | 7 | 512 | 1956.766 | 1923.638 | 0.983070025
> RotateBenchmark.testRotateRightB | 128 | 15 | 256 | 3899.136 | 4038.945 | 1.035856405
> RotateBenchmark.testRotateRightB | 128 | 15 | 512 | 1957.733 | 2030.973 | 1.037410617
> RotateBenchmark.testRotateRightB | 128 | 31 | 256 | 3902.5 | 4043.736 | 1.03619116
> RotateBenchmark.testRotateRightB | 128 | 31 | 512 | 1957.728 | 1920.434 | 0.980950367
> RotateBenchmark.testRotateRightB | 256 | 7 | 256 | 4565.887 | 4515.083 | 0.988873137
> RotateBenchmark.testRotateRightB | 256 | 7 | 512 | 2300.057 | 2278.065 | 0.990438498
> RotateBenchmark.testRotateRightB | 256 | 15 | 256 | 4570.754 | 4527.692 | 0.990578797
> RotateBenchmark.testRotateRightB | 256 | 15 | 512 | 2300.524 | 2268.659 | 0.986148808
> RotateBenchmark.testRotateRightB | 256 | 31 | 256 | 4577.569 | 4513.29 | 0.98595783
> RotateBenchmark.testRotateRightB | 256 | 31 | 512 | 2304.335 | 2273.178 | 0.986478962
> RotateBenchmark.testRotateRightB | 512 | 7 | 256 | 7772.483 | 7842.671 | 1.009030319
> RotateBenchmark.testRotateRightB | 512 | 7 | 512 | 3907.265 | 3917.325 | 1.002574691
> RotateBenchmark.testRotateRightB | 512 | 15 | 256 | 7855.653 | 7865.25 | 1.001221668
> RotateBenchmark.testRotateRightB | 512 | 15 | 512 | 3909.845 | 3976.813 | 1.017128045
> RotateBenchmark.testRotateRightB | 512 | 31 | 256 | 7746.765 | 7870.159 | 1.015928455
> RotateBenchmark.testRotateRightB | 512 | 31 | 512 | 3919.596 | 3981.934 | 1.01590419
> RotateBenchmark.testRotateRightI | 128 | 7 | 256 | 4125.151 | 13056.878 | 3.165187893
> RotateBenchmark.testRotateRightI | 128 | 7 | 512 | 2045.201 | 6501.447 | 3.17887924
> RotateBenchmark.testRotateRightI | 128 | 15 | 256 | 4111.736 | 13318.124 | 3.23905134
> RotateBenchmark.testRotateRightI | 128 | 15 | 512 | 2055.355 | 6497.289 | 3.161151723
> RotateBenchmark.testRotateRightI | 128 | 31 | 256 | 4109.353 | 13073.3 | 3.181352393
> RotateBenchmark.testRotateRightI | 128 | 31 | 512 | 2055.431 | 6463.902 | 3.14479153
> RotateBenchmark.testRotateRightI | 256 | 7 | 256 | 7804.976 | 24585.962 | 3.150036848
> RotateBenchmark.testRotateRightI | 256 | 7 | 512 | 3815.818 | 11985.145 | 3.140911071
> RotateBenchmark.testRotateRightI | 256 | 15 | 256 | 7644.977 | 25863.841 | 3.383115606
> RotateBenchmark.testRotateRightI | 256 | 15 | 512 | 3822.508 | 12280.58 | 3.212702236
> RotateBenchmark.testRotateRightI | 256 | 31 | 256 | 7709.635 | 25655.108 | 3.327668301
> RotateBenchmark.testRotateRightI | 256 | 31 | 512 | 3801.5 | 12271.65 | 3.228107326
> RotateBenchmark.testRotateRightI | 512 | 7 | 256 | 12223.711 | 31239.788 | 2.555671351
> RotateBenchmark.testRotateRightI | 512 | 7 | 512 | 5973.571 | 16740.852 | 2.802486486
> RotateBenchmark.testRotateRightI | 512 | 15 | 256 | 12205.47 | 31248.025 | 2.560165647
> RotateBenchmark.testRotateRightI | 512 | 15 | 512 | 5966.513 | 15728.168 | 2.6360737
> RotateBenchmark.testRotateRightI | 512 | 31 | 256 | 12209.405 | 33181.105 | 2.71766765
> RotateBenchmark.testRotateRightI | 512 | 31 | 512 | 5981.527 | 15727.496 | 2.629344647
> RotateBenchmark.testRotateRightL | 128 | 7 | 256 | 2054.509 | 6980.849 | 3.397818652
> RotateBenchmark.testRotateRightL | 128 | 7 | 512 | 997.375 | 3242.374 | 3.250907633
> RotateBenchmark.testRotateRightL | 128 | 15 | 256 | 2051.459 | 6892.389 | 3.359749817
> RotateBenchmark.testRotateRightL | 128 | 15 | 512 | 1002.906 | 3223.342 | 3.21400211
> RotateBenchmark.testRotateRightL | 128 | 31 | 256 | 2044.749 | 6984.157 | 3.415654929
> RotateBenchmark.testRotateRightL | 128 | 31 | 512 | 1004.273 | 3237.496 | 3.22372104
> RotateBenchmark.testRotateRightL | 256 | 7 | 256 | 3811.551 | 13347.75 | 3.501920872
> RotateBenchmark.testRotateRightL | 256 | 7 | 512 | 1892.883 | 5840.85 | 3.085689924
> RotateBenchmark.testRotateRightL | 256 | 15 | 256 | 3821.705 | 14034.823 | 3.672398314
> RotateBenchmark.testRotateRightL | 256 | 15 | 512 | 1799.193 | 5817.533 | 3.233412424
> RotateBenchmark.testRotateRightL | 256 | 31 | 256 | 3816.666 | 14022.31 | 3.673968327
> RotateBenchmark.testRotateRightL | 256 | 31 | 512 | 1796.649 | 5824.13 | 3.241662673
> RotateBenchmark.testRotateRightL | 512 | 7 | 256 | 5943.986 | 15586.254 | 2.622188881
> RotateBenchmark.testRotateRightL | 512 | 7 | 512 | 3022.686 | 7662.241 | 2.534911334
> RotateBenchmark.testRotateRightL | 512 | 15 | 256 | 5958.008 | 15726.859 | 2.639616966
> RotateBenchmark.testRotateRightL | 512 | 15 | 512 | 2998.469 | 7654.703 | 2.552870482
> RotateBenchmark.testRotateRightL | 512 | 31 | 256 | 5937.491 | 15741.207 | 2.651154671
> RotateBenchmark.testRotateRightL | 512 | 31 | 512 | 3014.699 | 7656.837 | 2.539834657
> RotateBenchmark.testRotateRightS | 128 | 7 | 256 | 8172.896 | 8003.474 | 0.979270261
> RotateBenchmark.testRotateRightS | 128 | 7 | 512 | 4111.074 | 4047.267 | 0.984479238
> RotateBenchmark.testRotateRightS | 128 | 15 | 256 | 8225.79 | 8040.421 | 0.9774649
> RotateBenchmark.testRotateRightS | 128 | 15 | 512 | 4129.801 | 4011.919 | 0.971455767
> RotateBenchmark.testRotateRightS | 128 | 31 | 256 | 8176.102 | 8052.686 | 0.984905276
> RotateBenchmark.testRotateRightS | 128 | 31 | 512 | 4117.735 | 4046.522 | 0.982705784
> RotateBenchmark.testRotateRightS | 256 | 7 | 256 | 15213.617 | 15169.51 | 0.997100821
> RotateBenchmark.testRotateRightS | 256 | 7 | 512 | 7530.289 | 7625.581 | 1.012654494
> RotateBenchmark.testRotateRightS | 256 | 15 | 256 | 15238.384 | 15069.978 | 0.988948566
> RotateBenchmark.testRotateRightS | 256 | 15 | 512 | 7275.098 | 7620.764 | 1.047513587
> RotateBenchmark.testRotateRightS | 256 | 31 | 256 | 15299.821 | 15043.765 | 0.983264118
> RotateBenchmark.testRotateRightS | 256 | 31 | 512 | 7273.028 | 7630.97 | 1.04921499
> RotateBenchmark.testRotateRightS | 512 | 7 | 256 | 23998.152 | 23920.046 | 0.996745333
> RotateBenchmark.testRotateRightS | 512 | 7 | 512 | 11582.679 | 11916.382 | 1.02881052
> RotateBenchmark.testRotateRightS | 512 | 15 | 256 | 23982.797 | 23434.756 | 0.977148579
> RotateBenchmark.testRotateRightS | 512 | 15 | 512 | 11629.806 | 11918.759 | 1.0248459
> RotateBenchmark.testRotateRightS | 512 | 31 | 256 | 23988.549 | 23475.629 | 0.978618132
> RotateBenchmark.testRotateRightS | 512 | 31 | 512 | 11650.146 | 11916.47 | 1.022860143

This pull request has now been integrated.

Changeset: d994b93e
Author:    Jatin Bhateja <jbhat...@openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/d994b93e211d49af79212d765633ba3457365a08
Stats:     4438 lines in 57 files changed: 4219 ins; 58 del; 161 mod

8266054: VectorAPI rotate operation optimization

Reviewed-by: psandoz, sviswanathan

-------------

PR: https://git.openjdk.java.net/jdk/pull/3720
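
For readers unfamiliar with the shift-and-OR fallback mentioned in the quoted description, here is a minimal scalar sketch in plain Java. It is not the C2 implementation and does not use the Vector API; the class and method names (`RotateSketch`, `rotateLeftByte`, `rotateLeftShort`, `rotateLeftInt`) are hypothetical. It assumes the rotate count is reduced modulo the lane width, and it spells out the complementary right-shift count, which the schematic snippet in the description leaves implicit.

```java
public class RotateSketch {
    // Rotate an 8-bit lane left by n using two shifts and an OR.
    // Assumption: the count is reduced modulo the lane width (8).
    public static byte rotateLeftByte(byte x, int n) {
        int bits = Byte.SIZE;
        int s = n & (bits - 1);               // effective rotate count
        int v = x & 0xFF;                     // zero-extend the lane
        int hi = v << s;                      // bits shifted towards the top
        int lo = v >>> ((bits - s) & (bits - 1)); // bits wrapped to the bottom
        return (byte) (hi | lo);              // the cast drops bits above the lane
    }

    // Same sequence for a 16-bit lane.
    public static short rotateLeftShort(short x, int n) {
        int bits = Short.SIZE;
        int s = n & (bits - 1);
        int v = x & 0xFFFF;
        return (short) ((v << s) | (v >>> ((bits - s) & (bits - 1))));
    }

    // For 32-bit lanes Java already masks shift counts to 5 bits,
    // so the complementary count can be written as -n.
    public static int rotateLeftInt(int x, int n) {
        return (x << n) | (x >>> -n);
    }

    public static void main(String[] args) {
        // Cross-check the int variant against the JDK for the benchmark's shift counts.
        int[] shifts = {7, 15, 31};
        for (int s : shifts) {
            int expected = Integer.rotateLeft(0x80000001, s);
            int actual = rotateLeftInt(0x80000001, s);
            System.out.println("shift " + s + " matches: " + (expected == actual));
        }
        System.out.println(String.format("0x%02X", rotateLeftByte((byte) 0x81, 1) & 0xFF));
    }
}
```

On targets with a direct rotate instruction (the AVX512 `vpro[rl][v][dq]` family for int and long lanes), the whole sequence collapses into one instruction. That is consistent with the table showing gains only for the `I` and `L` benchmarks, while the `B` and `S` results stay close to 1.0.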