On Mon, 18 Apr 2022 05:09:33 GMT, Jie Fu <ji...@openjdk.org> wrote: >>> However, just image that someone would like to optimize some code segments >>> of bytes/shorts `>>>` >> >> Then that person can just use signed shift (`VectorOperators.ASHR`), right? >> Shifting on masked shift counts means that the shift count cannot be greater >> than 8 for bytes and 16 for shorts, which means that `(byte)(src[i] >>> 3)` >> is exactly the same as `(byte)(src[i] >> 3)`. Please correct me if I >> misunderstood something here. >> >> Your proposed changes make unsigned shifts for subwords behave exactly the >> same as signed shifts, which is both redundant (we have 2 operations doing >> exactly the same thing) and inadequate (we lack the operation to do the >> proper unsigned shifts) >> >> Thank you very much. > >> > However, just image that someone would like to optimize some code segments >> > of bytes/shorts `>>>` >> >> Then that person can just use signed shift (`VectorOperators.ASHR`), right? >> Shifting on masked shift counts means that the shift count cannot be greater >> than 8 for bytes and 16 for shorts, which means that `(byte)(src[i] >>> 3)` >> is exactly the same as `(byte)(src[i] >> 3)`. Please correct me if I >> misunderstood something here. > > Yes, you're right that's why I said `LSHR` can be replaced with `ASHR`. > > However, not all the people are clever enough to do this source code level > replacement. > To be honest, I would never think of that `>>>` can be auto-vectorized by > this idea proposed by https://github.com/openjdk/jdk/pull/7979 . > >> >> Your proposed changes make unsigned shifts for subwords behave exactly the >> same as signed shifts, which is both redundant (we have 2 operations doing >> exactly the same thing) and inadequate (we lack the operation to do the >> proper unsigned shifts) > > `LSHR` following the behavior of scalar `>>>` is very important for Java > developers to rewrite the code with vector api. > Maybe, we should add one more operator to support what you called `the proper > unsigned shifts`, right? > But that's another topic which can be done in a separate issue.
> @DamonFool > > I think the issue is that these two cases of yours are not equal semantically. Why? According to the vector api doc, they should compute the same value when the shift_cnt is 3, right? > > ``` > 13 public static void urshift(byte[] src, byte[] dst) { > 14 for (int i = 0; i < src.length; i++) { > 15 dst[i] = (byte)(src[i] >>> 3); > 16 } > 17 } > 18 > 19 public static void urshiftVector(byte[] src, byte[] dst) { > 20 int i = 0; > 21 for (; i < spec.loopBound(src.length); i +=spec.length()) { > 22 var va = ByteVector.fromArray(spec, src, i); > 23 var vb = va.lanewise(VectorOperators.LSHR, 3); > 24 vb.intoArray(dst, i); > 25 } > 26 > 27 for (; i < src.length; i++) { > 28 dst[i] = (byte)(src[i] >>> 3); > 29 } > 30 } > ``` > > Besides the unsigned shift, line15 also has a type conversion which is > missing in the vector api case. To get the equivalent result, one need to > cast the result explicitly at line24, e.g, > `((IntVector)vb.castShape(SPECISE_XXX, 0)).intoArray(idst, i);` Since all the vector operations are already based on byte lane type, I don't think we need a `cast` operation here. Can we solve this problem by inserting a cast operation? ------------- PR: https://git.openjdk.java.net/jdk/pull/8276