On Wed, 27 Apr 2022 09:13:34 GMT, Jie Fu <ji...@openjdk.org> wrote: >> Hi all, >> >> According to the Vector API doc, the `LSHR` operator computes >> `a>>>(n&(ESIZE*8-1))`. >> However, current implementation is incorrect for negative bytes/shorts. >> >> The background is that one of our customers try to vectorize `urshift` with >> `urshiftVector` like the following. >> >> 13 public static void urshift(byte[] src, byte[] dst) { >> 14 for (int i = 0; i < src.length; i++) { >> 15 dst[i] = (byte)(src[i] >>> 3); >> 16 } >> 17 } >> 18 >> 19 public static void urshiftVector(byte[] src, byte[] dst) { >> 20 int i = 0; >> 21 for (; i < spec.loopBound(src.length); i +=spec.length()) { >> 22 var va = ByteVector.fromArray(spec, src, i); >> 23 var vb = va.lanewise(VectorOperators.LSHR, 3); >> 24 vb.intoArray(dst, i); >> 25 } >> 26 >> 27 for (; i < src.length; i++) { >> 28 dst[i] = (byte)(src[i] >>> 3); >> 29 } >> 30 } >> >> >> Unfortunately and to our surprise, code@line28 computes different results >> with code@line23. >> It took quite a long time to figure out this bug. >> >> The root cause is that current implemenation of Vector API can't compute the >> unsigned right shift results as what is done for scalar `>>>` for negative >> byte/short elements. >> Actually, current implementation will do `(a & 0xFF) >>> (n & 7)` [1] for >> all bytes, which is unable to compute the vectorized `>>>` for negative >> bytes. >> So this seems unreasonable and unfriendly to Java developers. >> It would be better to fix it. >> >> The key idea to support unsigned right shift of negative bytes/shorts is >> just to replace the unsigned right shift operation with the signed right >> shift operation. >> This logic is: >> - For byte elements, unsigned right shift is equal to signed right shift if >> the shift_cnt <= 24. >> - For short elements, unsigned right shift is equal to signed right shift if >> the shift_cnt <= 16. >> - For Vector API, the shift_cnt will be masked to shift_cnt <= 7 for bytes >> and shift_cnt <= 15 for shorts. >> >> I just learned this idea from https://github.com/openjdk/jdk/pull/7979 . >> And many thanks to @fg1417 . >> >> >> Thanks. >> Best regards, >> Jie >> >> [1] >> https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L935 > >> > > According to the Vector API doc, the LSHR operator computes >> > > a>>>(n&(ESIZE*8-1)) >> >> Documentation is correct if viewed strictly in context of subword vector >> lane, JVM internally promotes/sign extends subword type scalar variables >> into int type, but vectors are loaded from continuous memory holding >> subwords, it will not be correct for developer to imagine that individual >> subword type lanes will be upcasted into int lanes before being operated >> upon. >> >> Thus both java implementation and compiler handling looks correct. > > Thanks @jatin-bhateja for taking a look at this. > After the discussion, I think it's fine to keep the current implementation of > LSHR. > So we're now fixing the misleading doc here: > https://github.com/openjdk/jdk/pull/8291 . > > And I think it would be better to add one more operator for `>>>`. > Thanks.
> @DamonFool should this PR and the JBS issue be closed? Okay. Let's close it. ------------- PR: https://git.openjdk.java.net/jdk/pull/8276