On Sun, 17 Apr 2022 14:35:14 GMT, Jie Fu <ji...@openjdk.org> wrote:

> Hi all,
> 
> According to the Vector API doc, the `LSHR` operator computes 
> `a>>>(n&(ESIZE*8-1))`.
> However, current implementation is incorrect for negative bytes/shorts.
> 
> The background is that one of our customers try to vectorize `urshift` with 
> `urshiftVector` like the following.
> 
>  13     public static void urshift(byte[] src, byte[] dst) {
>  14         for (int i = 0; i < src.length; i++) {
>  15             dst[i] = (byte)(src[i] >>> 3);
>  16         }
>  17     }
>  18 
>  19     public static void urshiftVector(byte[] src, byte[] dst) {
>  20         int i = 0;
>  21         for (; i < spec.loopBound(src.length); i +=spec.length()) {
>  22             var va = ByteVector.fromArray(spec, src, i);
>  23             var vb = va.lanewise(VectorOperators.LSHR, 3);
>  24             vb.intoArray(dst, i);
>  25         }
>  26 
>  27         for (; i < src.length; i++) {
>  28             dst[i] = (byte)(src[i] >>> 3);
>  29         }
>  30     }
> 
> 
> Unfortunately and to our surprise, code@line28 computes different results 
> with code@line23.
> It took quite a long time to figure out this bug.
> 
> The root cause is that current implemenation of Vector API can't compute the  
> unsigned right shift results as what is done for scalar `>>>` for negative 
> byte/short elements.
> Actually, current implementation will do `(a & 0xFF) >>> (n & 7)` [1] for all 
> bytes, which is unable to compute the vectorized `>>>` for negative bytes.
> So this seems unreasonable and unfriendly to Java developers.
> It would be better to fix it.
> 
> The key idea to support unsigned right shift of negative bytes/shorts is just 
> to replace the unsigned right shift operation with the signed right shift 
> operation.
> This logic is:
> - For byte elements, unsigned right shift is equal to signed right shift if 
> the shift_cnt <= 24.
> - For short elements, unsigned right shift is equal to signed right shift if 
> the shift_cnt <= 16.
> - For Vector API, the shift_cnt will be masked to shift_cnt <= 7 for bytes 
> and shift_cnt <= 15 for shorts.
> 
> I just learned this idea from https://github.com/openjdk/jdk/pull/7979 .
> And many thanks to @fg1417 .
> 
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] 
> https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L935

Hi,

According to JLS section 5.8, any operand in a numeric arithmetic context 
undergoes a promotion to int, long, float or double and the operation is only 
defined for values of the promoted types. This means that `>>>` is not defined 
for byte/short values and the real behaviour here is that `src[i]` get promoted 
to int by a signed cast before entering the unsigned shift operation. This is 
different from `VectorOperators.LSHR` which is defined for byte/short element 
types. The scalar code does not do a byte unsigned shift but an int unsigned 
shift between a promotion and a narrowing cast, the explicit cast `(byte)` 
notifies the user of this behaviour.

Secondly, consistency is the key, having a byte unsigned shift promoted 
elements to int is not consistent, I can argue why aren't int elements being 
promoted to longs, or longs being promoted to the 128-bit integral type, too.

Finally, as I have mentioned in #7979, this usage of unsigned shift seems more 
likely to be a bug than an intended behaviour, so we should not bother to 
optimise this pattern.

Thank you very much.

-------------

PR: https://git.openjdk.java.net/jdk/pull/8276

Reply via email to