On Wed, 27 Apr 2022 09:13:34 GMT, Jie Fu <ji...@openjdk.org> wrote:

>> Hi all,
>> 
>> According to the Vector API doc, the `LSHR` operator computes 
>> `a>>>(n&(ESIZE*8-1))`.
>> However, current implementation is incorrect for negative bytes/shorts.
>> 
>> The background is that one of our customers try to vectorize `urshift` with 
>> `urshiftVector` like the following.
>> 
>>  13     public static void urshift(byte[] src, byte[] dst) {
>>  14         for (int i = 0; i < src.length; i++) {
>>  15             dst[i] = (byte)(src[i] >>> 3);
>>  16         }
>>  17     }
>>  18 
>>  19     public static void urshiftVector(byte[] src, byte[] dst) {
>>  20         int i = 0;
>>  21         for (; i < spec.loopBound(src.length); i +=spec.length()) {
>>  22             var va = ByteVector.fromArray(spec, src, i);
>>  23             var vb = va.lanewise(VectorOperators.LSHR, 3);
>>  24             vb.intoArray(dst, i);
>>  25         }
>>  26 
>>  27         for (; i < src.length; i++) {
>>  28             dst[i] = (byte)(src[i] >>> 3);
>>  29         }
>>  30     }
>> 
>> 
>> Unfortunately and to our surprise, code@line28 computes different results 
>> with code@line23.
>> It took quite a long time to figure out this bug.
>> 
>> The root cause is that current implemenation of Vector API can't compute the 
>>  unsigned right shift results as what is done for scalar `>>>` for negative 
>> byte/short elements.
>> Actually, current implementation will do `(a & 0xFF) >>> (n & 7)` [1] for 
>> all bytes, which is unable to compute the vectorized `>>>` for negative 
>> bytes.
>> So this seems unreasonable and unfriendly to Java developers.
>> It would be better to fix it.
>> 
>> The key idea to support unsigned right shift of negative bytes/shorts is 
>> just to replace the unsigned right shift operation with the signed right 
>> shift operation.
>> This logic is:
>> - For byte elements, unsigned right shift is equal to signed right shift if 
>> the shift_cnt <= 24.
>> - For short elements, unsigned right shift is equal to signed right shift if 
>> the shift_cnt <= 16.
>> - For Vector API, the shift_cnt will be masked to shift_cnt <= 7 for bytes 
>> and shift_cnt <= 15 for shorts.
>> 
>> I just learned this idea from https://github.com/openjdk/jdk/pull/7979 .
>> And many thanks to @fg1417 .
>> 
>> 
>> Thanks.
>> Best regards,
>> Jie
>> 
>> [1] 
>> https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L935
>
>> > > According to the Vector API doc, the LSHR operator computes 
>> > > a>>>(n&(ESIZE*8-1))
>> 
>> Documentation is correct if viewed strictly in context of subword vector 
>> lane, JVM internally promotes/sign extends subword type scalar variables 
>> into int type, but vectors are loaded from continuous memory holding 
>> subwords, it will not be correct for developer to imagine that individual 
>> subword type lanes will be upcasted into int lanes before being operated 
>> upon.
>> 
>> Thus both java implementation and compiler handling looks correct.
> 
> Thanks @jatin-bhateja for taking a look at this.
> After the discussion, I think it's fine to keep the current implementation of 
> LSHR.
> So we're now fixing the misleading doc here: 
> https://github.com/openjdk/jdk/pull/8291 .
> 
> And I think it would be better to add one more operator for `>>>`.
> Thanks.

@DamonFool should this PR and the JBS issue be closed?

-------------

PR: https://git.openjdk.java.net/jdk/pull/8276

Reply via email to