Pulkitg64 commented on PR #15549:
URL: https://github.com/apache/lucene/pull/15549#issuecomment-3770820114

   I think there is a misunderstanding. Since you mentioned commented-out 
code, I thought you were referring to the different DefaultVectorUtil 
implementations, which don't use the Float16Vector class.
   
   This is the Panama implementation in the `PanamaVectorUtilSupport` class. 
It uses `Float16Vector` and depends on the JDK PR 
[change](https://bugs.openjdk.org/browse/JDK-8370691).
   
   
   ```
   @Override
     public short dotProduct(short[] a, short[] b) {
       int i = 0;
       short res = 0;
   
       // if the array size is large (> 2x platform vector size), it's worth the overhead to vectorize
       if (a.length > 2 * FLOAT16_SPECIES.length()) {
         i += FLOAT16_SPECIES.loopBound(a.length);
         res += dotProductBody(a, b, i);
       }
   
       // scalar tail
       for (; i < a.length; i++) {
         res = fma(a[i], b[i], res);
       }
       return res;
     }
   
     /** vectorized float16 dot product body */
     private short dotProductBody(short[] a, short[] b, int limit) {
       int i = 0;
       // vector loop is unrolled 4x (4 accumulators in parallel)
       // we don't know how many the cpu can do at once, some can do 2, some 4
       Float16Vector acc1 = Float16Vector.zero(FLOAT16_SPECIES);
       Float16Vector acc2 = Float16Vector.zero(FLOAT16_SPECIES);
       Float16Vector acc3 = Float16Vector.zero(FLOAT16_SPECIES);
       Float16Vector acc4 = Float16Vector.zero(FLOAT16_SPECIES);
       int unrolledLimit = limit - 3 * FLOAT16_SPECIES.length();
       for (; i < unrolledLimit; i += 4 * FLOAT16_SPECIES.length()) {
         // one
         Float16Vector va = Float16Vector.fromArray(FLOAT16_SPECIES, a, i);
         Float16Vector vb = Float16Vector.fromArray(FLOAT16_SPECIES, b, i);
         acc1 = fma(va, vb, acc1);
   
         // two
         Float16Vector vc = Float16Vector.fromArray(FLOAT16_SPECIES, a, i + FLOAT16_SPECIES.length());
         Float16Vector vd = Float16Vector.fromArray(FLOAT16_SPECIES, b, i + FLOAT16_SPECIES.length());
         acc2 = fma(vc, vd, acc2);
   
         // three
         Float16Vector ve = Float16Vector.fromArray(FLOAT16_SPECIES, a, i + 2 * FLOAT16_SPECIES.length());
         Float16Vector vf = Float16Vector.fromArray(FLOAT16_SPECIES, b, i + 2 * FLOAT16_SPECIES.length());
         acc3 = fma(ve, vf, acc3);
   
         // four
         Float16Vector vg = Float16Vector.fromArray(FLOAT16_SPECIES, a, i + 3 * FLOAT16_SPECIES.length());
         Float16Vector vh = Float16Vector.fromArray(FLOAT16_SPECIES, b, i + 3 * FLOAT16_SPECIES.length());
         acc4 = fma(vg, vh, acc4);
       }
       // vector tail: less scalar computations for unaligned sizes, esp with big vector sizes
       for (; i < limit; i += FLOAT16_SPECIES.length()) {
         Float16Vector va = Float16Vector.fromArray(FLOAT16_SPECIES, a, i);
         Float16Vector vb = Float16Vector.fromArray(FLOAT16_SPECIES, b, i);
         acc1 = fma(va, vb, acc1);
       }
       // reduce
       Float16Vector res1 = acc1.add(acc2);
       Float16Vector res2 = acc3.add(acc4);
       return res1.add(res2).reduceLanes(ADD);
     }
   ```
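
For comparison, here is a minimal scalar sketch of the same operation that does not use `Float16Vector`. It is not code from the PR. It assumes each `short` holds an IEEE 754 binary16 bit pattern, and it accumulates in `float` rather than returning a `short`, which is a deliberate simplification. The class name `Float16DotProductSketch` and the hand-written decoder are hypothetical; on JDK 20 and later, `Float.float16ToFloat` performs the same decoding.

```java
/** Hypothetical scalar reference for a float16 dot product (illustration only). */
final class Float16DotProductSketch {
  private Float16DotProductSketch() {}

  /** Decodes an IEEE 754 binary16 bit pattern into a float. */
  static float float16ToFloat(short h) {
    int bits = h & 0xFFFF;
    int sign = (bits & 0x8000) << 16; // move the sign bit to bit 31
    int exp = (bits >>> 10) & 0x1F;   // 5-bit exponent, bias 15
    int mant = bits & 0x3FF;          // 10-bit mantissa
    if (exp == 0x1F) {
      // infinity or NaN: all-ones exponent in float, mantissa shifted into place
      return Float.intBitsToFloat(sign | 0x7F800000 | (mant << 13));
    }
    if (exp == 0) {
      // zero or subnormal: value is mant * 2^-24
      float v = mant * 0x1p-24f;
      return sign != 0 ? -v : v;
    }
    // normal number: rebias the exponent from 15 to 127 (difference of 112)
    return Float.intBitsToFloat(sign | ((exp + 112) << 23) | (mant << 13));
  }

  /** Scalar dot product over float16 bit patterns, accumulated in float. */
  static float dotProduct(short[] a, short[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ: " + a.length + " != " + b.length);
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum = Math.fma(float16ToFloat(a[i]), float16ToFloat(b[i]), sum);
    }
    return sum;
  }
}
```

A scalar version like this can serve as a baseline when checking the output of the vectorized path on small inputs, keeping in mind that the two may round differently because the vectorized code accumulates in float16.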


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

