ChrisHegarty commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557577587

   @rmuir for the byte[] case, it seems to me that we want to size things so as to optimise for the preferred ShortVector species, right? That is what you seem to have done for a number of specific sizes, which I think is good. You did ask if we could generalise this.
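
   To make the sizing concrete, here is a rough sketch of the lane arithmetic in plain Java. It does not need jdk.incubator.vector. `LaneMath` and its methods are hypothetical names used only for illustration. It assumes the preferred Short and Int species share one bit size (the platform's preferred shape), and that the byte loads are sized at half that bit size.

   ```java
   /**
    * Hypothetical helper (not part of Lucene or the Vector API): reproduces the
    * lane arithmetic of sizing byte loads from the preferred ShortVector shape.
    * Assumes the preferred Short and Int species share one bit size.
    */
   public final class LaneMath {
     private LaneMath() {}

     /** Bit size of the byte species: half the preferred short vector size. */
     public static int byteBits(int preferredBits) {
       return preferredBits >> 1;
     }

     /** Bytes loaded per loop iteration (lanes of the byte species). */
     public static int byteLanes(int preferredBits) {
       return byteBits(preferredBits) / Byte.SIZE;
     }

     /** Lanes of the preferred short species. */
     public static int shortLanes(int preferredBits) {
       return preferredBits / Short.SIZE;
     }

     /** Lanes of the preferred int species. */
     public static int intLanes(int preferredBits) {
       return preferredBits / Integer.SIZE;
     }

     /** Number of S2I parts needed to cover every lane of one short vector. */
     public static int s2iParts(int preferredBits) {
       return shortLanes(preferredBits) / intLanes(preferredBits);
     }

     public static void main(String[] args) {
       for (int bits : new int[] {128, 256, 512}) {
         System.out.printf(
             "%d-bit: %d bytes/iteration, %d short lanes, %d int lanes, %d S2I parts%n",
             bits, byteLanes(bits), shortLanes(bits), intLanes(bits), s2iParts(bits));
       }
     }
   }
   ```

   For 128, 256 and 512 bits this gives 8, 16 and 32 bytes per iteration. The byte lane count always equals the short lane count, so B2S part 0 covers the whole load. Exactly two S2I parts are always needed, which is why two int accumulators suffice. Only the 128-bit case corresponds to `ByteVector.SPECIES_64`.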
   
   Based on the structure of your latest commit, can we not just derive the ByteVector species (and hence the loop stride) from the preferred ShortVector species, using half its bit size? For example:
   
   ```
   static final VectorSpecies<Byte> PREFERRED_BYTE_SPECIES =
       ByteVector.SPECIES_MAX.withShape(
           VectorShape.forBitSize(ShortVector.SPECIES_PREFERRED.vectorBitSize() >> 1));
   
     @Benchmark
     public int dotProductNewNew() {
       int i = 0;
       int res = 0;
       // only vectorize if we'll at least enter the loop a single time
        if (a.length >= PREFERRED_BYTE_SPECIES.length()) {
          // processes PREFERRED_BYTE_SPECIES.length() bytes per iteration,
          // e.g. 16 bytes when the preferred ShortVector shape is 256 bits
         int upperBound = PREFERRED_BYTE_SPECIES.loopBound(a.length);
         IntVector acc1 = IntVector.zero(IntVector.SPECIES_PREFERRED);
         IntVector acc2 = IntVector.zero(IntVector.SPECIES_PREFERRED);
         for (; i < upperBound; i += PREFERRED_BYTE_SPECIES.length()) {
            ByteVector va8 = ByteVector.fromArray(PREFERRED_BYTE_SPECIES, a, i);
            ByteVector vb8 = ByteVector.fromArray(PREFERRED_BYTE_SPECIES, b, i);
            // widen to shorts: the byte species has as many lanes as the preferred short species
            Vector<Short> va16 =
                va8.convertShape(VectorOperators.B2S, ShortVector.SPECIES_PREFERRED, 0);
            Vector<Short> vb16 =
                vb8.convertShape(VectorOperators.B2S, ShortVector.SPECIES_PREFERRED, 0);
            // byte * byte always fits in a short
            Vector<Short> prod16 = va16.mul(vb16);
            // an int vector holds half the short lanes, so widen parts 0 and 1 separately
            Vector<Integer> prod32_1 =
                prod16.convertShape(VectorOperators.S2I, IntVector.SPECIES_PREFERRED, 0);
            Vector<Integer> prod32_2 =
                prod16.convertShape(VectorOperators.S2I, IntVector.SPECIES_PREFERRED, 1);
            acc1 = acc1.add(prod32_1);
            acc2 = acc2.add(prod32_2);
         }
         // reduce
         res += acc1.add(acc2).reduceLanes(VectorOperators.ADD);
       }
   
       for (; i < a.length; i++) {
         res += b[i] * a[i];
       }
       return res;
     }
   ```
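
   As a sanity check that splitting the widened products between acc1 and acc2 counts every product exactly once, the loop can be emulated in plain Java. This is a hypothetical scalar model, not Lucene code, and the class name `BlockedDotProduct` is made up. It assumes `a.length == b.length` and a power-of-two `byteLanes`, as Vector API species provide. It says nothing about performance.

   ```java
   import java.util.Random;

   /**
    * Hypothetical scalar emulation (plain Java, no Vector API) of the loop in
    * dotProductNewNew(): byteLanes bytes per iteration, widened to shorts,
    * multiplied, then split into two int halves that feed acc1 and acc2.
    */
   public final class BlockedDotProduct {
     private BlockedDotProduct() {}

     public static int dotProduct(byte[] a, byte[] b, int byteLanes) {
       int intLanes = byteLanes / 2; // S2I part 0 and part 1 each take half the lanes
       int[] acc1 = new int[intLanes];
       int[] acc2 = new int[intLanes];
       int i = 0;
       int res = 0;
       if (a.length >= byteLanes) {
         // equivalent to loopBound(a.length) for a power-of-two lane count
         int upperBound = a.length - (a.length % byteLanes);
         for (; i < upperBound; i += byteLanes) {
           for (int lane = 0; lane < byteLanes; lane++) {
             // 16-bit product after B2S widening; |byte * byte| <= 16384 fits in a short
             short prod16 = (short) (a[i + lane] * b[i + lane]);
             if (lane < intLanes) {
               acc1[lane] += prod16; // S2I part 0: low half of the lanes
             } else {
               acc2[lane - intLanes] += prod16; // S2I part 1: high half
             }
           }
         }
         // models acc1.add(acc2).reduceLanes(ADD)
         for (int lane = 0; lane < intLanes; lane++) {
           res += acc1[lane] + acc2[lane];
         }
       }
       // scalar tail for the remaining elements
       for (; i < a.length; i++) {
         res += b[i] * a[i];
       }
       return res;
     }

     public static int scalarDotProduct(byte[] a, byte[] b) {
       int res = 0;
       for (int i = 0; i < a.length; i++) {
         res += a[i] * b[i];
       }
       return res;
     }

     public static void main(String[] args) {
       Random random = new Random(42);
       for (int byteLanes : new int[] {8, 16, 32}) {
         for (int len = 0; len < 100; len++) {
           byte[] a = new byte[len];
           byte[] b = new byte[len];
           random.nextBytes(a);
           random.nextBytes(b);
           if (dotProduct(a, b, byteLanes) != scalarDotProduct(a, b)) {
             throw new AssertionError("mismatch at byteLanes=" + byteLanes + ", len=" + len);
           }
         }
       }
       System.out.println("blocked and scalar dot products agree");
     }
   }
   ```

   The extreme case of `-128 * -128 = 16384` still fits in a short, so the 16-bit multiply loses nothing before the S2I widening.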
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

