kaivalnp commented on code in PR #15742:
URL: https://github.com/apache/lucene/pull/15742#discussion_r2835272749
##########
lucene/core/src/java25/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##########
@@ -1010,21 +1018,28 @@ private static int int4SquareDistanceSinglePackedBody(
// upper
ByteVector va8 = unpacked.load(Int4Constants.BYTE_SPECIES, i + j + packed.length());
- ByteVector diff8 = vb8.and((byte) 0x0F).sub(va8);
- Vector<Short> diff16 = diff8.convertShape(B2S, Int4Constants.SHORT_SPECIES, 0);
- acc0 = acc0.add(diff16.mul(diff16));
// lower
ByteVector vc8 = unpacked.load(Int4Constants.BYTE_SPECIES, i + j);
Review Comment:
Something I found interesting: throughput drops sharply after the second warmup
iteration (from about 17 ops/us to about 4.2 ops/us, where it then stays) if we
operate on `upper` (multiply the difference with itself and add it to the
accumulator) _before_ loading `lower`:
```
# Warmup Iteration 1: 12.736 ops/us
# Warmup Iteration 2: 16.919 ops/us
# Warmup Iteration 3: 4.171 ops/us
# Warmup Iteration 4: 4.174 ops/us
Iteration 1: 4.179 ops/us
Iteration 2: 4.188 ops/us
Iteration 3: 4.193 ops/us
Iteration 4: 4.201 ops/us
Iteration 5: 4.198 ops/us
```
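For readers without the full kernel in front of them, here is a rough scalar sketch of the two orderings being compared. It is not the Panama code from the PR: the class name, method names, and the `pack` helper are hypothetical, and the nibble layout (high nibble pairs with `unpacked[i]`, low nibble pairs with `unpacked[i + packed.length]`) is an assumption inferred from the `0x0F` mask in the hunk above. Both methods return the same value; the scalar form only shows where the loads sit relative to the arithmetic and is not expected to reproduce the slowdown.

```java
final class Int4OrderingSketch {

  // Hypothetical helper: packs two 4-bit values per byte.
  // Assumed layout: high nibble = raw[i], low nibble = raw[i + half].
  static byte[] pack(byte[] raw) {
    int half = raw.length / 2;
    byte[] packed = new byte[half];
    for (int i = 0; i < half; i++) {
      packed[i] = (byte) ((raw[i] << 4) | raw[i + half]);
    }
    return packed;
  }

  // Ordering described in the comment: finish the work on `upper`
  // (square and accumulate) before `lower` is loaded.
  static int computeUpperFirst(byte[] unpacked, byte[] packed) {
    int acc = 0;
    for (int i = 0; i < packed.length; i++) {
      byte b = packed[i];
      int upper = unpacked[i + packed.length];
      int diffUpper = (b & 0x0F) - upper;
      acc += diffUpper * diffUpper;
      int lower = unpacked[i];
      int diffLower = ((b & 0xFF) >>> 4) - lower;
      acc += diffLower * diffLower;
    }
    return acc;
  }

  // Alternative ordering: load both halves first, then do the arithmetic.
  static int loadBothThenCompute(byte[] unpacked, byte[] packed) {
    int acc = 0;
    for (int i = 0; i < packed.length; i++) {
      byte b = packed[i];
      int upper = unpacked[i + packed.length];
      int lower = unpacked[i];
      int diffUpper = (b & 0x0F) - upper;
      int diffLower = ((b & 0xFF) >>> 4) - lower;
      acc += diffUpper * diffUpper + diffLower * diffLower;
    }
    return acc;
  }

  public static void main(String[] args) {
    byte[] unpacked = {1, 2, 3, 4};
    byte[] packed = pack(new byte[] {5, 6, 7, 8});
    System.out.println(computeUpperFirst(unpacked, packed));   // 64
    System.out.println(loadBothThenCompute(unpacked, packed)); // 64
  }
}
```

Since the results are identical, any throughput difference between the two shapes in the vectorized kernel would come from how the JIT schedules and compiles them, not from the math.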
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]