mccullocht commented on code in PR #15736:
URL: https://github.com/apache/lucene/pull/15736#discussion_r2835060238
##########
lucene/core/src/java25/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##########
@@ -610,11 +610,10 @@ private static int int4DotProductSinglePackedBody(
prod8a.convertShape(ZERO_EXTEND_B2S, Int4Constants.SHORT_SPECIES,
0);
acc1 = acc1.add(prod16a);
}
- Vector<Integer> intAcc0 = acc0.convert(S2I, 0);
- Vector<Integer> intAcc1 = acc0.convert(S2I, 1);
- Vector<Integer> intAcc2 = acc1.convert(S2I, 0);
- Vector<Integer> intAcc3 = acc1.convert(S2I, 1);
- sum +=
intAcc0.add(intAcc1).add(intAcc2).add(intAcc3).reinterpretAsInts().reduceLanes(ADD);
+ ShortVector accShort = acc0.add(acc1);
+ Vector<Integer> intAcc0 = accShort.convert(ZERO_EXTEND_S2I, 0);
+ Vector<Integer> intAcc1 = accShort.convert(ZERO_EXTEND_S2I, 1);
+ sum += intAcc0.add(intAcc1).reinterpretAsInts().reduceLanes(ADD);
}
Review Comment:
It's the Hacker's Delight popcount trick! You can generalize this down to 1
bit. I may look into this and see if it does better on x86 too. We `convert` a
lot to upcast for accumulation, so there may be other places we can apply both
of these techniques.
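
For anyone following along, here is a scalar sketch of the mask-shift-add idea as I understand it, not the code in this PR. The class and method names are made up for illustration. `sumShortPairs` folds two unsigned 16-bit fields packed in one `int` into their sum, and `popcount` applies the same step starting from 1-bit fields.

```java
public final class SwarPairwiseSum {
    private SwarPairwiseSum() {}

    /** Sums the two unsigned 16-bit fields packed in a 32-bit word. */
    public static int sumShortPairs(int packed) {
        // Low field via mask, high field via unsigned shift (zero-extends).
        return (packed & 0xFFFF) + (packed >>> 16);
    }

    /** SWAR popcount: the same mask-shift-add step, starting from 1-bit fields. */
    public static int popcount(int x) {
        x = (x & 0x55555555) + ((x >>> 1) & 0x55555555); // 16 two-bit sums
        x = (x & 0x33333333) + ((x >>> 2) & 0x33333333); // 8 four-bit sums
        x = (x & 0x0F0F0F0F) + ((x >>> 4) & 0x0F0F0F0F); // 4 byte sums
        x = (x & 0x00FF00FF) + ((x >>> 8) & 0x00FF00FF); // 2 short sums
        return sumShortPairs(x);                         // final 32-bit sum
    }

    public static void main(String[] args) {
        int packed = (40000 << 16) | 30000; // two unsigned shorts in one int
        System.out.println(sumShortPairs(packed));
        System.out.println(popcount(0xF0F0) == Integer.bitCount(0xF0F0));
    }
}
```

The unsigned shift `>>>` matters here: with `>>` a high field above 32767 would sign-extend and corrupt the sum.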
When I profiled, the call stack under `convert` was invoking a
`ShortShuffle128` class. aarch64 doesn't have any native short shuffle
instructions. This could be represented as a byte shuffle if you know the
target endianness (clang will definitely generate this code in some cases).
`slice()` followed by `convert(..., 0)` might also do well, but I struggle to
reason about performance with the vector incubator package in a way I don't
with raw intrinsics.
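
To make the byte-shuffle idea concrete, here is a scalar emulation of zero-extending four 16-bit lanes to 32-bit lanes with a byte index table. This is only a sketch under my own assumptions: the names are hypothetical, `-1` is an invented sentinel meaning "write a zero byte", and nothing here shows what the Vector API or C2 actually emits. It does show why endianness matters: it decides whether the zero bytes go above or below each copied short.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public final class ByteShuffleWiden {
    private ByteShuffleWiden() {}

    /** Byte indices that widen 4 shorts starting at firstLane into 4 ints; -1 selects zero. */
    public static int[] widenIndices(int firstLane, ByteOrder order) {
        int[] idx = new int[16];
        for (int lane = 0; lane < 4; lane++) {
            int src = (firstLane + lane) * 2; // first byte of the source short
            int dst = lane * 4;               // first byte of the destination int
            boolean little = order == ByteOrder.LITTLE_ENDIAN;
            // Little-endian: data bytes first, then zeros. Big-endian: zeros first.
            idx[dst]     = little ? src     : -1;
            idx[dst + 1] = little ? src + 1 : -1;
            idx[dst + 2] = little ? -1      : src;
            idx[dst + 3] = little ? -1      : src + 1;
        }
        return idx;
    }

    /** Scalar stand-in for a byte table lookup. */
    public static byte[] byteShuffle(byte[] src, int[] idx) {
        byte[] out = new byte[idx.length];
        for (int i = 0; i < idx.length; i++) {
            out[i] = idx[i] < 0 ? 0 : src[idx[i]];
        }
        return out;
    }

    /** Widens lanes firstLane..firstLane+3 of an 8-short vector to unsigned ints. */
    public static int[] widen(short[] eightShorts, int firstLane, ByteOrder order) {
        ByteBuffer in = ByteBuffer.allocate(16).order(order);
        for (short s : eightShorts) {
            in.putShort(s);
        }
        byte[] shuffled = byteShuffle(in.array(), widenIndices(firstLane, order));
        ByteBuffer out = ByteBuffer.wrap(shuffled).order(order);
        int[] result = new int[4];
        for (int i = 0; i < 4; i++) {
            result[i] = out.getInt(i * 4);
        }
        return result;
    }
}
```

Calling `widen(v, 0, order)` and `widen(v, 4, order)` plays the role of the two `convert(ZERO_EXTEND_S2I, part)` calls in the hunk above, for a 128-bit vector of shorts.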
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]