mccullocht commented on PR #15257:
URL: https://github.com/apache/lucene/pull/15257#issuecomment-3357554058
I moved only the vector dot product off-heap, and I'm not planning to do
anything clever with the corrections. I'm not sure that would pay off anyway:
you'd have to transpose from row view to column view to parallelize that work,
and it would be 128-bit on x86, which may not go well.
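To make the row-to-column transpose concrete, here is a minimal sketch. It
assumes each vector carries a small row of float corrective terms; the class
name, method name, and four-field layout are hypothetical and are not taken
from the PR. Four 32-bit values fill one 128-bit register, so a block of four
rows is the natural unit here.

```java
// Hypothetical sketch: names and the four-field layout are assumptions,
// not Lucene's actual corrective-term storage.
final class CorrectionTranspose {
  /** Turns rows[vector][field] into cols[field][vector]. */
  static float[][] toColumns(float[][] rows) {
    int fields = rows.length == 0 ? 0 : rows[0].length;
    float[][] cols = new float[fields][rows.length];
    for (int v = 0; v < rows.length; v++) {
      if (rows[v].length != fields) {
        throw new IllegalArgumentException("ragged rows");
      }
      for (int f = 0; f < fields; f++) {
        // Same field of consecutive vectors ends up contiguous in memory.
        cols[f][v] = rows[v][f];
      }
    }
    return cols;
  }
}
```

After the transpose, each column holds the same term for every vector in the
block, so one SIMD operation could apply a correction step to all of them at
once. The transpose itself is extra work on every block, which is the cost
being weighed above.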
I was assuming that accessing the corrective terms was messing with
performance, but larger JFR stacks point at a more mysterious culprit. This PR
spends more time in lane reduction (???) and in 128-bit loads of data from the
memory segment (probably memory latency). For the latter case, it's weird that
I don't see anything comparable on the baseline, when I know that copying to
the heap should be inducing a similar hit.
```
baseline:
36.66% 12745
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody512()
[Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody()
[Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#uint8DotProduct()
[Inlined code]
at
org.apache.lucene.util.VectorUtil#uint8DotProduct() [Inlined code]
at
org.apache.lucene.codecs.lucene104.Lucene104ScalarQuantizedVectorScorer#quantizedScore()
[Inlined code]
at
org.apache.lucene.codecs.lucene104.Lucene104ScalarQuantizedVectorScorer$1#score()
[JIT compiled code]
at
org.apache.lucene.util.hnsw.RandomVectorScorer#bulkScore() [Inlined code]
at
org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
at
org.apache.lucene.util.hnsw.AbstractHnswGraphSearcher#search() [Inlined code]
at
org.apache.lucene.util.hnsw.HnswGraphSearcher#search() [Inlined code]
25.90% 9005
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody512()
[Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody()
[Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#uint8DotProduct()
[Inlined code]
at
org.apache.lucene.util.VectorUtil#uint8DotProduct() [Inlined code]
at
org.apache.lucene.codecs.lucene104.Lucene104ScalarQuantizedVectorScorer#quantizedScore()
[Inlined code]
at
org.apache.lucene.codecs.lucene104.Lucene104ScalarQuantizedVectorScorer$1#score()
[JIT compiled code]
at
org.apache.lucene.util.hnsw.RandomVectorScorer#bulkScore() [Inlined code]
at
org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
at
org.apache.lucene.util.hnsw.AbstractHnswGraphSearcher#search() [JIT compiled
code]
at
org.apache.lucene.util.hnsw.HnswGraphSearcher#search() [JIT compiled code]
8.64% 3005
jdk.incubator.vector.IntVector#reduceLanesTemplate() [Inlined code]
at
jdk.incubator.vector.Int512Vector#reduceLanes() [Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody512()
[Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody()
[Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#uint8DotProduct()
[Inlined code]
at
org.apache.lucene.util.VectorUtil#uint8DotProduct() [Inlined code]
at
org.apache.lucene.codecs.lucene104.Lucene104ScalarQuantizedVectorScorer#quantizedScore()
[Inlined code]
at
org.apache.lucene.codecs.lucene104.Lucene104ScalarQuantizedVectorScorer$1#score()
[JIT compiled code]
at
org.apache.lucene.util.hnsw.RandomVectorScorer#bulkScore() [Inlined code]
at
org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
candidate:
33.93% 11848
jdk.incubator.vector.IntVector#reduceLanesTemplate() [Inlined code]
at
jdk.incubator.vector.Int512Vector#reduceLanes() [Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody512()
[Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody()
[Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#uint8DotProduct()
[Inlined code]
at
org.apache.lucene.internal.vectorization.Lucene104MemorySegmentScalarQuantizedVectorScorer$RandomVectorScorerImpl#score()
[JIT compiled code]
at
org.apache.lucene.util.hnsw.RandomVectorScorer#bulkScore() [Inlined code]
at
org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
at
org.apache.lucene.util.hnsw.AbstractHnswGraphSearcher#search() [Inlined code]
at
org.apache.lucene.util.hnsw.HnswGraphSearcher#search() [Inlined code]
23.97% 8369
jdk.incubator.vector.IntVector#reduceLanesTemplate() [Inlined code]
at
jdk.incubator.vector.Int512Vector#reduceLanes() [Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody512()
[Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody()
[Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#uint8DotProduct()
[Inlined code]
at
org.apache.lucene.internal.vectorization.Lucene104MemorySegmentScalarQuantizedVectorScorer$RandomVectorScorerImpl#score()
[JIT compiled code]
at
org.apache.lucene.util.hnsw.RandomVectorScorer#bulkScore() [Inlined code]
at
org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
at
org.apache.lucene.util.hnsw.AbstractHnswGraphSearcher#search() [JIT compiled
code]
at
org.apache.lucene.util.hnsw.HnswGraphSearcher#search() [JIT compiled code]
13.33% 4655
jdk.internal.misc.ScopedMemoryAccess#loadFromMemorySegmentScopedInternal()
[Inlined code]
at
jdk.internal.misc.ScopedMemoryAccess#loadFromMemorySegment() [Inlined code]
at
jdk.incubator.vector.ByteVector#fromMemorySegment0Template() [Inlined code]
at
jdk.incubator.vector.Byte128Vector#fromMemorySegment0() [Inlined code]
at
jdk.incubator.vector.ByteVector#fromMemorySegment() [Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport$MemorySegmentLoader#load()
[Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody512()
[Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody()
[Inlined code]
at
org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#uint8DotProduct()
[Inlined code]
at
org.apache.lucene.internal.vectorization.Lucene104MemorySegmentScalarQuantizedVectorScorer$RandomVectorScorerImpl#score()
[JIT compiled code]
```
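For readers unfamiliar with the `reduceLanes` frames above, here is a scalar
model of the usual vectorized dot-product shape: per-lane partial sums are
accumulated inside the loop and collapsed into one int afterwards. This is a
sketch in plain Java, not the code in `PanamaVectorUtilSupport`; the class
name is hypothetical, and the lane count of 16 is chosen only to mirror the 16
ints that fit in an `Int512Vector`. It also assumes the reduction happens once
after the loop.

```java
// Scalar model of "accumulate per lane, reduce once"; not Lucene's code.
final class Uint8DotSketch {
  static final int LANES = 16; // 16 x 32-bit ints = 512 bits

  static int dotProduct(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("length mismatch");
    }
    int[] acc = new int[LANES];
    int bound = a.length - (a.length % LANES);
    int i = 0;
    for (; i < bound; i += LANES) {
      for (int lane = 0; lane < LANES; lane++) {
        // Mask with 0xFF so bytes are treated as unsigned (uint8).
        acc[lane] += (a[i + lane] & 0xFF) * (b[i + lane] & 0xFF);
      }
    }
    // The "lane reduction": one horizontal sum after the loop.
    int sum = 0;
    for (int lane = 0; lane < LANES; lane++) {
      sum += acc[lane];
    }
    // Scalar tail for lengths that are not a multiple of LANES.
    for (; i < a.length; i++) {
      sum += (a[i] & 0xFF) * (b[i] & 0xFF);
    }
    return sum;
  }
}
```

In this shape the reduction runs once per call while the multiply-accumulate
runs once per block, which is why a large share of samples landing on the
reduction looks surprising.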