mccullocht commented on PR #15257:
URL: https://github.com/apache/lucene/pull/15257#issuecomment-3357554058

   I moved only the vector dot product off-heap, and I'm not planning to do 
anything clever with the corrections. I'm not sure that would pay off anyway: 
you'd have to transpose from a row view to a column view to parallelize that 
work, and it would use 128-bit vectors on x86, which may not perform well.
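
For readers unfamiliar with the row-to-column transposition mentioned above, here is a minimal sketch of what it could look like. Everything in it is an assumption for illustration: the class name `CorrectionTranspose`, the choice of four float terms per vector, and the flat row-major input layout are hypothetical and are not taken from the PR.

```java
import java.util.Arrays;

public final class CorrectionTranspose {
  /** Hypothetical number of corrective terms stored per vector. */
  public static final int TERMS = 4;

  /**
   * Transposes row-major corrections (TERMS consecutive floats per vector)
   * into column-major form (one array per term, one entry per vector), so
   * that the same term for many vectors sits contiguously in memory.
   */
  public static float[][] toColumns(float[] rows, int count) {
    if (rows.length != count * TERMS) {
      throw new IllegalArgumentException("expected " + (count * TERMS) + " floats");
    }
    float[][] columns = new float[TERMS][count];
    for (int v = 0; v < count; v++) {
      for (int t = 0; t < TERMS; t++) {
        columns[t][v] = rows[v * TERMS + t];
      }
    }
    return columns;
  }

  public static void main(String[] args) {
    // Two vectors, four hypothetical corrective terms each.
    float[] rows = {1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f};
    System.out.println(Arrays.deepToString(toColumns(rows, 2)));
  }
}
```

The transposition itself is extra work per batch of vectors, which is part of why it is unclear whether vectorizing the corrections would pay off.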
   
   I was assuming that accessing the corrective terms was hurting 
performance, but larger JFR stacks point at a more mysterious culprit. This PR 
spends more time in lane reduction (???) and in 128-bit loads of data from the 
memory segment (probably memory latency). For the latter case it's weird that I 
don't see anything similar on the baseline, when I know that copying to the heap 
should be inducing a similar hit.
   ```
   baseline:
   36.66%        12745         org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody512() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#uint8DotProduct() [Inlined code]
                                 at org.apache.lucene.util.VectorUtil#uint8DotProduct() [Inlined code]
                                 at org.apache.lucene.codecs.lucene104.Lucene104ScalarQuantizedVectorScorer#quantizedScore() [Inlined code]
                                 at org.apache.lucene.codecs.lucene104.Lucene104ScalarQuantizedVectorScorer$1#score() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.RandomVectorScorer#bulkScore() [Inlined code]
                                 at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.AbstractHnswGraphSearcher#search() [Inlined code]
                                 at org.apache.lucene.util.hnsw.HnswGraphSearcher#search() [Inlined code]
   25.90%        9005          org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody512() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#uint8DotProduct() [Inlined code]
                                 at org.apache.lucene.util.VectorUtil#uint8DotProduct() [Inlined code]
                                 at org.apache.lucene.codecs.lucene104.Lucene104ScalarQuantizedVectorScorer#quantizedScore() [Inlined code]
                                 at org.apache.lucene.codecs.lucene104.Lucene104ScalarQuantizedVectorScorer$1#score() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.RandomVectorScorer#bulkScore() [Inlined code]
                                 at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.AbstractHnswGraphSearcher#search() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.HnswGraphSearcher#search() [JIT compiled code]
   8.64%         3005          jdk.incubator.vector.IntVector#reduceLanesTemplate() [Inlined code]
                                 at jdk.incubator.vector.Int512Vector#reduceLanes() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody512() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#uint8DotProduct() [Inlined code]
                                 at org.apache.lucene.util.VectorUtil#uint8DotProduct() [Inlined code]
                                 at org.apache.lucene.codecs.lucene104.Lucene104ScalarQuantizedVectorScorer#quantizedScore() [Inlined code]
                                 at org.apache.lucene.codecs.lucene104.Lucene104ScalarQuantizedVectorScorer$1#score() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.RandomVectorScorer#bulkScore() [Inlined code]
                                 at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
   
   candidate:
   33.93%        11848         jdk.incubator.vector.IntVector#reduceLanesTemplate() [Inlined code]
                                 at jdk.incubator.vector.Int512Vector#reduceLanes() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody512() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#uint8DotProduct() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.Lucene104MemorySegmentScalarQuantizedVectorScorer$RandomVectorScorerImpl#score() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.RandomVectorScorer#bulkScore() [Inlined code]
                                 at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.AbstractHnswGraphSearcher#search() [Inlined code]
                                 at org.apache.lucene.util.hnsw.HnswGraphSearcher#search() [Inlined code]
   23.97%        8369          jdk.incubator.vector.IntVector#reduceLanesTemplate() [Inlined code]
                                 at jdk.incubator.vector.Int512Vector#reduceLanes() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody512() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#uint8DotProduct() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.Lucene104MemorySegmentScalarQuantizedVectorScorer$RandomVectorScorerImpl#score() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.RandomVectorScorer#bulkScore() [Inlined code]
                                 at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.AbstractHnswGraphSearcher#search() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.HnswGraphSearcher#search() [JIT compiled code]
   13.33%        4655          jdk.internal.misc.ScopedMemoryAccess#loadFromMemorySegmentScopedInternal() [Inlined code]
                                 at jdk.internal.misc.ScopedMemoryAccess#loadFromMemorySegment() [Inlined code]
                                 at jdk.incubator.vector.ByteVector#fromMemorySegment0Template() [Inlined code]
                                 at jdk.incubator.vector.Byte128Vector#fromMemorySegment0() [Inlined code]
                                 at jdk.incubator.vector.ByteVector#fromMemorySegment() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport$MemorySegmentLoader#load() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody512() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#uint8DotProduct() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.Lucene104MemorySegmentScalarQuantizedVectorScorer$RandomVectorScorerImpl#score() [JIT compiled code]
   ```
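
To make the two access patterns being compared concrete, here is a rough scalar sketch of "copy to heap, then score" versus "score directly against off-heap bytes". It is not the PR's code: a direct `ByteBuffer` stands in for the memory segment so the example runs on older JDKs, the loops are scalar rather than Panama vector code, and the class and method names (`OffHeapDotProduct`, `dotViaCopy`, `dotDirect`) are hypothetical.

```java
import java.nio.ByteBuffer;

public final class OffHeapDotProduct {
  /** Scalar uint8 dot product over two heap arrays of equal length. */
  public static int dotHeap(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += (a[i] & 0xFF) * (b[i] & 0xFF); // mask to treat bytes as unsigned
    }
    return sum;
  }

  /** Baseline-style path: copy the stored vector to a heap scratch array, then score. */
  public static int dotViaCopy(byte[] query, ByteBuffer doc, int offset, byte[] scratch) {
    ByteBuffer view = doc.duplicate(); // independent position, shared contents
    view.position(offset);
    view.get(scratch, 0, query.length);
    return dotHeap(query, scratch);
  }

  /** Candidate-style path: read the stored vector in place, with no heap copy. */
  public static int dotDirect(byte[] query, ByteBuffer doc, int offset) {
    int sum = 0;
    for (int i = 0; i < query.length; i++) {
      sum += (query[i] & 0xFF) * (doc.get(offset + i) & 0xFF);
    }
    return sum;
  }

  public static void main(String[] args) {
    byte[] query = {1, 2, (byte) 200, 4};
    ByteBuffer doc = ByteBuffer.allocateDirect(8);
    // Four bytes of padding, then the stored vector at offset 4.
    doc.put(new byte[] {0, 0, 0, 0, 10, 20, (byte) 255, 40});
    System.out.println(dotViaCopy(query, doc, 4, new byte[query.length]));
    System.out.println(dotDirect(query, doc, 4));
  }
}
```

Both paths touch the same off-heap bytes once per score, which is why I would expect the memory latency to show up somewhere in the baseline profile as well, even if it is attributed to a different frame.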


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

