Robert Muir created LUCENE-10128:
------------------------------------
Summary: increased HNSW beam width causes large indexing perf regression
Key: LUCENE-10128
URL: https://issues.apache.org/jira/browse/LUCENE-10128
Project: Lucene - Core
Issue Type: Task
Reporter: Robert Muir
Just opening a ticket in case there is anything we could/should do about it.
Looking at Mike's nightly benchmarks, I see a large (like 4x) drop in indexing
perf with vectors after LUCENE-10109.
There's some new stuff in the top CPU offenders:
{noformat}
PERCENT CPU SAMPLES STACK
19.93% 821395 org.apache.lucene.util.VectorUtil#dotProduct()
13.80% 568786 org.apache.lucene.util.LongHeap#downHeap()
11.06% 455711 org.apache.lucene.codecs.KnnVectorsWriter$VectorValuesMerger$MergerRandomAccess#vectorValue()
9.84% 405678 org.apache.lucene.util.LongHeap#upHeap()
6.72% 276931 java.util.concurrent.atomic.AtomicLong#get()
5.30% 218564 org.apache.lucene.util.LongHeap$2#lessThan()
2.69% 110872 java.util.Arrays#binarySearch0()
2.58% 106294 org.apache.lucene.util.hnsw.HnswGraph#search()
1.90% 78254 org.apache.lucene.util.LongHeap#push()
{noformat}
compared to before where the profile stacks looked like this:
{noformat}
PERCENT CPU SAMPLES STACK
13.58% 171575 org.apache.lucene.util.VectorUtil#dotProduct()
10.13% 127904 org.apache.lucene.util.LongHeap#downHeap()
9.84% 124257 org.apache.lucene.util.LongHeap#upHeap()
6.26% 79125 java.util.ArrayList#elementData()
4.34% 54831 java.util.Random#nextInt()
3.98% 50255 org.apache.lucene.util.BytesRefHash#equals()
3.69% 46594 org.apache.lucene.util.ByteBlockPool#allocSlice()
2.62% 33118 org.apache.lucene.util.BytesRefHash#findHash()
2.24% 28275 org.apache.lucene.analysis.standard.StandardTokenizerImpl#getNextToken()
2.14% 27033 org.apache.lucene.analysis.standard.StandardTokenizer#incrementToken()
{noformat}
At a glance, it seems to me that although some perf differences should be
expected, merging itself may have become more costly. Maybe there is some stuff
we can optimize about it.
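For a rough sense of scale, the sample counts of the frames that appear in both listings can be compared directly. The sketch below is only an illustration: the class name `ProfileCompare` and its helper are made up for this comment, and it assumes the two profiles were taken over the same benchmark workload, which the listings themselves do not confirm.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene or the benchmark tooling.
class ProfileCompare {
    // frame -> {samples before, samples after}, copied from the two
    // profiles above. Only frames present in both listings are included.
    static final Map<String, long[]> SHARED = new LinkedHashMap<>();
    static {
        SHARED.put("VectorUtil#dotProduct()", new long[] {171575L, 821395L});
        SHARED.put("LongHeap#downHeap()", new long[] {127904L, 568786L});
        SHARED.put("LongHeap#upHeap()", new long[] {124257L, 405678L});
    }

    // Growth factor in sample count from the "before" to the "after" profile.
    static double ratio(long before, long after) {
        return (double) after / before;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, long[]> e : SHARED.entrySet()) {
            long[] s = e.getValue();
            System.out.printf("%-26s %8d -> %8d  (%.2fx)%n",
                e.getKey(), s[0], s[1], ratio(s[0], s[1]));
        }
    }
}
```

Under that assumption, the dot product and heap frames each collect roughly 3x to 5x as many samples as before, while `MergerRandomAccess#vectorValue()` and `AtomicLong#get()` are absent from the earlier listing altogether.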