msokolov commented on code in PR #14963:
URL: https://github.com/apache/lucene/pull/14963#discussion_r2248750458
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java:
##########
@@ -137,9 +144,16 @@ public final class Lucene99HnswVectorsFormat extends KnnVectorsFormat {
   private final int numMergeWorkers;
   private final TaskExecutor mergeExec;
+  /**
+   * Whether to bypass HNSW graph building for tiny segments (below {@link #HNSW_GRAPH_THRESHOLD}).
+   * When enabled, segments with fewer than the threshold number of vectors will store only flat
+   * vectors, significantly improving indexing performance for workloads with frequent flushes.
+   */
+  private final boolean bypassTinySegments;

Review Comment:
   I do wonder whether we want to expose this as a parameter, though. Maybe it should just be a fixed value? I would have thought of setting it at the threshold below which exhaustive search is no more expensive, or only slightly more expensive, than HNSW search. I would expect that to be related to the M of the graph, maybe?
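
To make the break-even idea in the review comment concrete, here is a minimal sketch of how a fixed threshold could be derived from M. Everything in it is an assumption for illustration: the class `TinySegmentThresholdSketch`, its method names, and above all the cost model, which guesses that an HNSW search expands roughly `k + ceil(log2(n))` nodes and scores up to `2 * M` neighbors per expanded node. None of this is taken from the PR or measured against Lucene; only the exhaustive cost (one distance computation per vector) is certain.

```java
/**
 * Toy cost model comparing exhaustive search with HNSW search.
 * All names and constants are hypothetical; this is not Lucene code.
 */
final class TinySegmentThresholdSketch {
  private TinySegmentThresholdSketch() {}

  /** Exhaustive search scores every vector exactly once. */
  static long exhaustiveCost(int numVectors) {
    return numVectors;
  }

  /**
   * ASSUMED estimate of distance computations for one HNSW search:
   * (nodes expanded) * (neighbors scored per node), with
   * nodes expanded ~ k + ceil(log2(n)) and neighbors per node ~ 2 * M.
   */
  static long hnswCostEstimate(int numVectors, int maxConn, int k) {
    if (numVectors < 2) {
      return numVectors;
    }
    // ceil(log2(n)) for n >= 2
    int ceilLog2 = 32 - Integer.numberOfLeadingZeros(numVectors - 1);
    long nodesExpanded = (long) k + ceilLog2;
    return nodesExpanded * 2L * maxConn;
  }

  /**
   * Smallest segment size at which the estimated HNSW cost drops below the
   * exhaustive cost. Below this size the toy model says a graph does not pay off.
   */
  static int breakEvenSize(int maxConn, int k) {
    int n = 2;
    // Terminates because the estimate grows logarithmically while n grows linearly.
    while (hnswCostEstimate(n, maxConn, k) >= exhaustiveCost(n)) {
      n++;
    }
    return n;
  }

  public static void main(String[] args) {
    int k = 10; // assumed number of requested neighbors
    for (int maxConn : new int[] {8, 16, 32, 64}) {
      System.out.println("M=" + maxConn + " -> break-even size " + breakEvenSize(maxConn, k));
    }
  }
}
```

Under this toy model the break-even size grows with M, which matches the intuition that the threshold should depend on the graph's connectivity. Real numbers would have to come from benchmarks, since the actual search cost also depends on the beam width, the vector dimension, and the data distribution.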