jpountz commented on code in PR #12160: URL: https://github.com/apache/lucene/pull/12160#discussion_r1120720933
########## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ########## @@ -73,17 +77,48 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException { .build(); Query rewritten = indexSearcher.rewrite(booleanQuery); filterWeight = indexSearcher.createWeight(rewritten, ScoreMode.COMPLETE_NO_SCORES, 1f); + } else { + filterWeight = null; } - for (LeafReaderContext ctx : reader.leaves()) { - TopDocs results = searchLeaf(ctx, filterWeight); - if (ctx.docBase > 0) { - for (ScoreDoc scoreDoc : results.scoreDocs) { - scoreDoc.doc += ctx.docBase; - } - } - perLeafResults[ctx.ord] = results; - } + List<FutureTask<TopDocs>> tasks = + reader.leaves().stream() + .map( + ctx -> + new FutureTask<>( + () -> { + try { + TopDocs results = searchLeaf(ctx, filterWeight); + if (ctx.docBase > 0) { + for (ScoreDoc scoreDoc : results.scoreDocs) { + scoreDoc.doc += ctx.docBase; + } + } + return results; + } catch (IOException e) { + throw new UncheckedIOException(e); + } + })) + .toList(); + + Executor executor = Objects.requireNonNullElse(indexSearcher.getExecutor(), Runnable::run); + SliceExecutor sliceExecutor = new SliceExecutor(executor); + sliceExecutor.invokeAll(tasks); Review Comment: I wonder if this is right. Thinking out loud, I would assume that users who leverage IndexSearcher concurrency generally have two thread pools, one that they pass to the IndexSearcher constructor that they expect to do the heavy work, and another one, which is the one where `IndexSearcher#search` is called, that mostly handles coordination and lightweight work such as merging top hits coming from different shards but generally spends most of its time waiting for work to complete in the other threadpool. Your optimization suggestion boils dow to running some heavy work (a vector search) in the coordinating threadpool when there is a single segment. If heavy work may happen in either threadpool, this makes sizing these threadpools complicated, as either you allocate `num_cores` threads to the threadpool that does heavy work but then you may end up with more than `num_cores` threads doing heavy work because some heavy work also happens in the coordinating threadpool., or you allocate less than `num_cores` threads but then you might not use all your hardware? That said, your suggestion aligns with how IndexSearcher currently works, so maybe we should apply it for now and discuss in a follow-up issue whether we should also delegate to the executor when there is a single segment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org