github-actions[bot] commented on code in PR #64226:
URL: https://github.com/apache/doris/pull/64226#discussion_r3381421600
##########
be/src/storage/index/ann/faiss_ann_index.cpp:
##########
@@ -704,7 +705,10 @@ doris::Status FaissVectorIndex::ann_topn_search(const
float* query_vec, int k,
"IVF search parameters should not be null for IVF index");
}
faiss::SearchParametersIVF* param = new faiss::SearchParametersIVF();
- param->nprobe = ivf_params->nprobe;
+ // FAISS asserts nprobe > 0, so the lower bound guards against a crash
on
+ // nprobe < 1. nprobe > nlist is harmless (FAISS caps it at nlist
internally);
+ // we clamp the upper bound only to keep the value semantically
meaningful.
Review Comment:
This upper-bound clamp is using `_params.ivf_nlist`, but production-loaded
IVF indexes do not restore that field. `AnnIndexReader::load_index()` creates a
fresh `FaissVectorIndex`, calls `set_metric()`/`set_type()`, and then `load()`;
`load()` installs the FAISS index but leaves `_params.ivf_nlist` at its default
`1024`. For an IVF or IVF_ON_DISK index created with `nlist=4096`, `SET
ivf_nprobe=4096` now sends only 1024 probes to FAISS, reducing recall compared
with the previous behavior where FAISS used/capped against the real nlist. The
new save/load test with `nlist=4` also does not catch this because FAISS caps
the still-forwarded 1024 internally. Please derive the upper bound from the
loaded `faiss::IndexIVF::nlist` (or restore `_params.ivf_nlist` during
`load()`) before clamping; the same issue applies to the `range_search` clamp
below.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]