Dear all,
I'm experimenting with quantized vectors. I have indexed the Deep1B dataset
with beamWidth=32 and maxConnections=200. I created 2 indexes: one with
unquantized 32-bit float vectors and one with 7-bit quantization. For
searching, I used k=50 for the `KnnFloatVectorQuery` and n=10 for
`IndexSearcher.search`:

    int requiredDocs = 10;
    int numCandidates = 50;
    Query query = new KnnFloatVectorQuery("vector", queryVector, numCandidates);
    TopDocs docs = searcher.search(query, requiredDocs);

I calculated the recall against the ground truth. It was 83.62% for the
32-bit index and 65.18% for the 7-bit index, which is surprisingly low.
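
For clarity on the metric, here is a minimal sketch of a per-query recall
computation. It assumes recall means the fraction of the true nearest
neighbours that appear among the returned documents; the class and method
names are made up for illustration and are not part of Lucene.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not a Lucene class.
final class RecallAtK {
    /**
     * Fraction of ground-truth ids that appear in the retrieved ids.
     * Both arrays hold ids for a single query.
     */
    static double recall(int[] retrieved, int[] groundTruth) {
        Set<Integer> truth = new HashSet<>();
        for (int id : groundTruth) {
            truth.add(id);
        }
        int hits = 0;
        for (int id : retrieved) {
            // remove() so that a duplicate retrieved id is not counted twice
            if (truth.remove(id)) {
                hits++;
            }
        }
        return (double) hits / groundTruth.length;
    }
}
```

The reported percentages would then be the average of this value over all
queries, again assuming that definition.
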
I noticed that the raw vector files (*.vec) are opened, but not read at
all. So I tried searching for `numCandidates` documents (`n` and `k` were
both 50) and then re-ranked them manually using the original vectors. My
recall for the quantized index rose to 81.70%, which is only about 2
percentage points less than the 32-bit index.
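
To illustrate what such a manual re-ranking step can look like, here is a
minimal sketch in plain Java. It is not the exact code behind the numbers
above. It assumes squared Euclidean distance as the metric, and it takes a
hypothetical `rawVectors` lookup that returns the original float vector for
a doc id (for example, backed by the source dataset). `ExactReranker` is a
made-up name, not a Lucene class.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.IntFunction;

// Hypothetical helper, not a Lucene class.
final class ExactReranker {
    /** Squared Euclidean distance between two vectors of equal length (assumed metric). */
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Re-orders the candidate doc ids by exact distance to the query
     * and returns the closest n of them.
     */
    static int[] rerank(float[] query, int[] candidates,
                        IntFunction<float[]> rawVectors, int n) {
        // Compute each exact distance once, then sort candidate positions by it.
        float[] dist = new float[candidates.length];
        Integer[] order = new Integer[candidates.length];
        for (int i = 0; i < candidates.length; i++) {
            dist[i] = squaredDistance(query, rawVectors.apply(candidates[i]));
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> dist[i]));

        int[] top = new int[Math.min(n, candidates.length)];
        for (int i = 0; i < top.length; i++) {
            top[i] = candidates[order[i]];
        }
        return top;
    }
}
```

Here `candidates` would be the 50 doc ids returned by the quantized search,
and `n` would be 10.
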
Does Lucene support coarse search with re-ranking? If yes, what is the
API?
Viliam