Dear all,

I'm experimenting with quantized vectors. I have indexed the Deep1B dataset
with beamWidth=32 and maxConnections=200. I created 2 indexes: one with
32-bit (unquantized) vectors and one with 7-bit quantization. For searching,
I used k=50 for the `KnnFloatVectorQuery` and n=10 for
`IndexSearcher.search`.

    int requiredDocs = 10;
    int numCandidates = 50;
    Query query =
      new KnnFloatVectorQuery("vector", queryVector, numCandidates);
    TopDocs docs = searcher.search(query, requiredDocs);

I calculated the recall against the ground truth. It was 83.62% for the
32-bit index and 65.18% for the 7-bit index, which is surprisingly low.
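
For clarity, the recall figure above can be read as the fraction of
ground-truth neighbours that appear among the returned documents. The
following is only a minimal sketch of that measure under this assumption; the
class and method names are hypothetical, and document ids are assumed to be
unique plain `int` values.

```java
import java.util.HashSet;
import java.util.Set;

public class RecallAtN {
    /**
     * Fraction of ground-truth ids that also appear in the retrieved ids.
     * Assumes ids are unique within each array.
     */
    static double recall(int[] retrieved, int[] groundTruth) {
        if (groundTruth.length == 0) {
            throw new IllegalArgumentException("ground truth must not be empty");
        }
        Set<Integer> truth = new HashSet<>();
        for (int id : groundTruth) {
            truth.add(id);
        }
        int hits = 0;
        for (int id : retrieved) {
            if (truth.contains(id)) {
                hits++;
            }
        }
        return (double) hits / groundTruth.length;
    }
}
```

Averaging this value over all queries gives one overall percentage per index.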

I noticed that the raw vector files (*.vec) are opened, but not read at
all. So I tried searching for `numCandidates` documents (`n` and `k` were
both 50) and then re-ranked manually using the original vectors. My recall
for the quantized index rose to 81.70%, which is a mere 2 percentage points
less than the 32-bit index.
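
The manual re-ranking step described above could look roughly like the
following. This is a sketch under stated assumptions, not the code actually
used and not a Lucene API: the original vectors are assumed to be available
in memory as a `float[][]` indexed by document id, the metric is assumed to be
squared Euclidean distance, and the names `ExactReranker`, `rerank` and
`squaredDistance` are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

public class ExactReranker {
    /** Squared Euclidean distance (assumed metric; use whatever the index uses). */
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Re-scores the coarse candidates against the original (unquantized)
     * vectors and returns the ids of the n closest ones, nearest first.
     */
    static int[] rerank(int[] candidateIds, float[][] originalVectors,
                        float[] query, int n) {
        // Exact distance of every candidate to the query.
        float[] dist = new float[candidateIds.length];
        Integer[] order = new Integer[candidateIds.length];
        for (int i = 0; i < candidateIds.length; i++) {
            dist[i] = squaredDistance(originalVectors[candidateIds[i]], query);
            order[i] = i;
        }
        // Sort candidate positions by ascending exact distance.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> dist[i]));

        int size = Math.min(n, order.length);
        int[] top = new int[size];
        for (int i = 0; i < size; i++) {
            top[i] = candidateIds[order[i]];
        }
        return top;
    }
}
```

Here `candidateIds` would hold the 50 document ids returned by the quantized
search, and `n` would be 10.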

Does Lucene support the coarse search with re-ranking? If yes, what is the
API?

Viliam
