Hi Viliam,

Your logic is mostly correct. Here is a version that should be a bit
simpler and correct (but beware, untested):

IndexReader reader; // your multi-reader
int docID; // top-level doc ID
int readerID = ReaderUtil.subIndex(docID, reader.leaves());
LeafReaderContext leafContext = reader.leaves().get(readerID);
int leafDocID = docID - leafContext.docBase;
FloatVectorValues values =
    leafContext.reader().getFloatVectorValues("my_vector_field");
float[] vector = null;
if (values != null) { // null if this segment has no vectors for the field
  DocIndexIterator iterator = values.iterator();
  // advance() returns leaf-local doc IDs, so compare against leafDocID
  if (iterator.advance(leafDocID) == leafDocID) { // this doc ID has a vector
    vector = values.vectorValue(iterator.index());
  }
}
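
In case it helps to see the doc ID arithmetic in isolation, here is a
plain-Java sketch with no Lucene classes involved. It only mimics what
ReaderUtil.subIndex, docBase and advance() do conceptually; the class
LeafLookup, its methods and the sample doc bases are all made up for
illustration and are not Lucene API.

```java
final class LeafLookup {
  // Stand-in for the "iterator exhausted" sentinel (hypothetical, for this sketch only).
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // Index of the last leaf whose docBase is <= docID (docBases sorted, docBases[0] == 0).
  static int subIndex(int docID, int[] docBases) {
    int leaf = 0;
    for (int i = 1; i < docBases.length && docBases[i] <= docID; i++) {
      leaf = i;
    }
    return leaf;
  }

  // First leaf-local doc ID >= target that has a vector, or NO_MORE_DOCS.
  static int advance(int[] docsWithVectors, int target) {
    for (int doc : docsWithVectors) {
      if (doc >= target) {
        return doc;
      }
    }
    return NO_MORE_DOCS;
  }

  // True if the top-level docID has a vector; the comparison uses the leaf-local ID.
  static boolean hasVector(int docID, int[] docBases, int[][] docsWithVectorsPerLeaf) {
    int leaf = subIndex(docID, docBases);
    int leafDocID = docID - docBases[leaf];
    return advance(docsWithVectorsPerLeaf[leaf], leafDocID) == leafDocID;
  }

  public static void main(String[] args) {
    int[] docBases = {0, 100, 250};                  // three leaves, made-up sizes
    int[][] withVectors = {{}, {3, 37, 90}, {}};     // leaf-local IDs that have vectors
    int leaf = subIndex(137, docBases);
    System.out.println("leaf=" + leaf + " leafDocID=" + (137 - docBases[leaf]));
    System.out.println("hasVector(137)=" + hasVector(137, docBases, withVectors));
    System.out.println("hasVector(138)=" + hasVector(138, docBases, withVectors));
  }
}
```

The point is the last line of hasVector: the value coming back from the
iterator is leaf-local, so it has to be compared with the leaf-local doc
ID and not with the top-level one.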

On Mon, Feb 10, 2025 at 5:01 PM Viliam Ďurina <viliam.dur...@gmail.com>
wrote:

> Dear all,
>
> when indexing vector fields, Lucene doesn't allow specifying the vector
> field as stored (it throws `IllegalStateException: Cannot store value of
> type class [F`). When trying to retrieve the value using
> `IndexReader.storedFields()`, the vector field isn't stored.
>
> However, Lucene 10 stores the vectors in `.vec` files. I was able to
> retrieve them using this complicated code, for which I had to make the
> `readerIndex` and `readerBase` methods in `BaseCompositeReader` public
> (they are protected):
>
>     int docId = ...; // the docId to retrieve, e.g. coming out of a search
>     IndexReader node = reader.getContext().reader();
>     while (node instanceof BaseCompositeReader) {
>       int index = ((BaseCompositeReader) node).readerIndex(docId);
>       int base = ((BaseCompositeReader) node).readerBase(index);
>       docId -= base;
>       node = ((BaseCompositeReader) node)
>           .getContext().children().get(index).reader();
>     }
>     assert node instanceof LeafReader;
>     assert node.leaves().size() == 1;
>     FloatVectorValues vectorValues =
>         node.leaves().getFirst().reader().getFloatVectorValues("myVectorField");
>     float[] vector = vectorValues.vectorValue(docId);
>
> My reader is a `MultiReader`, composed of multiple `DirectoryReader`s.
>
> Is there any public API to retrieve the vector values? If not, is there any
> particular reason to not make the vectors available, if Lucene stores them
> anyway? Even if the vectors are quantized, original raw vectors are stored,
> though they are never used.
>
> Thanks,
> Viliam
>


-- 
Adrien
