Thanks Adrien!

The code has one issue:
    if (iterator.advance(leafDocID) == docID)
should have been:
    if (iterator.advance(leafDocID) == leafDocID)
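
To illustrate why the comparison has to use the leaf-local ID, here is a
minimal plain-Java sketch of the ID mapping. It does not use Lucene at all:
the class name, the `docBases` array and the `docsWithVector` array are made
up for illustration, `subIndex` only mirrors the idea behind
`ReaderUtil.subIndex` (and assumes no two leaves share a docBase), and
`advance` is a simplified stand-in for the iterator's `advance`.

```java
import java.util.Arrays;

public class LeafDocIdSketch {

    // Index of the leaf that contains docID: the last leaf whose docBase
    // is <= docID. Assumes docBases is sorted and has no duplicates.
    static int subIndex(int docID, int[] docBases) {
        int pos = Arrays.binarySearch(docBases, docID);
        return pos >= 0 ? pos : -pos - 2;
    }

    // Simplified stand-in for advance(target): the first leaf-local doc ID
    // >= target that has a vector, or Integer.MAX_VALUE (the value Lucene
    // uses for NO_MORE_DOCS) if there is none.
    static int advance(int[] docsWithVector, int target) {
        for (int doc : docsWithVector) {
            if (doc >= target) {
                return doc;
            }
        }
        return Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int[] docBases = {0, 100, 250};      // hypothetical: three leaves
        int[] docsWithVector = {3, 20, 41};  // hypothetical: leaf-local IDs in leaf 1
        int docID = 120;                     // top-level doc ID

        int readerID = subIndex(docID, docBases);
        int leafDocID = docID - docBases[readerID];
        int found = advance(docsWithVector, leafDocID);

        // The iterator works in leaf-local IDs, so only the first comparison
        // can succeed once the document lives outside the first leaf.
        System.out.println(readerID + " " + leafDocID + " "
                + (found == leafDocID) + " " + (found == docID));
        // prints "1 20 true false"
    }
}
```

With a single leaf, docBase is 0 and both comparisons agree, which is
probably why the original check looks fine in a small test.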

After fixing this, it works (for reference, I'm using Lucene 10.1). But I
still wonder why we can't retrieve vectors the same way we retrieve any
other field. I was unable to figure this code out on my own; doing it this
way is pretty complicated. Is there any reason the vectors are not
available through `storedFields()`?

Viliam

On Mon, Feb 10, 2025 at 9:21 PM Adrien Grand <jpou...@gmail.com> wrote:

> Hi Viliam,
>
> Your logic is mostly correct, here is a version that should be a bit
> simpler and correct (but beware, untested):
>
> IndexReader reader; // your multi-reader
> int docID; // top-level doc ID
> int readerID = ReaderUtil.subIndex(docID, reader.leaves());
> LeafReaderContext leafContext = reader.leaves().get(readerID);
> int leafDocID = docID - leafContext.docBase;
> FloatVectorValues values =
> leafContext.reader().getFloatVectorValues("my_vector_field");
> DocIndexIterator iterator = values.iterator();
> float[] vector;
> if (iterator.advance(leafDocID) == docID) { // this doc ID has a vector
>   vector = values.vectorValue(iterator.index());
> } else {
>   vector = null;
> }
>
> On Mon, Feb 10, 2025 at 5:01 PM Viliam Ďurina <viliam.dur...@gmail.com>
> wrote:
>
> > Dear all,
> >
> > when indexing vector fields, Lucene doesn't allow specifying the vector
> > field as stored (it throws `IllegalStateException: Cannot store value of
> > type class [F`). When retrieving a document using
> > `IndexReader.storedFields()`, the vector field isn't returned.
> >
> > However, Lucene 10 stores the vectors in `.vec` files. I was able to
> > retrieve them using this complicated code, for which I had to make the
> > `readerIndex` and `readerBase` methods in `BaseCompositeReader` public
> > (they are protected):
> >
> >     int docId = ...; // the docId to retrieve, e.g. coming out of a search
> >     IndexReader node = reader.getContext().reader();
> >     while (node instanceof BaseCompositeReader) {
> >       int index = ((BaseCompositeReader) node).readerIndex(docId);
> >       int base = ((BaseCompositeReader) node).readerBase(index);
> >       docId -= base;
> >       node = ((BaseCompositeReader) node).getContext().children().get(index).reader();
> >     }
> >     assert node instanceof LeafReader;
> >     assert node.leaves().size() == 1;
> >     FloatVectorValues vectorValues =
> >         node.leaves().getFirst().reader().getFloatVectorValues("myVectorField");
> >     float[] vector = vectorValues.vectorValue(docId);
> >
> > My reader is a `MultiReader`, composed of multiple `DirectoryReader`s.
> >
> > Is there any public API to retrieve the vector values? If not, is there any
> > particular reason to not make the vectors available, if Lucene stores them
> > anyway? Even if the vectors are quantized, original raw vectors are stored,
> > though they are never used.
> >
> > Thanks,
> > Viliam
> >
>
>
> --
> Adrien
>
