Stored fields are a separate format that stores data row-wise: all
the stored data for a single document is written together. Vectors
aren't *also* copied into stored fields storage, so the stored fields
API can't be used to retrieve them. Allowing that would duplicate a
large amount of data for no benefit other than making things look
simpler. That said, do you think it would be more convenient to
retrieve vectors through the stored fields API? Is the appeal that it
hides the details of the leaf structure? There may be an opportunity
to create a convenience API for vectors, though I'm not sure.
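
For context on what "the leaf structure" involves: resolving a
top-level doc ID to a leaf is a search over each leaf's docBase
followed by a subtraction. Here is a minimal sketch of that arithmetic
with no Lucene dependency. The class name `LeafLookup`, its methods,
and the docBase values are made up for illustration, and it assumes
strictly increasing docBases (no empty leaves). In real code,
`ReaderUtil.subIndex` and `LeafReaderContext.docBase` do this for you.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: illustrates doc ID resolution only.
class LeafLookup {

    /**
     * Returns the index of the leaf that contains the top-level docID.
     * docBases[i] is the first top-level doc ID of leaf i; the array is
     * assumed to be strictly increasing and to start at 0.
     */
    static int subIndex(int docID, int[] docBases) {
        int pos = Arrays.binarySearch(docBases, docID);
        // Exact hit: docID is the first doc of leaf `pos`.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the
        // containing leaf is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Converts a top-level docID to a doc ID local to its leaf. */
    static int leafDocID(int docID, int[] docBases) {
        return docID - docBases[subIndex(docID, docBases)];
    }

    public static void main(String[] args) {
        int[] docBases = {0, 100, 250}; // three leaves with 100, 150 and N docs
        int docID = 180;
        // prints "1 80": doc 180 is doc 80 of the second leaf
        System.out.println(subIndex(docID, docBases) + " " + leafDocID(docID, docBases));
    }
}
```

The stored fields API performs this resolution internally, which is
part of why it feels simpler to call.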

On Tue, Feb 11, 2025 at 8:45 AM Viliam Ďurina <viliam.dur...@gmail.com> wrote:
>
> Thanks Adrien!
>
> The code has one issue:
>     if (iterator.advance(leafDocID) == docID)
> should have been:
>     if (iterator.advance(leafDocID) == leafDocID)
>
> After fixing this, it works (for reference, I'm using Lucene 10.1). But I
> still wonder why we can't retrieve vectors just as we retrieve any other
> field. I was unable to figure the code out myself; done this way, it's
> pretty complicated. Is there any reason the vectors are not available
> through `storedFields()`?
>
> Viliam
>
> On Mon, Feb 10, 2025 at 9:21 PM Adrien Grand <jpou...@gmail.com> wrote:
>
> > Hi Viliam,
> >
> > Your logic is mostly correct, here is a version that should be a bit
> > simpler and correct (but beware, untested):
> >
> > IndexReader reader; // your multi-reader
> > int docID; // top-level doc ID
> > int readerID = ReaderUtil.subIndex(docID, reader.leaves());
> > LeafReaderContext leafContext = reader.leaves().get(readerID);
> > int leafDocID = docID - leafContext.docBase;
> > FloatVectorValues values =
> >     leafContext.reader().getFloatVectorValues("my_vector_field");
> > DocIndexIterator iterator = values.iterator();
> > float[] vector;
> > if (iterator.advance(leafDocID) == docID) { // this doc ID has a vector
> >   vector = values.vectorValue(iterator.index());
> > } else {
> >   vector = null;
> > }
> >
> > On Mon, Feb 10, 2025 at 5:01 PM Viliam Ďurina <viliam.dur...@gmail.com>
> > wrote:
> >
> > > Dear all,
> > >
> > > when indexing vector fields, Lucene doesn't allow specifying the vector
> > > field as stored (it throws `IllegalStateException: Cannot store value of
> > > type class [F`). When trying to retrieve the value using
> > > `IndexReader.storedFields()`, the vector field isn't stored.
> > >
> > > However, Lucene 10 stores the vectors in `.vec` files. I was able to
> > > retrieve them using this complicated code, for which I had to make the
> > > `readerIndex` and `readerBase` methods in `BaseCompositeReader` public
> > > (they are protected):
> > >
> > >     int docId = ...; // the docId to retrieve, e.g. coming out of a search
> > >     IndexReader node = reader.getContext().reader();
> > >     while (node instanceof BaseCompositeReader) {
> > >       int index = ((BaseCompositeReader) node).readerIndex(docId);
> > >       int base = ((BaseCompositeReader) node).readerBase(index);
> > >       docId -= base;
> > >       node = ((BaseCompositeReader) node).getContext().children().get(index).reader();
> > >     }
> > >     assert node instanceof LeafReader;
> > >     assert node.leaves().size() == 1;
> > >     FloatVectorValues vectorValues =
> > >         node.leaves().getFirst().reader().getFloatVectorValues("myVectorField");
> > >     float[] vector = vectorValues.vectorValue(docId);
> > >
> > > My reader is a `MultiReader`, composed of multiple `DirectoryReader`s.
> > >
> > > Is there any public API to retrieve the vector values? If not, is there
> > > any particular reason to not make the vectors available, if Lucene
> > > stores them anyway? Even if the vectors are quantized, original raw
> > > vectors are stored, though they are never used.
> > >
> > > Thanks,
> > > Viliam
> > >
> >
> >
> > --
> > Adrien
> >
