Thanks Adrien! The code has one issue:

    if (iterator.advance(leafDocID) == docID)

should have been:

    if (iterator.advance(leafDocID) == leafDocID)

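To make the two doc ID spaces concrete, here is a minimal plain-Java sketch with no Lucene dependency. The names `DocIdMapping`, `subIndex`, `advance` and `hasVector` are hypothetical and only imitate what `ReaderUtil.subIndex` and the iterator's `advance` do; the doc bases and vector doc IDs are made-up sample data, and the sketch assumes no leaf is empty (doc bases strictly increasing). It shows that the value returned by `advance` is leaf-local, so it has to be compared against `leafDocID`.

```java
import java.util.Arrays;

public class DocIdMapping {
    // Imitates ReaderUtil.subIndex: index of the leaf whose doc ID range contains docID.
    // Assumes docBases is strictly increasing and starts at 0.
    static int subIndex(int docID, int[] docBases) {
        int pos = Arrays.binarySearch(docBases, docID);
        // Exact hit: docID is the first doc of that leaf.
        // Otherwise binarySearch returns -(insertionPoint) - 1; the leaf is insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Imitates advance(): first leaf-local doc ID >= target that has a vector,
    // or Integer.MAX_VALUE when there is none.
    static int advance(int[] docsWithVector, int target) {
        for (int doc : docsWithVector) {
            if (doc >= target) {
                return doc;
            }
        }
        return Integer.MAX_VALUE;
    }

    // True if the top-level docID has a vector in its leaf.
    static boolean hasVector(int docID, int[] docBases, int[][] docsWithVectorPerLeaf) {
        int leaf = subIndex(docID, docBases);
        int leafDocID = docID - docBases[leaf];
        // Compare against leafDocID: advance() returns IDs in the leaf's own numbering.
        return advance(docsWithVectorPerLeaf[leaf], leafDocID) == leafDocID;
    }

    public static void main(String[] args) {
        int[] docBases = {0, 100, 250};                   // sample data: three leaves
        int[][] docsWithVector = {{0, 5}, {3, 20}, {7}};  // leaf-local IDs that have a vector
        System.out.println(subIndex(103, docBases));                  // prints 1
        System.out.println(hasVector(103, docBases, docsWithVector)); // prints true
        System.out.println(hasVector(104, docBases, docsWithVector)); // prints false
    }
}
```

In the first leaf the doc base is 0, so `docID` and `leafDocID` coincide there and comparing against `docID` happens to work; the mismatch only shows up for documents in later leaves.
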
After fixing this, it works (for reference, I'm using Lucene 10.1).

But I still wonder why we can't retrieve vectors the same way we retrieve any other field. I was unable to figure this code out by myself; it's pretty complicated this way. Is there any reason the vectors are not available through `storedFields()`?

Viliam

On Mon, Feb 10, 2025 at 9:21 PM Adrien Grand <jpou...@gmail.com> wrote:

> Hi Viliam,
>
> Your logic is mostly correct, here is a version that should be a bit
> simpler and correct (but beware, untested):
>
> IndexReader reader; // your multi-reader
> int docID; // top-level doc ID
> int readerID = ReaderUtil.subIndex(docID, reader.leaves());
> LeafReaderContext leafContext = reader.leaves().get(readerID);
> int leafDocID = docID - leafContext.docBase;
> FloatVectorValues values =
>     leafContext.reader().getFloatVectorValues("my_vector_field");
> DocIndexIterator iterator = values.iterator();
> float[] vector;
> if (iterator.advance(leafDocID) == docID) { // this doc ID has a vector
>   vector = values.vectorValue(iterator.index());
> } else {
>   vector = null;
> }
>
> On Mon, Feb 10, 2025 at 5:01 PM Viliam Ďurina <viliam.dur...@gmail.com>
> wrote:
>
> > Dear all,
> >
> > when indexing vector fields, Lucene doesn't allow specifying the vector
> > field as stored (it throws `IllegalStateException: Cannot store value of
> > type class [F`). When trying to retrieve the value using
> > `IndexReader.storedFields()`, the vector field isn't stored.
> >
> > However, Lucene 10 stores the vectors in `.vec` files. I was able to
> > retrieve them using this complicated code, for which I had to make the
> > `readerIndex` and `readerBase` methods in `BaseCompositeReader` public
> > (they are protected):
> >
> > int docId = ...; // the docId to retrieve, e.g. coming out of a search
> > IndexReader node = reader.getContext().reader();
> > while (node instanceof BaseCompositeReader) {
> >   int index = ((BaseCompositeReader) node).readerIndex(docId);
> >   int base = ((BaseCompositeReader) node).readerBase(index);
> >   docId -= base;
> >   node = ((BaseCompositeReader)
> >       node).getContext().children().get(index).reader();
> > }
> > assert node instanceof LeafReader;
> > assert node.leaves().size() == 1;
> > FloatVectorValues vectorValues =
> >     node.leaves().getFirst().reader().getFloatVectorValues("myVectorField");
> > float[] vector = vectorValues.vectorValue(docId);
> >
> > My reader is a `MultiReader`, composed of multiple `DirectoryReader`s.
> >
> > Is there any public API to retrieve the vector values? If not, is there any
> > particular reason to not make the vectors available, if Lucene stores them
> > anyway? Even if the vectors are quantized, original raw vectors are stored,
> > though they are never used.
> >
> > Thanks,
> > Viliam
>
> --
> Adrien