Michael,

Empirically, I am not surprised there is an increase in heap usage. We do have extra overhead with the scalar quantization on flush. There may also be some additional heap usage on merge.
I just don't think it is via Lucene99FlatVectorsWriter.

On Wed, Jun 12, 2024 at 11:55 AM Michael Sokolov <msoko...@gmail.com> wrote:
>
> Empirically I thought I saw the need to increase JVM heap with this,
> but let me do some more testing to narrow down what is going on. It's
> possible the same heap requirements exist for the non-quantized case
> and I am just seeing some random vagary of the merge process happening
> to tip over a limit. It's also possible I messed something up in
> https://github.com/apache/lucene/pull/13469, which I am trying to use
> to index quantized vectors without building an HNSW graph.
>
> On Wed, Jun 12, 2024 at 10:24 AM Benjamin Trent <ben.w.tr...@gmail.com> wrote:
> >
> > Heya Michael,
> >
> > > the first one I traced was referenced by vector writers involved in a
> > > merge (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this expected?
> >
> > Yes, that is holding the raw floats before flush. You should see
> > nearly the exact same overhead there as you would indexing raw
> > vectors. I would be surprised if there is a significant memory usage
> > difference due to Lucene99FlatVectorsWriter when using quantized vs.
> > not.
> >
> > The flow is this:
> >
> > - Lucene99FlatVectorsWriter gets the float[] vector, makes a copy
> >   of it (it does this no matter what), and passes the copy on to the
> >   next part of the chain.
> > - If quantizing, the next part of the chain is
> >   Lucene99ScalarQuantizedVectorsWriter.FieldsWriter, which only keeps
> >   a REFERENCE to the array; it doesn't copy it. The float vector array
> >   is then passed to the HNSW indexer (if it's being used), which also
> >   does NOT copy it, but keeps a reference.
> > - If not quantizing but indexing, Lucene99FlatVectorsWriter passes it
> >   directly to the HNSW indexer, which does not copy it, but does add
> >   it to the HNSW graph.
> >
> > > I wonder if there is an opportunity to move some of this off-heap?
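[Editor's note: the copy-once-then-share flow Ben describes can be sketched as below. The class and field names are hypothetical stand-ins, not the real Lucene internals; the point is only that the flat writer clones the incoming float[] exactly once, and every downstream writer holds a reference to that same clone, so enabling quantization should not add a second copy of the raw floats.]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the buffering flow described above; names do NOT
// match the actual Lucene classes.
class FlatWriterSketch {
    final List<float[]> vectors = new ArrayList<>();          // raw buffer held until flush
    final QuantizedWriterSketch next = new QuantizedWriterSketch();

    void addVector(float[] vector) {
        // The flat writer copies the incoming array exactly once...
        float[] copy = vector.clone();
        vectors.add(copy);
        // ...and passes that same copy down the chain.
        next.addVector(copy);
    }
}

class QuantizedWriterSketch {
    final List<float[]> vectors = new ArrayList<>();

    void addVector(float[] vector) {
        // Keeps only a REFERENCE; no second copy of the float data.
        vectors.add(vector);
    }
}
```

Because the downstream writer stores the same array instance, the per-vector heap cost is one float[] copy whether or not quantization is in the chain.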
> >
> > I think we could do some things off-heap in the ScalarQuantizer, maybe
> > even during flush, but we would have to adjust the interfaces some so
> > that the ScalarQuantizer can know where the vectors are being stored
> > after the initial flush. Right now there is no way to know the file
> > nor the file handle.
> >
> > > I can imagine that when we requantize we need to scan all the vectors to
> > > determine the new quantization settings?
> >
> > We shouldn't be scanning every vector. We do take a sampling, though
> > that sampling can be large. There is an opportunity here for off-heap
> > action if possible, though I don't know how we could do that before
> > flush. I could see the off-heap idea helping on merge.
> >
> > > Maybe we could do two passes - merge the float vectors while
> > > recalculating, and then re-scan to do the actual quantization?
> >
> > I am not sure what you mean here by "merge the float vectors". If you
> > mean simply reading the individual float vector files and combining
> > them into a single file, we already do that separately from
> > quantizing.
> >
> > Thank you for digging into this. Glad others are experimenting!
> >
> > Ben
> >
> > On Wed, Jun 12, 2024 at 8:57 AM Michael Sokolov <msoko...@gmail.com> wrote:
> > >
> > > Hi folks. I've been experimenting with our new scalar quantization
> > > support - yay, thanks for adding it! I'm finding that when I index a
> > > large number of large vectors, enabling quantization (vs. simply
> > > indexing the full-width floats) requires more heap - I keep getting
> > > OOMs and have to increase the heap size. I took a heap dump, and not
> > > surprisingly I found some big arrays of floats and bytes; the
> > > first one I traced was referenced by vector writers involved in a
> > > merge (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this
> > > expected? I wonder if there is an opportunity to move some of this
> > > off-heap?
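[Editor's note: the sampling Ben mentions - estimating quantization settings from a subset of vectors rather than scanning all of them - can be illustrated roughly as follows. This is a toy sketch under assumed simplifications (uniform sampling with replacement, fixed 99% central quantile clip); the real ScalarQuantizer uses its own confidence-interval logic.]

```java
import java.util.Arrays;
import java.util.Random;

// Toy sketch of sampling-based range estimation for scalar quantization.
// Illustrative only; not the actual Lucene ScalarQuantizer.
class QuantileSketch {
    /** Estimate [min, max] quantization bounds from a sample of the vectors. */
    static float[] estimateBounds(float[][] vectors, int sampleSize, long seed) {
        Random rnd = new Random(seed);
        int n = Math.min(sampleSize, vectors.length);
        float[] values = new float[n * vectors[0].length];
        int k = 0;
        for (int i = 0; i < n; i++) {
            // Uniform sample with replacement; only n vectors are visited,
            // not the whole set.
            float[] v = vectors[rnd.nextInt(vectors.length)];
            for (float x : v) values[k++] = x;
        }
        Arrays.sort(values);
        // Clip to the central 99% of sampled component values to resist outliers.
        int lo = (int) (values.length * 0.005);
        int hi = (int) (values.length * 0.995) - 1;
        return new float[] { values[lo], values[Math.max(lo, hi)] };
    }
}
```

The sampled `values` buffer is the heap cost of this step; that buffer is the natural candidate for moving off-heap on merge, since at that point the vectors already live in a file.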
> > > I can imagine that when we requantize we need to scan all the
> > > vectors to determine the new quantization settings? Maybe we could
> > > do two passes - merge the float vectors while recalculating, and
> > > then re-scan to do the actual quantization?
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: dev-h...@lucene.apache.org
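[Editor's note: Michael's two-pass suggestion could look roughly like the sketch below - purely an illustration of the idea, not how the Lucene merge code actually works. Pass 1 streams the merged float vectors once to recompute the quantization range; pass 2 streams them again to emit quantized bytes. Neither pass needs to buffer all vectors on heap.]

```java
// Sketch of the proposed two-pass quantizing merge (hypothetical helper,
// not a Lucene API). Uses a simple min/max range and 8-bit quantization
// as stand-ins for the real recalculated settings.
class TwoPassQuantize {
    /** Pass 1: stream the merged floats once to find global bounds. */
    static float[] findBounds(Iterable<float[]> merged) {
        float min = Float.POSITIVE_INFINITY, max = Float.NEGATIVE_INFINITY;
        for (float[] v : merged) {
            for (float x : v) {
                min = Math.min(min, x);
                max = Math.max(max, x);
            }
        }
        return new float[] { min, max };
    }

    /** Pass 2: re-scan, mapping each component onto an unsigned byte. */
    static byte quantize(float x, float min, float max) {
        float scale = 255f / (max - min);
        return (byte) Math.round((x - min) * scale);
    }
}
```

The appeal of the two passes is that the source of each pass can be the already-written merged float file, so the only per-pass state is the running bounds (pass 1) and one vector at a time (pass 2).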