Michael,

Empirically, I am not surprised there is an increase in heap usage. We do have extra overhead with the scalar quantization on flush. There may also be some additional heap usage on merge.
I just don't think it is via Lucene99FlatVectorsWriter.

On Wed, Jun 12, 2024 at 11:55 AM Michael Sokolov <msoko...@gmail.com> wrote:
>
> Empirically I thought I saw the need to increase JVM heap with this,
> but let me do some more testing to narrow down what is going on. It's
> possible the same heap requirements exist for the non-quantized case
> and I am just seeing some random vagary of the merge process happening
> to tip over a limit. It's also possible I messed something up in
> https://github.com/apache/lucene/pull/13469, which I am trying to use
> to index quantized vectors without building an HNSW graph.
>
> On Wed, Jun 12, 2024 at 10:24 AM Benjamin Trent <ben.w.tr...@gmail.com> wrote:
> >
> > Heya Michael,
> >
> > > the first one I traced was referenced by vector writers involved in a
> > > merge (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this expected?
> >
> > Yes, that is holding the raw floats before flush. You should see
> > nearly the exact same overhead there as you would indexing raw
> > vectors. I would be surprised if there is a significant memory usage
> > difference due to Lucene99FlatVectorsWriter when using quantized vs.
> > not.
> >
> > The flow is this:
> >
> > - Lucene99FlatVectorsWriter gets the float[] vector, makes a copy
> >   of it (it does this no matter what), and passes the copy on to the
> >   next part of the chain.
> > - If quantizing, the next part of the chain is
> >   Lucene99ScalarQuantizedVectorsWriter.FieldsWriter, which only keeps
> >   a REFERENCE to the array; it doesn't copy it. The float vector array
> >   is then passed to the HNSW indexer (if it's being used), which also
> >   does NOT copy it, but keeps a reference.
> > - If not quantizing but indexing, Lucene99FlatVectorsWriter passes it
> >   directly to the HNSW indexer, which does not copy it, but does add
> >   it to the HNSW graph.
> >
> > > I wonder if there is an opportunity to move some of this off-heap?
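[Editor's note: the copy-once-then-share flow Ben describes can be sketched as below. The class and field names are hypothetical stand-ins, not the real Lucene internals; the point is only that the flat writer clones the incoming float[] exactly once, and every downstream writer holds a reference to that same clone, so enabling quantization should not add a second copy of the raw floats.]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the buffering flow described above; names do NOT
// match the actual Lucene classes.
class FlatWriterSketch {
    final List<float[]> vectors = new ArrayList<>();          // raw buffer held until flush
    final QuantizedWriterSketch next = new QuantizedWriterSketch();

    void addVector(float[] vector) {
        // The flat writer copies the incoming array exactly once...
        float[] copy = vector.clone();
        vectors.add(copy);
        // ...and passes that same copy down the chain.
        next.addVector(copy);
    }
}

class QuantizedWriterSketch {
    final List<float[]> vectors = new ArrayList<>();

    void addVector(float[] vector) {
        // Keeps only a REFERENCE; no second copy of the float data.
        vectors.add(vector);
    }
}
```

Because the downstream writer stores the same array instance, the per-vector heap cost is one float[] copy whether or not quantization is in the chain.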
> >
> > I think we could do some things off-heap in the ScalarQuantizer, maybe
> > even during flush, but we would have to adjust the interfaces some so
> > that the ScalarQuantizer can know where the vectors are being stored
> > after the initial flush. Right now there is no way to know the file
> > nor the file handle.
> >
> > > I can imagine that when we requantize we need to scan all the vectors to
> > > determine the new quantization settings?
> >
> > We shouldn't be scanning every vector. We do take a sampling, though
> > that sampling can be large. There is an opportunity here for off-heap
> > action if possible, though I don't know how we could do that before
> > flush. I could see the off-heap idea helping on merge.
> >
> > > Maybe we could do two passes - merge the float vectors while
> > > recalculating, and then re-scan to do the actual quantization?
> >
> > I am not sure what you mean here by "merge the float vectors". If you
> > mean simply reading the individual float vector files and combining
> > them into a single file, we already do that separately from
> > quantizing.
> >
> > Thank you for digging into this. Glad others are experimenting!
> >
> > Ben
> >
> > On Wed, Jun 12, 2024 at 8:57 AM Michael Sokolov <msoko...@gmail.com> wrote:
> > >
> > > Hi folks. I've been experimenting with our new scalar quantization
> > > support - yay, thanks for adding it! I'm finding that when I index a
> > > large number of large vectors, enabling quantization (vs. simply
> > > indexing the full-width floats) requires more heap - I keep getting
> > > OOMs and have to increase the heap size. I took a heap dump, and not
> > > surprisingly I found some big arrays of floats and bytes; the
> > > first one I traced was referenced by vector writers involved in a
> > > merge (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this
> > > expected? I wonder if there is an opportunity to move some of this
> > > off-heap?
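[Editor's note: the sampling Ben mentions - estimating quantization settings from a subset of vectors rather than scanning all of them - can be illustrated roughly as follows. This is a toy sketch under assumed simplifications (uniform sampling with replacement, fixed 99% central quantile clip); the real ScalarQuantizer uses its own confidence-interval logic.]

```java
import java.util.Arrays;
import java.util.Random;

// Toy sketch of sampling-based range estimation for scalar quantization.
// Illustrative only; not the actual Lucene ScalarQuantizer.
class QuantileSketch {
    /** Estimate [min, max] quantization bounds from a sample of the vectors. */
    static float[] estimateBounds(float[][] vectors, int sampleSize, long seed) {
        Random rnd = new Random(seed);
        int n = Math.min(sampleSize, vectors.length);
        float[] values = new float[n * vectors[0].length];
        int k = 0;
        for (int i = 0; i < n; i++) {
            // Uniform sample with replacement; only n vectors are visited,
            // not the whole set.
            float[] v = vectors[rnd.nextInt(vectors.length)];
            for (float x : v) values[k++] = x;
        }
        Arrays.sort(values);
        // Clip to the central 99% of sampled component values to resist outliers.
        int lo = (int) (values.length * 0.005);
        int hi = (int) (values.length * 0.995) - 1;
        return new float[] { values[lo], values[Math.max(lo, hi)] };
    }
}
```

The sampled `values` buffer is the heap cost of this step; that buffer is the natural candidate for moving off-heap on merge, since at that point the vectors already live in a file.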
> > > I can imagine that when we requantize we need to scan all the
> > > vectors to determine the new quantization settings? Maybe we could
> > > do two passes - merge the float vectors while recalculating, and
> > > then re-scan to do the actual quantization?
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: dev-h...@lucene.apache.org
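[Editor's note: Michael's two-pass suggestion could look roughly like the sketch below - purely an illustration of the idea, not how the Lucene merge code actually works. Pass 1 streams the merged float vectors once to recompute the quantization range; pass 2 streams them again to emit quantized bytes. Neither pass needs to buffer all vectors on heap.]

```java
// Sketch of the proposed two-pass quantizing merge (hypothetical helper,
// not a Lucene API). Uses a simple min/max range and 8-bit quantization
// as stand-ins for the real recalculated settings.
class TwoPassQuantize {
    /** Pass 1: stream the merged floats once to find global bounds. */
    static float[] findBounds(Iterable<float[]> merged) {
        float min = Float.POSITIVE_INFINITY, max = Float.NEGATIVE_INFINITY;
        for (float[] v : merged) {
            for (float x : v) {
                min = Math.min(min, x);
                max = Math.max(max, x);
            }
        }
        return new float[] { min, max };
    }

    /** Pass 2: re-scan, mapping each component onto an unsigned byte. */
    static byte quantize(float x, float min, float max) {
        float scale = 255f / (max - min);
        return (byte) Math.round((x - min) * scale);
    }
}
```

The appeal of the two passes is that the source of each pass can be the already-written merged float file, so the only per-pass state is the running bounds (pass 1) and one vector at a time (pass 2).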