Re: Does Lucene Vector Search support int8 and / or even binary?

Michael Wechner Fri, 29 Mar 2024 00:28:28 -0700

Thanks for your feedback and pointers!

To play with binary vectors, the following project might be useful:

https://github.com/cohere-ai/BinaryVectorDB

Re Lucene, I will try to better understand what you suggest below.

Thanks

Michael

On 29.03.24 at 07:35, Shubham Chaudhary wrote:

btw, what about native binary embedding quantization support by Lucene?


This sounds like a good idea to have in Lucene.

Would this require another VectorField / VectorsFormat?


Based on the current implementation, one way would be to use another KNN
format. Alternatively, and perhaps better, Lucene99ScalarQuantizedVectorsFormat
<https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html>
could be made configurable to accept the type of quantization, as in the
work-in-progress PR for int4 quantization support
<https://github.com/apache/lucene/pull/13197>, which takes the number of
bits to use for quantization as input. Since this change allows passing 1
as the number of bits, it looks to me like an enabler for binary
quantization.
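
For readers following the thread, here is a minimal sketch of what binary (1-bit) quantization could look like. It assumes the common sign-threshold scheme (a bit is set when the component is positive) and compares codes by Hamming distance. This is plain Java for illustration only; it is not Lucene code and not what the int4 PR implements, and the class and method names are hypothetical.

```java
final class BinaryQuantizationSketch {

    /** Packs one bit per dimension: bit i is set when vector[i] is positive. */
    static long[] quantize(float[] vector) {
        long[] bits = new long[(vector.length + 63) / 64];
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] > 0f) {
                bits[i >>> 6] |= 1L << (i & 63);
            }
        }
        return bits;
    }

    /** Hamming distance: the number of bit positions where the two codes differ. */
    static int hammingDistance(long[] a, long[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("codes must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            distance += Long.bitCount(a[i] ^ b[i]);
        }
        return distance;
    }

    public static void main(String[] args) {
        float[] query = {0.12f, -0.40f, 0.88f, -0.05f};
        float[] doc = {0.30f, 0.25f, 0.61f, -0.70f};
        // The two vectors differ in sign only at dimension 1.
        System.out.println(hammingDistance(quantize(query), quantize(doc)));
    }
}
```

A 1024-dimensional float vector (4096 bytes) shrinks to 128 bytes under this scheme, which is why binary codes are usually paired with a rescoring step on higher-precision vectors.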

- Shubham

On Sun, Mar 24, 2024 at 4:34 AM Michael Wechner wrote:

btw, what about native binary embedding quantization support by Lucene?


https://www.linkedin.com/posts/tomaarsen_binary-and-scalar-embedding-quantization-activity-7176966403332132864-lJzH?utm_source=share&utm_medium=member_desktop

Would this require another VectorField / VectorsFormat?

Thanks

Michael

On 19.03.24 at 21:57, Shubham Chaudhary wrote:

Hi Michael,

Lucene has had int8 vector support since 9.5 (#1054
<https://github.com/apache/lucene/pull/1054>), but it was left to the user
to produce the quantized vectors and index them using KnnByteVectorField
<https://lucene.apache.org/core/9_5_0/core/org/apache/lucene/document/KnnByteVectorField.html>.
With Lucene 9.9 out now, there is native support for int8 scalar
quantization (#12582 <https://github.com/apache/lucene/pull/12582>) using
Lucene99ScalarQuantizedVectorsFormat
<https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html>,
which expects a confidence interval between 90% and 100%. Here are some
nice blog posts that explain how it works in Lucene:

- https://www.elastic.co/search-labs/blog/articles/scalar-quantization-in-lucene
- https://www.elastic.co/search-labs/blog/articles/scalar-quantization-101

Some other references:

- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsReader.html
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.html
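
As a rough illustration of the confidence-interval idea, here is a simplified sketch of scalar quantization in plain Java. It derives lower and upper quantiles from a sample of values, clips each component to that range, and maps it linearly to an integer code. The [0, 127] target range, the class name, and the method names are assumptions of this sketch; Lucene's actual quantizer differs in detail, so treat this as a way to build intuition rather than a description of the implementation.

```java
import java.util.Arrays;

final class ScalarQuantizationSketch {
    // Assumed maximum code value for this sketch.
    private static final float MAX_CODE = 127f;

    final float lower;
    final float upper;

    /** Picks clipping bounds so that roughly confidenceInterval of the sample lies inside them. */
    ScalarQuantizationSketch(float[] sample, float confidenceInterval) {
        if (sample.length == 0) {
            throw new IllegalArgumentException("sample must not be empty");
        }
        if (confidenceInterval <= 0f || confidenceInterval > 1f) {
            throw new IllegalArgumentException("confidenceInterval must be in (0, 1]");
        }
        float[] sorted = sample.clone();
        Arrays.sort(sorted);
        // Drop (1 - confidenceInterval) / 2 of the values from each tail.
        int skip = (int) (sorted.length * (1f - confidenceInterval) / 2f);
        this.lower = sorted[skip];
        this.upper = sorted[sorted.length - 1 - skip];
    }

    /** Clips each component to [lower, upper] and maps it linearly to [0, 127]. */
    byte[] quantize(float[] vector) {
        byte[] out = new byte[vector.length];
        float range = upper - lower;
        for (int i = 0; i < vector.length; i++) {
            float clipped = Math.max(lower, Math.min(upper, vector[i]));
            out[i] = range == 0f ? 0 : (byte) Math.round((clipped - lower) / range * MAX_CODE);
        }
        return out;
    }

    /** Approximate inverse of quantize for a single code. */
    float dequantize(byte code) {
        return lower + (code / MAX_CODE) * (upper - lower);
    }

    public static void main(String[] args) {
        float[] sample = {-0.9f, -0.2f, -0.1f, 0.0f, 0.1f, 0.2f, 0.3f, 4.0f};
        ScalarQuantizationSketch sq = new ScalarQuantizationSketch(sample, 0.5f);
        System.out.println(Arrays.toString(sq.quantize(new float[] {-0.9f, 0.0f, 0.3f, 4.0f})));
    }
}
```

A narrower confidence interval discards more of the outliers, which spends the available codes on the range where most values actually fall at the cost of clipping the extremes.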



On Wed, Mar 20, 2024 at 1:54 AM Michael Wechner wrote:

Hi

Cohere recently announced their "compressed" embeddings:

https://twitter.com/Nils_Reimers/status/1769809006762037368

https://www.linkedin.com/posts/bhavsarpratik_rag-genai-search-activity-7175850704928989187-Ki1N/?utm_source=share&utm_medium=member_desktop

Does Lucene Vector Search support this already, or is somebody working
on this?

Thanks

Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

