Hi,

I am not the specialist on vectors, but you could use HnswBitVectorsFormat from the custom codecs JAR file. It uses byte[] vectors, but each byte represents 8 dimensions. As the distance, it XORs the vectors and takes the bit count of the result. I know this because I improved the bit counting to use long VarHandles, which process 64 dimensions in one go (<https://github.com/apache/lucene/pull/13288/files#diff-1faf01efbf448c751b357e758254b2e623de1145b07bd8afcfe8a49b7dbde9cc>).

https://lucene.apache.org/core/10_2_0/codecs/org/apache/lucene/codecs/bitvectors/HnswBitVectorsFormat.html
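To illustrate that distance, here is a minimal, self-contained Java sketch. It is not Lucene's actual code, and the class name `BitHamming` is made up for this example. It XORs two packed bit vectors and counts the differing bits, reading 8 bytes at a time as a `long` through a byte-array-view `VarHandle`, so a single `Long.bitCount` covers 64 dimensions. Any leftover bytes are counted one at a time.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class BitHamming {
    // Views a byte[] as a sequence of longs (8 bytes = 64 bits = 64 dimensions).
    // The byte order does not matter for the result, as long as both vectors use the same one.
    private static final VarHandle LONG_VIEW =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    /** Returns the number of differing bits between two packed bit vectors of equal length. */
    public static int hammingDistance(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        int distance = 0;
        int i = 0;
        // Main loop: XOR 64 dimensions at once and count the set bits.
        for (; i + Long.BYTES <= a.length; i += Long.BYTES) {
            long x = (long) LONG_VIEW.get(a, i);
            long y = (long) LONG_VIEW.get(b, i);
            distance += Long.bitCount(x ^ y);
        }
        // Tail: the remaining 0..7 bytes, masked to avoid sign extension.
        for (; i < a.length; i++) {
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }

    public static void main(String[] args) {
        byte[] a = new byte[10];
        byte[] b = new byte[10];
        b[0] = (byte) 0b1011; // 3 differing bits inside the first long
        b[9] = (byte) 0xFF;   // 8 differing bits in the tail byte
        System.out.println(hammingDistance(a, b)); // prints 11
    }
}
```

Smaller distances mean more similar vectors, so a search would rank the candidates with the fewest differing bits first.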

But you have to quantize on your own.
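Quantizing on your own could, for example, look like the following sketch. It assumes the simplest common binarization scheme: one bit per dimension, set when the float component is greater than 0. The class name `BinaryQuantizer` and the most-significant-bit-first packing order are choices made for this example, not something HnswBitVectorsFormat prescribes. If the dimension count is not a multiple of 8, the trailing bits of the last byte stay 0.

```java
import java.util.Arrays;

public class BinaryQuantizer {
    /**
     * Packs a float vector into bits: dimension d becomes 1 if vector[d] > 0, else 0.
     * Dimension d is stored in byte d / 8, most significant bit first.
     */
    public static byte[] quantize(float[] vector) {
        byte[] packed = new byte[(vector.length + 7) / 8];
        for (int d = 0; d < vector.length; d++) {
            if (vector[d] > 0f) {
                packed[d >> 3] |= (byte) (0x80 >>> (d & 7));
            }
        }
        return packed;
    }

    public static void main(String[] args) {
        float[] embedding = {0.12f, -0.5f, 0.03f, -0.01f, 0.9f, 0.0f, -2.0f, 0.4f, 0.7f};
        // Bits 1010 1001 and 1000 0000, printed as signed Java bytes.
        System.out.println(Arrays.toString(quantize(embedding))); // prints [-87, -128]
    }
}
```

The same thresholding must be applied to query vectors so that both sides are compared in the same bit space.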

Uwe

On 29.03.2024 at 08:28, Michael Wechner wrote:
thanks for your feedback and pointers!

To play with binary vectors the following project might be useful

https://github.com/cohere-ai/BinaryVectorDB

Re Lucene, I will try to better understand what you suggest below.

Thanks

Michael

On 29.03.24 at 07:35, Shubham Chaudhary wrote:
btw, what about native binary embedding quantization support by Lucene?

This sounds like a good idea to have in Lucene.

Would this require another VectorField/VectorsFormat?


Based on the current implementation, one way would be to use another KNN format.
Alternatively, a better approach might be to make
Lucene99ScalarQuantizedVectorsFormat
<https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html>
configurable to accept the type of quantization, like this new work-in-progress PR
for int4 quantization <https://github.com/apache/lucene/pull/13197> support,
which takes the number of bits to use for quantizing as input. Since this change
allows passing 1 as the number of bits, it looks to me like an enabler for binary
quantization.
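To make the "number of bits" idea concrete, here is a hypothetical sketch in plain Java. It is not the API from the PR. It defines a scalar quantizer that maps a fixed range [min, max] onto the integer codes 0 .. 2^bits - 1. With bits = 4, it yields int4-style codes. With bits = 1, each component collapses to 0 or 1, which is a binary quantization. Real formats also pack the codes tightly and derive min/max from the data. For readability, this sketch returns one int per dimension and takes min/max as given.

```java
import java.util.Arrays;

public class ScalarQuantizer {
    private final float min;
    private final float max;
    private final int bits;

    /** Hypothetical quantizer mapping [min, max] onto the integer codes 0 .. 2^bits - 1. */
    public ScalarQuantizer(float min, float max, int bits) {
        if (bits < 1 || bits > 8) {
            throw new IllegalArgumentException("bits must be in 1..8");
        }
        if (!(max > min)) {
            throw new IllegalArgumentException("max must be greater than min");
        }
        this.min = min;
        this.max = max;
        this.bits = bits;
    }

    /** Clamps each component to [min, max] and rounds it to the nearest of 2^bits levels. */
    public int[] quantize(float[] vector) {
        int maxCode = (1 << bits) - 1; // 1 for bits=1, 15 for bits=4, 255 for bits=8
        float scale = maxCode / (max - min);
        int[] codes = new int[vector.length];
        for (int i = 0; i < vector.length; i++) {
            float clamped = Math.max(min, Math.min(max, vector[i]));
            codes[i] = Math.round((clamped - min) * scale);
        }
        return codes;
    }

    public static void main(String[] args) {
        float[] v = {-1f, -0.3f, 0.3f, 1f};
        System.out.println(Arrays.toString(new ScalarQuantizer(-1f, 1f, 4).quantize(v))); // [0, 5, 10, 15]
        System.out.println(Arrays.toString(new ScalarQuantizer(-1f, 1f, 1).quantize(v))); // [0, 0, 1, 1]
    }
}
```

With a symmetric range around 0 and bits = 1, the single threshold sits at 0, so the result matches a sign-based binarization.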

- Shubham

On Sun, Mar 24, 2024 at 4:34 AM Michael Wechner <michael.wech...@wyona.com> wrote:

btw, what about native binary embedding quantization support by Lucene?


https://www.linkedin.com/posts/tomaarsen_binary-and-scalar-embedding-quantization-activity-7176966403332132864-lJzH?utm_source=share&utm_medium=member_desktop

Would this require another VectorField/VectorsFormat?

Thanks

Michael

On 19.03.24 at 21:57, Shubham Chaudhary wrote:
Hi Michael,

Lucene has had int8 vector support since 9.5 (#1054
<https://github.com/apache/lucene/pull/1054>), but it was left to the user
to produce those quantized vectors and index them using KnnByteVectorField
<https://lucene.apache.org/core/9_5_0/core/org/apache/lucene/document/KnnByteVectorField.html>.
Now that Lucene 9.9 is out, there is native support for int8 scalar
quantization (#12582 <https://github.com/apache/lucene/pull/12582>) using
Lucene99ScalarQuantizedVectorsFormat
<https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html>,
which expects a confidence interval between 0.9 and 1.0 (that is, 90-100%).
Here are some nice blog posts that explain how it works in Lucene:

- https://www.elastic.co/search-labs/blog/articles/scalar-quantization-in-lucene
- https://www.elastic.co/search-labs/blog/articles/scalar-quantization-101

Some other references:
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsReader.html
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.html
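The confidence interval controls which range of values is mapped onto the quantized integers. The following is a simplified sketch of that idea, not Lucene's own quantizer. It sorts a sample of component values and trims (1 - confidenceInterval) / 2 from each end to get the lower and upper bounds. It then clamps each component to those bounds and scales it linearly into 0..127, so the result fits in a signed Java byte. The class name `ConfidenceIntervalQuantizer`, the nearest-rank quantile rule, and the 0..127 target range are assumptions of this example.

```java
import java.util.Arrays;

public class ConfidenceIntervalQuantizer {
    /** Returns {lower, upper}: bounds keeping roughly the central confidenceInterval fraction of values. */
    public static float[] bounds(float[] allValues, float confidenceInterval) {
        if (confidenceInterval < 0.9f || confidenceInterval > 1f) {
            throw new IllegalArgumentException("confidenceInterval must be in [0.9, 1.0]");
        }
        if (allValues.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        float[] sorted = allValues.clone();
        Arrays.sort(sorted);
        float tail = (1f - confidenceInterval) / 2f;            // fraction trimmed from each end
        int lo = Math.round(tail * (sorted.length - 1));        // nearest-rank lower quantile
        int hi = Math.round((1f - tail) * (sorted.length - 1)); // nearest-rank upper quantile
        return new float[] {sorted[lo], sorted[hi]};
    }

    /** Clamps each component to [lower, upper] and scales it linearly into 0..127. */
    public static byte[] quantize(float[] vector, float lower, float upper) {
        if (!(upper > lower)) {
            throw new IllegalArgumentException("upper must be greater than lower");
        }
        float scale = 127f / (upper - lower);
        byte[] out = new byte[vector.length];
        for (int i = 0; i < vector.length; i++) {
            float clamped = Math.max(lower, Math.min(upper, vector[i]));
            out[i] = (byte) Math.round((clamped - lower) * scale);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] sample = new float[100];
        for (int i = 0; i < sample.length; i++) {
            sample[i] = i; // toy sample of component values 0..99
        }
        float[] b = bounds(sample, 0.9f);
        System.out.println(Arrays.toString(b)); // prints [5.0, 94.0]
        byte[] q = quantize(new float[] {-3f, 5f, 50f, 94f, 200f}, b[0], b[1]);
        System.out.println(Arrays.toString(q)); // prints [0, 0, 64, 127, 127]
    }
}
```

Trimming the extreme tails means that a few outliers (such as -3 and 200 above) are clamped rather than stretching the range and wasting quantization levels on values that rarely occur.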


On Wed, Mar 20, 2024 at 1:54 AM Michael Wechner <michael.wech...@wyona.com> wrote:

Hi

Cohere recently announced their "compressed" embeddings:

https://twitter.com/Nils_Reimers/status/1769809006762037368


https://www.linkedin.com/posts/bhavsarpratik_rag-genai-search-activity-7175850704928989187-Ki1N/?utm_source=share&utm_medium=member_desktop
Does Lucene Vector Search support this already, or is somebody working
on this?

Thanks

Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

