Hey Ben
Cool, thank you!
I just tested it successfully with
int8, Cohere, embed-english-v3.0
and
float32, https://huggingface.co/sentence-transformers/all-mpnet-base-v2
On 10.09.25 at 21:48, Benjamin Trent wrote:
Michael,
Yeah, you should be able to use it for either input type. The int8
will just pass through and be used as "normal" int8 and not quantized
at all.
Then only the float32 fields will end up being quantized.
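To illustrate the distinction, here is a minimal sketch in plain Java. It is not Lucene's implementation: the class and method names (QuantizationSketch, binarize, passThrough) are hypothetical, and the one-bit-per-dimension scheme is a deliberately simplified stand-in for whatever the real format does internally. It only shows the idea that float32 input is reduced to a compact representation while int8 input is kept byte for byte.

```java
import java.util.Arrays;

/**
 * Toy illustration only. This is NOT Lucene's quantization algorithm;
 * all names here are hypothetical and exist just for this sketch.
 */
final class QuantizationSketch {

    /** One bit per dimension: bit i is set when vector[i] > centroid[i]. */
    static byte[] binarize(float[] vector, float[] centroid) {
        if (vector.length != centroid.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        byte[] bits = new byte[(vector.length + 7) / 8];
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] > centroid[i]) {
                bits[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        return bits;
    }

    /** Byte (int8) input is kept exactly as provided; nothing is quantized. */
    static byte[] passThrough(byte[] vector) {
        return vector.clone();
    }

    public static void main(String[] args) {
        float[] floatVector = {0.5f, -0.2f, 0.1f, -0.9f};
        float[] centroid = new float[floatVector.length]; // all zeros, for simplicity
        byte[] byteVector = {12, -7, 33, 0};

        System.out.println("float32 -> " + Arrays.toString(binarize(floatVector, centroid)));
        System.out.println("int8    -> " + Arrays.toString(passThrough(byteVector)));
    }
}
```

In this toy version four float components collapse into a single byte, while the four int8 components come back unchanged.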
On Wed, Sep 10, 2025 at 3:42 PM Michael Wechner
<michael.wech...@wyona.com> wrote:
Hi Ben
Thanks very much for the insight!
So IIUC it is correct to
use Lucene102HnswBinaryQuantizedVectorsFormat() for float32 and
int8, right?
Thanks
Michael
On 10.09.25 at 17:58, Benjamin Trent wrote:
Hey Michael,
Right now it won't quantize byte vectors that are provided. It
just passes them through and treats them like normal byte vectors.
In the future, we would like to quantize the bytes as well!
On Wed, Sep 10, 2025, 10:56 AM Michael Wechner
<michael.wech...@wyona.com> wrote:
Hi Uwe
Thanks for your feedback!
I am now using:

    private final int maxDimensions = 16384;

    Codec codec = new Lucene101Codec() {
        @Override
        public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
            var delegate = new Lucene102HnswBinaryQuantizedVectorsFormat();
            return new DelegatingKnnVectorsFormat(delegate, maxDimensions);
        }
    };
    return codec;

This seems to work fine, but I do not understand whether there is a
difference anymore between float32 and int8 vector values,
or whether Lucene102HnswBinaryQuantizedVectorsFormat always "quantizes"
the values?
Thanks
Michael
On 10.09.25 at 10:19, Uwe Schindler wrote:
Hi,
I think the best approach is to check the source code of the default
codec of the version you are upgrading to and start from there. I agree,
there should probably be documentation available that lists
the default components used in the default codec of a given
release.
Uwe
On 09.09.2025 at 22:49, Michael Wechner wrote:
Hi
With Lucene 9.12.0 I set my own custom max vector dimension
using

    Codec codecInt8 = new Lucene99Codec() {
        @Override
        public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
            var delegate = new Lucene99HnswScalarQuantizedVectorsFormat();
            log.info("Vector Value Type: int8, Maximum Vector Dimension: " + maxDimensions);
            return new DelegatingKnnVectorsFormat(delegate, maxDimensions);
        }
    };

and

    Codec codecFloat32 = new Lucene99Codec() {
        @Override
        public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
            var delegate = new Lucene99HnswVectorsFormat();
            log.info("Vector Value Type: float32, Maximum Vector Dimension: " + maxDimensions);
            return new DelegatingKnnVectorsFormat(delegate, maxDimensions);
        }
    };

I am a little confused about which Codec / Vector Format classes I should
use when upgrading to Lucene version 10.2.2.
Any hints would be much appreciated!
Thanks
Michael
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:u...@thetaphi.de