Hey Ben

Cool, thank you!

I just tested it successfully with

int8, Cohere, embed-english-v3.0

and

float32, https://huggingface.co/sentence-transformers/all-mpnet-base-v2


On 10.09.25 at 21:48, Benjamin Trent wrote:
Michael,

Yeah, you should be able to use it for either input type. The int8 will just pass through and be used as "normal" int8 and not quantized at all.

Then only the float32 fields will end up being quantized.
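
To illustrate the behaviour described above, here is a toy, self-contained Java sketch. It does not use Lucene and is not how Lucene102HnswBinaryQuantizedVectorsFormat is implemented: the class name, the method names and the simple sign-bit packing are all assumptions chosen for illustration. It only shows the shape of the distinction: float32 input is reduced to a compact bit representation, while int8 input is stored unchanged.

```java
import java.util.Arrays;

/** Toy model of "float32 gets quantized, int8 passes through". NOT Lucene code. */
final class VectorStorageSketch {

    /**
     * Hypothetical stand-in for a float32 field: keep one bit per dimension
     * (set when the component is positive) and pack eight dimensions per byte.
     * The real format uses a more elaborate quantization scheme.
     */
    static byte[] storeFloat32(float[] vector) {
        byte[] packed = new byte[(vector.length + 7) / 8];
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] > 0f) {
                packed[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        return packed;
    }

    /** Hypothetical stand-in for an int8 field: the bytes are stored as provided. */
    static byte[] storeInt8(byte[] vector) {
        return Arrays.copyOf(vector, vector.length);
    }

    public static void main(String[] args) {
        float[] floats = {0.5f, -1.2f, 3.0f, -0.1f, 0.0f, 2.2f, -4.0f, 1.0f};
        byte[] bytes = {12, -7, 0, 127};
        // Eight float32 components shrink to a single packed byte.
        System.out.println("float32 stored in " + storeFloat32(floats).length + " byte(s)");
        // The int8 vector keeps its original length and values.
        System.out.println("int8 stored as " + Arrays.toString(storeInt8(bytes)));
    }
}
```

In this sketch the storage cost of a float32 vector drops from 32 bits to 1 bit per dimension, whereas the int8 vector is neither shrunk nor altered, which is the asymmetry to keep in mind when mixing both field types in one index.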

On Wed, Sep 10, 2025 at 3:42 PM Michael Wechner <michael.wech...@wyona.com> wrote:

    Hi Ben

    Thanks very much for the insight!

    So IIUC it is correct to
    use Lucene102HnswBinaryQuantizedVectorsFormat() for float32 and
    int8, right?

    Thanks

    Michael

    On 10.09.25 at 17:58, Benjamin Trent wrote:
    Hey Michael,

    Right now it won't quantize byte vectors that are provided. It
    just passes them through and treats them as normal byte vectors.

    In the future, we would like to quantize the bytes as well!

    On Wed, Sep 10, 2025, 10:56 AM Michael Wechner
    <michael.wech...@wyona.com> wrote:

        Hi Uwe

        Thanks for your feedback!

        I am using now:

        private final int maxDimensions = 16384;

        Codec codec = new Lucene101Codec() {
            @Override
            public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
                var delegate = new Lucene102HnswBinaryQuantizedVectorsFormat();
                return new DelegatingKnnVectorsFormat(delegate, maxDimensions);
            }
        };
        return codec;

        This seems to work fine, but I do not understand whether there is
        still a difference between float32 and int8 vector values, or whether
        Lucene102HnswBinaryQuantizedVectorsFormat always "quantizes" the values?

        Thanks

        Michael


        On 10.09.25 at 10:19, Uwe Schindler wrote:

        Hi,

        I think the best approach is to check the source code of the
        default codec of the version you are upgrading to and start from
        there. I agree, there should probably be documentation available
        that lists the default components used in the default codec of a
        given release.

        Uwe

        On 09.09.2025 at 22:49, Michael Wechner wrote:

        Hi

        With Lucene 9.12.0 I set my own custom max vector dimension
        using

        Codec codecInt8 = new Lucene99Codec() {
            @Override
            public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
                var delegate = new Lucene99HnswScalarQuantizedVectorsFormat();
                log.info("Vector Value Type: int8, Maximum Vector Dimension: " + maxDimensions);
                return new DelegatingKnnVectorsFormat(delegate, maxDimensions);
            }
        };

        and

        Codec codecFloat32 = new Lucene99Codec() {
            @Override
            public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
                var delegate = new Lucene99HnswVectorsFormat();
                log.info("Vector Value Type: float32, Maximum Vector Dimension: " + maxDimensions);
                return new DelegatingKnnVectorsFormat(delegate, maxDimensions);
            }
        };

        I am a little confused re which Codec / Vector Format classes I
        should use when upgrading to Lucene version 10.2.2

        Any hints would be much appreciated!

        Thanks

        Michael


-- Uwe Schindler
        Achterdiek 19, D-28357 Bremen 
        https://www.thetaphi.de
        eMail:u...@thetaphi.de
