JingsongLi opened a new pull request, #20: URL: https://github.com/apache/paimon-vector-index/pull/20
## Summary

Add an IVF_HNSW_SQ core index variant that combines IVF partitioning, per-partition HNSW graph search, and 8-bit scalar quantization for compressed vector storage.

## Changes

- Add a reusable 8-bit `ScalarQuantizer` with batch encode/decode and zero-allocation distance calculation for L2, inner product, and cosine.
- Add in-memory `IVFHNSWSQIndex` with train/add/build_graphs/search APIs, filter support, and SQ scan fallback when graphs are absent or filters are selective.
- Add an `IVFHNSWSQIndexReader` file format and reader with single-query, batch-query, and Roaring filter search helpers.
- Reuse IVF_HNSW_FLAT graph serialization and checked IO helpers for consistent HNSW graph validation.
- Extend the recall benchmark to include IVF-HNSW-SQ alongside IVF-PQ, IVF-FLAT, and IVF-HNSW-FLAT.

## Testing

- `cargo fmt --check`
- `cargo test -p paimon-vindex-core`
- `cargo clippy -p paimon-vindex-core --all-targets -- -D warnings`
- `cargo check -p paimon-vindex-jni`
- `cargo bench -p paimon-vindex-core --bench recall_bench --no-run`
- `cargo test --manifest-path python/Cargo.toml --features auto-initialize --no-default-features`
- `javac -d /tmp/paimon-vindex-java-test $(find jni/java jni/java-test -name '*.java')`

## Notes

This PR intentionally keeps the JNI/Python public APIs unchanged and adds the new index at the core layer first. The SQ implementation stores one byte per dimension and reconstructs approximate vectors for HNSW graph storage/search.
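For readers unfamiliar with 8-bit scalar quantization, the following is a minimal sketch of the general technique (one byte per dimension, approximate reconstruction, and a distance computed directly against the codes without allocating). It is not the code in this PR: the type `SketchQuantizer`, its methods, and the per-dimension min/max training rule are all assumptions chosen for illustration, and the PR's `ScalarQuantizer` may differ in every detail.

```rust
/// Hypothetical per-dimension 8-bit scalar quantizer (illustration only;
/// this is NOT the `ScalarQuantizer` added by the PR).
pub struct SketchQuantizer {
    /// Smallest training value seen in each dimension.
    min: Vec<f32>,
    /// Quantization step per dimension: (max - min) / 255, or 0.0 if constant.
    scale: Vec<f32>,
}

impl SketchQuantizer {
    /// Learn per-dimension ranges from training vectors of equal length.
    pub fn train(vectors: &[Vec<f32>]) -> Self {
        let dim = vectors.first().map_or(0, |v| v.len());
        let mut min = vec![f32::INFINITY; dim];
        let mut max = vec![f32::NEG_INFINITY; dim];
        for v in vectors {
            assert_eq!(v.len(), dim, "all training vectors must share one dimension");
            for (d, &x) in v.iter().enumerate() {
                min[d] = min[d].min(x);
                max[d] = max[d].max(x);
            }
        }
        let scale = min.iter().zip(&max).map(|(lo, hi)| (hi - lo) / 255.0).collect();
        Self { min, scale }
    }

    /// Map each component to a code in 0..=255; out-of-range values are clamped.
    pub fn encode(&self, v: &[f32]) -> Vec<u8> {
        v.iter()
            .zip(self.min.iter().zip(&self.scale))
            .map(|(&x, (&lo, &s))| {
                if s > 0.0 {
                    ((x - lo) / s).round().clamp(0.0, 255.0) as u8
                } else {
                    0 // constant dimension: every value maps to the same code
                }
            })
            .collect()
    }

    /// Reconstruct an approximate vector from its codes.
    pub fn decode(&self, code: &[u8]) -> Vec<f32> {
        code.iter()
            .zip(self.min.iter().zip(&self.scale))
            .map(|(&c, (&lo, &s))| lo + f32::from(c) * s)
            .collect()
    }

    /// Squared L2 distance between a float query and an encoded vector,
    /// decoding one component at a time so no temporary vector is allocated.
    pub fn l2_squared(&self, query: &[f32], code: &[u8]) -> f32 {
        query
            .iter()
            .zip(code)
            .zip(self.min.iter().zip(&self.scale))
            .map(|((&q, &c), (&lo, &s))| {
                let diff = q - (lo + f32::from(c) * s);
                diff * diff
            })
            .sum()
    }
}
```

In this sketch each stored vector takes `dim` bytes instead of `4 * dim`, and for values inside the trained range the reconstruction error per dimension is at most half a quantization step. Inner product and cosine distances could follow the same decode-on-the-fly pattern.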
