JingsongLi opened a new pull request, #43: URL: https://github.com/apache/paimon-vector-index/pull/43
## Summary

Optimize vector-index hot paths with SIMD and batched matrix operations across distance primitives, PQ/IVF_PQ build/search paths, OPQ rotation, HNSW workspace reuse, and scalar quantization.

The main goal is to reduce repeated scalar loops in build/add/query paths by reusing existing SIMD primitives and `sgemm_a_bt` where the computation naturally maps to batched dot products.

## Changes

- Add SIMD/SGEMM-backed distance helpers for inner product, L2 norms, PQ sub-vector distance batches, and cosine query-distance reuse.
- Batch IVF assignment and residual computation in IVF_PQ/IVF_FLAT-related build/add paths.
- Speed up PQ encoding by scanning each sub-codebook through batched distance helpers and reusing per-worker scratch buffers.
- Apply OPQ rotations with `sgemm_a_bt` for batches and SIMD dot products for single vectors.
- Reuse HNSW search workspaces and cache cosine vector norms to reduce repeated allocation and norm recomputation.
- Add SIMD paths for scalar quantizer bounds, encode, and decode-with-offset operations.

## Benchmark Results

Baseline: `054d736` (`main` before this PR)
Optimized: `f260625` (`codex/simd-optimizations`)
Profile: `cargo bench` release profile, single local run per version.

### `pq4_bench`

Command:

```bash
cargo bench -p paimon-vindex-core --bench pq4_bench
```

Dataset: 100K vectors, d=128, nlist=256, nprobe=8, k=10, nq=100.

| Metric | Baseline | Optimized | Change |
| --- | ---: | ---: | ---: |
| 8-bit IVF_PQ build | 3.34s | 3.39s | ~0.99x |
| 4-bit IVF_PQ build | 1.87s | 1.87s | ~1.00x |
| 8-bit IVF_PQ query | 32 us/query | 16 us/query | ~2.00x faster |
| 4-bit IVF_PQ query | 12 us/query | 11 us/query | ~1.09x faster |
| 4-bit FastScan, one list | 3 us | 3 us | ~1.00x |

### `ann_bench`

Command:

```bash
ANN_N=30000 ANN_NQ=300 ANN_D=128 ANN_NLIST=128 ANN_NPROBE=16 ANN_PQ_M=16 \
ANN_HNSW_EF_CONSTRUCTION=100 ANN_HNSW_EF_SEARCH=80 \
cargo bench -p paimon-vindex-core --bench ann_bench
```

| Index | Baseline build | Optimized build | Build change | Baseline search | Optimized search | Search change |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| IVF_PQ | 1077 ms | 1050 ms | ~1.03x faster | 59 ms | 61 ms | ~0.97x |
| IVF_HNSW_FLAT | 794 ms | 765 ms | ~1.04x faster | 71 ms | 69 ms | ~1.03x faster |
| IVF_HNSW_SQ | 816 ms | 755 ms | ~1.08x faster | 70 ms | 69 ms | ~1.01x faster |

## Testing

- `cargo test -p paimon-vindex-core`
- `cargo clippy -p paimon-vindex-core --all-targets -- -D warnings`
- `cargo fmt --check`
- `git diff --check HEAD~1..HEAD`
- Benchmarked optimized branch against `HEAD~1` using the commands above.

## Notes

The build/search macro-benchmarks include training, graph construction, I/O, and other work, so individual SIMD hot-path wins are partially diluted there. The clearest observed query improvement is the 8-bit IVF_PQ query path in `pq4_bench`.
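
For readers unfamiliar with why IVF assignment "naturally maps to batched dot products", here is a minimal sketch, not the code in this PR. It relies on the identity `||x - c||^2 = ||x||^2 + ||c||^2 - 2 x·c`: since `||x||^2` is constant for a given vector, the nearest centroid can be found from one `A * B^T` product plus the centroid norms. The `sgemm_a_bt` below is a naive, hypothetical stand-in with an assumed signature; the real helper in the repository may differ, and `squared_norms` and `assign_nearest` are illustrative names.

```rust
/// Hypothetical stand-in for a batched `A * B^T` kernel (assumed signature).
/// `a` is m x k and `b` is n x k, both row-major; returns the m x n product.
fn sgemm_a_bt(a: &[f32], b: &[f32], m: usize, n: usize, k: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), n * k);
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        let row_a = &a[i * k..(i + 1) * k];
        for j in 0..n {
            let row_b = &b[j * k..(j + 1) * k];
            // One dot product per (vector, centroid) pair.
            c[i * n + j] = row_a.iter().zip(row_b).map(|(x, y)| x * y).sum::<f32>();
        }
    }
    c
}

/// Squared L2 norm of every `d`-dimensional row in a row-major buffer.
fn squared_norms(rows: &[f32], d: usize) -> Vec<f32> {
    rows.chunks_exact(d)
        .map(|r| r.iter().map(|x| x * x).sum::<f32>())
        .collect()
}

/// Illustrative batched assignment: index of the nearest centroid (L2) per vector.
fn assign_nearest(vectors: &[f32], centroids: &[f32], d: usize) -> Vec<usize> {
    let m = vectors.len() / d;
    let n = centroids.len() / d;
    assert!(n > 0, "at least one centroid is required");
    let dots = sgemm_a_bt(vectors, centroids, m, n, d);
    let c_norms = squared_norms(centroids, d);
    (0..m)
        .map(|i| {
            let mut best = 0usize;
            let mut best_score = f32::INFINITY;
            for j in 0..n {
                // ||x||^2 is the same for every j, so it is left out of the score.
                let score = c_norms[j] - 2.0 * dots[i * n + j];
                if score < best_score {
                    best_score = score;
                    best = j;
                }
            }
            best
        })
        .collect()
}
```

Calling `assign_nearest(&vectors, &centroids, 128)` with row-major buffers returns one centroid index per input vector. In a real implementation the triple loop would be replaced by an optimized GEMM, which is where the batching would be expected to pay off.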
