JingsongLi opened a new pull request, #43:
URL: https://github.com/apache/paimon-vector-index/pull/43

   ## Summary
   
   Optimize vector-index hot paths with SIMD and batched matrix operations 
across distance primitives, PQ/IVF_PQ build/search paths, OPQ rotation, HNSW 
workspace reuse, and scalar quantization.
   
   The main goal is to reduce repeated scalar loops in build/add/query paths by 
reusing existing SIMD primitives and `sgemm_a_bt` where the computation 
naturally maps to batched dot products.
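   
   Here is a rough illustration of why these computations map onto batched dot products. Squared L2 distance expands as ‖x − c‖² = ‖x‖² − 2⟨x, c⟩ + ‖c‖². Distances between a batch of queries and a set of centroids therefore reduce to two vectors of norms plus a single A·Bᵀ matrix product. The sketch below is a plain scalar reference, not the crate's code. `sgemm_a_bt` here is a hypothetical naive stand-in that assumes row-major `f32` slices and does not reproduce the real helper's signature.
   
   ```rust
   /// Hypothetical naive stand-in for an `sgemm_a_bt`-style kernel:
   /// computes C = A * B^T, where A is m x d and B is n x d (both row-major),
   /// so C[i][j] is the dot product of row i of A with row j of B.
   fn sgemm_a_bt(a: &[f32], b: &[f32], m: usize, n: usize, d: usize, c: &mut [f32]) {
       assert_eq!(a.len(), m * d);
       assert_eq!(b.len(), n * d);
       assert_eq!(c.len(), m * n);
       for i in 0..m {
           let ai = &a[i * d..(i + 1) * d];
           for j in 0..n {
               let bj = &b[j * d..(j + 1) * d];
               c[i * n + j] = ai.iter().zip(bj).map(|(x, y)| x * y).sum::<f32>();
           }
       }
   }
   
   /// Squared L2 norm of each row of a row-major matrix with row width `d`.
   fn row_norms_sq(x: &[f32], d: usize) -> Vec<f32> {
       x.chunks_exact(d)
           .map(|r| r.iter().map(|v| v * v).sum::<f32>())
           .collect()
   }
   
   /// All-pairs squared L2 distances between m queries and n centroids,
   /// using ||x - c||^2 = ||x||^2 - 2<x, c> + ||c||^2.
   fn l2_sq_batch(queries: &[f32], centroids: &[f32], d: usize) -> Vec<f32> {
       let m = queries.len() / d;
       let n = centroids.len() / d;
       let qn = row_norms_sq(queries, d);
       let cn = row_norms_sq(centroids, d);
       // The dot-product term for every (query, centroid) pair in one product.
       let mut dist = vec![0.0f32; m * n];
       sgemm_a_bt(queries, centroids, m, n, d, &mut dist);
       for i in 0..m {
           for j in 0..n {
               // Clamp tiny negatives caused by floating-point cancellation.
               dist[i * n + j] = (qn[i] - 2.0 * dist[i * n + j] + cn[j]).max(0.0);
           }
       }
       dist
   }
   ```
   
   The payoff of this form is that the O(m·n·d) term becomes one dense matrix product, which an SGEMM or SIMD kernel can execute efficiently, while the norm terms cost only O((m + n)·d). The trade-off is floating-point cancellation when x ≈ c, which is why the sketch clamps negative results to zero.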
   
   ## Changes
   
   - Add SIMD/SGEMM-backed distance helpers for inner product, L2 norms, PQ 
sub-vector distance batches, and cosine query-distance reuse.
   - Batch IVF assignment and residual computation in IVF_PQ/IVF_FLAT-related 
build/add paths.
   - Speed up PQ encoding by scanning each sub-codebook through batched 
distance helpers and reusing per-worker scratch buffers.
   - Apply OPQ rotations with `sgemm_a_bt` for batches and SIMD dot products 
for single vectors.
   - Reuse HNSW search workspaces and cache cosine vector norms to reduce 
repeated allocation and norm recomputation.
   - Add SIMD paths for scalar quantizer bounds, encode, and decode-with-offset 
operations.
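   
   For the IVF bullet above, here is a minimal sketch of assignment plus residual computation over a whole batch. It assumes row-major `f32` data and squared-L2 assignment. Each vector is assigned to its nearest centroid, and its residual is the vector minus that centroid. IVF_PQ-style indexes typically quantize these residuals with PQ. For brevity, distances are computed with a plain loop here. A batched path would instead take them from an SGEMM-style distance matrix. `assign_and_residuals` is a hypothetical name, not the crate's API.
   
   ```rust
   /// Hypothetical helper: assign each row of `data` (n x d, row-major) to its
   /// nearest centroid (k x d, row-major) by squared L2 distance, and compute
   /// the residual x - c for every row. Returns (assignments, residuals).
   fn assign_and_residuals(data: &[f32], centroids: &[f32], d: usize) -> (Vec<usize>, Vec<f32>) {
       assert!(d > 0 && !centroids.is_empty());
       let mut assign = Vec::with_capacity(data.len() / d);
       let mut residuals = Vec::with_capacity(data.len());
       for x in data.chunks_exact(d) {
           // Scan all centroids and keep the closest one.
           let mut best = (0usize, f32::INFINITY);
           for (j, c) in centroids.chunks_exact(d).enumerate() {
               let dist: f32 = x.iter().zip(c).map(|(a, b)| (a - b) * (a - b)).sum();
               if dist < best.1 {
                   best = (j, dist);
               }
           }
           assign.push(best.0);
           // Residual = vector minus its assigned centroid.
           let c = &centroids[best.0 * d..(best.0 + 1) * d];
           residuals.extend(x.iter().zip(c).map(|(a, b)| a - b));
       }
       (assign, residuals)
   }
   ```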
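   
   The PQ encoding bullet can be pictured as follows. Each vector is split into `m` sub-vectors. For each sub-vector, the distances to all `ksub` centroids of its sub-codebook are computed in one batched call, and the argmin becomes that sub-space's code. The distance buffer lives in a per-worker scratch struct and is reused across vectors instead of being reallocated. This is a scalar sketch under assumed layouts: codebooks are stored as `m` consecutive blocks of `ksub x dsub` row-major floats, and codes are 8-bit. `PqScratch`, `sub_distances`, and `pq_encode` are hypothetical names.
   
   ```rust
   /// Hypothetical per-worker scratch space, reused across `pq_encode` calls.
   struct PqScratch {
       dists: Vec<f32>, // distances from one sub-vector to every sub-centroid
   }
   
   /// Batched squared L2 distances from `sub` to each row of `codebook`.
   fn sub_distances(sub: &[f32], codebook: &[f32], out: &mut [f32]) {
       let dsub = sub.len();
       for (o, c) in out.iter_mut().zip(codebook.chunks_exact(dsub)) {
           *o = sub.iter().zip(c).map(|(a, b)| (a - b) * (a - b)).sum::<f32>();
       }
   }
   
   /// Encode one vector into `m` byte codes. `codebooks` holds `m` blocks of
   /// `ksub * dsub` floats (assumed layout), with `ksub <= 256`.
   fn pq_encode(x: &[f32], codebooks: &[f32], m: usize, ksub: usize, scratch: &mut PqScratch, codes: &mut [u8]) {
       assert!(m > 0 && x.len() % m == 0 && codes.len() == m && ksub <= 256);
       let dsub = x.len() / m;
       assert_eq!(codebooks.len(), m * ksub * dsub);
       // For a fixed `ksub`, this only allocates on the first call.
       scratch.dists.resize(ksub, 0.0);
       for (s, code) in codes.iter_mut().enumerate() {
           let sub = &x[s * dsub..(s + 1) * dsub];
           let cb = &codebooks[s * ksub * dsub..(s + 1) * ksub * dsub];
           sub_distances(sub, cb, &mut scratch.dists);
           // Argmin over the batch of distances gives this sub-space's code.
           let (best, _) = scratch
               .dists
               .iter()
               .enumerate()
               .fold((0, f32::INFINITY), |acc, (i, &dist)| if dist < acc.1 { (i, dist) } else { acc });
           *code = best as u8;
       }
   }
   ```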
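   
   For the OPQ bullet, a learned rotation matrix `R` (d x d) is applied before PQ. Assuming `R` is stored row-major so that y[i] = ⟨R[i], x⟩, a single vector needs only d dot products. A row-wise batch X (n x d) becomes Y = X·Rᵀ, which is exactly the A·Bᵀ shape an `sgemm_a_bt`-style kernel computes. The naive loops below only illustrate the two shapes. `rotate_one` and `rotate_batch` are hypothetical names, and the storage convention for `R` is an assumption.
   
   ```rust
   /// Rotate one vector: y[i] = <R[i, :], x>, with R stored row-major (d x d).
   fn rotate_one(r: &[f32], x: &[f32]) -> Vec<f32> {
       let d = x.len();
       assert_eq!(r.len(), d * d);
       r.chunks_exact(d)
           .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
           .collect()
   }
   
   /// Rotate a batch X (n x d, row-major): Y = X * R^T.
   /// Y[i][j] = sum_k X[i][k] * R[j][k], an A * B^T product with A = X, B = R.
   fn rotate_batch(r: &[f32], xs: &[f32], d: usize) -> Vec<f32> {
       assert_eq!(r.len(), d * d);
       let n = xs.len() / d;
       let mut y = vec![0.0f32; n * d];
       for i in 0..n {
           for j in 0..d {
               let mut acc = 0.0f32;
               for k in 0..d {
                   acc += xs[i * d + k] * r[j * d + k];
               }
               y[i * d + j] = acc;
           }
       }
       y
   }
   ```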
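   
   The scalar-quantizer bullet covers element-wise loops that vectorize naturally: per-dimension min/max bounds over training data, encoding each value to a code, and decoding. Below is a plain scalar reference of one common 8-bit uniform scheme, code = round((x − min) / (max − min) · 255). The exact formula used by the crate is an assumption here. The PR does not spell out what "decode-with-offset" means. `sq_decode` below reconstructs `min + code * scale`, which is only one plausible reading. All function names are hypothetical.
   
   ```rust
   /// Per-dimension (min, max) bounds over rows of width `d` (row-major).
   fn sq_bounds(data: &[f32], d: usize) -> (Vec<f32>, Vec<f32>) {
       let mut vmin = vec![f32::INFINITY; d];
       let mut vmax = vec![f32::NEG_INFINITY; d];
       for row in data.chunks_exact(d) {
           for j in 0..d {
               vmin[j] = vmin[j].min(row[j]);
               vmax[j] = vmax[j].max(row[j]);
           }
       }
       (vmin, vmax)
   }
   
   /// Assumed 8-bit uniform encoding: round((x - min) / (max - min) * 255), clamped.
   fn sq_encode(x: &[f32], vmin: &[f32], vmax: &[f32], codes: &mut [u8]) {
       for j in 0..x.len() {
           let range = (vmax[j] - vmin[j]).max(f32::EPSILON); // avoid divide-by-zero
           let t = ((x[j] - vmin[j]) / range * 255.0).round();
           codes[j] = t.clamp(0.0, 255.0) as u8;
       }
   }
   
   /// Assumed decoding: x_hat = min + code * (max - min) / 255 (min acts as the offset).
   fn sq_decode(codes: &[u8], vmin: &[f32], vmax: &[f32], out: &mut [f32]) {
       for j in 0..codes.len() {
           let scale = (vmax[j] - vmin[j]) / 255.0;
           out[j] = vmin[j] + codes[j] as f32 * scale;
       }
   }
   ```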
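   
   For the HNSW/cosine bullet, cosine distance is 1 − ⟨q, x⟩ / (‖q‖·‖x‖). HNSW search evaluates many distances per query. If each stored vector's norm is computed once at insert time, and the query norm once per search, every evaluation during graph traversal costs a single dot product. This is a minimal sketch with hypothetical type and method names. Returning 1.0 for zero-norm vectors is an arbitrary choice made here, not taken from the PR.
   
   ```rust
   /// Hypothetical store that caches the L2 norm of each inserted vector.
   struct CosineStore {
       d: usize,
       data: Vec<f32>,
       norms: Vec<f32>,
   }
   
   impl CosineStore {
       fn new(d: usize) -> Self {
           Self { d, data: Vec::new(), norms: Vec::new() }
       }
   
       /// Insert a vector and compute its norm once, up front.
       fn add(&mut self, v: &[f32]) {
           assert_eq!(v.len(), self.d);
           self.norms.push(v.iter().map(|x| x * x).sum::<f32>().sqrt());
           self.data.extend_from_slice(v);
       }
   
       /// Cosine distance to stored vector `id`, given a precomputed query norm.
       fn distance(&self, q: &[f32], q_norm: f32, id: usize) -> f32 {
           let x = &self.data[id * self.d..(id + 1) * self.d];
           let dot: f32 = q.iter().zip(x).map(|(a, b)| a * b).sum();
           let denom = q_norm * self.norms[id];
           // Assumed convention: treat zero-norm inputs as maximally distant.
           if denom == 0.0 { 1.0 } else { 1.0 - dot / denom }
       }
   }
   ```
   
   The same idea of paying a cost once and reusing it across calls applies to the per-search workspaces (for example visited sets and candidate buffers) that the bullet describes.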
   
   ## Benchmark Results
   
   Baseline: `054d736` (`main` before this PR)  
   Optimized: `f260625` (`codex/simd-optimizations`)  
   Profile: `cargo bench` release profile, single local run per version.
   
   ### `pq4_bench`
   
   Command:
   
   ```bash
   cargo bench -p paimon-vindex-core --bench pq4_bench
   ```
   
   Dataset: 100K vectors, d=128, nlist=256, nprobe=8, k=10, nq=100.
   
   | Metric | Baseline | Optimized | Change |
   | --- | ---: | ---: | ---: |
   | 8-bit IVF_PQ build | 3.34s | 3.39s | ~0.99x |
   | 4-bit IVF_PQ build | 1.87s | 1.87s | ~1.00x |
   | 8-bit IVF_PQ query | 32 us/query | 16 us/query | ~2.00x faster |
   | 4-bit IVF_PQ query | 12 us/query | 11 us/query | ~1.09x faster |
   | 4-bit FastScan, one list | 3 us | 3 us | ~1.00x |
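   
   The Change column reads as the ratio of baseline time to optimized time. Values above 1x are faster, and values below 1x (such as the 8-bit build row) are slightly slower. This reading is inferred from the numbers in the table, for example:
   
   ```math
   \text{speedup} = \frac{t_{\text{baseline}}}{t_{\text{optimized}}},\qquad
   \frac{32\ \mu\text{s}}{16\ \mu\text{s}} = 2.00,\qquad
   \frac{3.34\ \text{s}}{3.39\ \text{s}} \approx 0.99
   ```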
   
   ### `ann_bench`
   
   Command:
   
   ```bash
   ANN_N=30000 ANN_NQ=300 ANN_D=128 ANN_NLIST=128 ANN_NPROBE=16 ANN_PQ_M=16 \
   ANN_HNSW_EF_CONSTRUCTION=100 ANN_HNSW_EF_SEARCH=80 \
   cargo bench -p paimon-vindex-core --bench ann_bench
   ```
   
   | Index | Baseline build | Optimized build | Build change | Baseline search | Optimized search | Search change |
   | --- | ---: | ---: | ---: | ---: | ---: | ---: |
   | IVF_PQ | 1077 ms | 1050 ms | ~1.03x faster | 59 ms | 61 ms | ~0.97x |
   | IVF_HNSW_FLAT | 794 ms | 765 ms | ~1.04x faster | 71 ms | 69 ms | ~1.03x faster |
   | IVF_HNSW_SQ | 816 ms | 755 ms | ~1.08x faster | 70 ms | 69 ms | ~1.01x faster |
   
   ## Testing
   
   - `cargo test -p paimon-vindex-core`
   - `cargo clippy -p paimon-vindex-core --all-targets -- -D warnings`
   - `cargo fmt --check`
   - `git diff --check HEAD~1..HEAD`
   - Benchmarked the optimized branch against `HEAD~1` using the commands above.
   
   ## Notes
   
   The build/search macro-benchmarks include training, graph construction, I/O, 
and other work, so individual SIMD hot-path wins are partially diluted there. 
The clearest observed query improvement is the 8-bit IVF_PQ query path in 
`pq4_bench`.
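   
   The dilution effect follows Amdahl's law. Suppose a fraction p of end-to-end time is spent in the accelerated hot path, and that path speeds up by a factor s. The overall speedup is then given by the formula below. The values of p and s in the example are hypothetical. They illustrate the shape of the effect and are not measured from these benchmarks.
   
   ```math
   S_{\text{overall}} = \frac{1}{(1 - p) + p/s},
   \qquad p = 0.1,\ s = 2 \;\Rightarrow\; S_{\text{overall}} = \frac{1}{0.9 + 0.05} \approx 1.05
   ```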


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
