Hi all,

I would like to share some recent benchmark results from the SIMD_TS2DIFF 
repository [1], which implements SIMD-accelerated decoding for the TS_2DIFF 
encoding format used in TsFile CPP.
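For context, TS_2DIFF stores a block as a first value plus per-element deltas, so decoding is essentially a running sum. Here is a minimal scalar sketch of that idea (simplified from the real bit-packed format; the function name and layout are hypothetical, not the repository's actual API):

```cpp
#include <cstdint>
#include <vector>

// Simplified TS_2DIFF-style decoding sketch: a block stores the first value,
// a minimum delta, and non-negative per-element offsets; each delta is
// min_delta + offset, and values are reconstructed with a running sum.
std::vector<int64_t> decode_ts2diff(int64_t first, int64_t min_delta,
                                    const std::vector<int64_t>& offsets) {
    std::vector<int64_t> out;
    out.reserve(offsets.size() + 1);
    out.push_back(first);
    int64_t cur = first;
    for (int64_t off : offsets) {
        cur += min_delta + off;  // delta = min_delta + offset
        out.push_back(cur);
    }
    return out;
}
```

The serial dependency of that running sum is exactly what the SIMD kernel has to break to get the throughput numbers below.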




** Benchmark Setup **

- Data size: 5,000,000 values

- Platforms: C++ (AVX2, using simde)

- Tested cases:

  - STABLE sequence (diff ∈ [1,100])

  - UNSTABLE sequence (diff ∈ [-50,100])

- Queries: EQ, GT, and BETWEEN filters
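The STABLE and UNSTABLE inputs above can be generated along these lines (a hypothetical sketch of the setup; the repository's actual generator may differ):

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Generate a monotone-ish test sequence where consecutive values differ by a
// uniform random delta in [lo, hi]. STABLE uses [1, 100]; UNSTABLE uses
// [-50, 100], so values can also decrease.
std::vector<int64_t> make_sequence(size_t n, int64_t lo, int64_t hi,
                                   uint32_t seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int64_t> diff(lo, hi);
    std::vector<int64_t> v(n);
    int64_t cur = 0;
    for (size_t i = 0; i < n; ++i) {
        cur += diff(rng);
        v[i] = cur;
    }
    return v;
}
```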




** Decoding Throughput **

- STABLE:

  - Scalar: 6.35 ms, 787 M vals/s

  - SIMD:   4.80 ms, 1043 M vals/s

- UNSTABLE:

  - Scalar: 6.84 ms, 731 M vals/s

  - SIMD:   4.76 ms, 1051 M vals/s

This shows a ~1.3–1.4× speedup on full-block decoding.
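The core trick behind the SIMD kernel is turning the serial running sum over deltas into an in-register inclusive prefix sum. Sketched below with plain arrays for a 4-lane register for clarity (the real kernel would use simde/AVX2 lane shuffles instead of loops), it takes log2(N) shift-and-add steps instead of N-1 dependent adds:

```cpp
#include <array>
#include <cstdint>

// Hillis-Steele inclusive prefix sum over 4 lanes: add the vector shifted
// right by 1 lane, then by 2 lanes. With SIMD registers each step is one
// shuffle plus one add, regardless of lane count.
std::array<int64_t, 4> prefix_sum4(std::array<int64_t, 4> v) {
    // step 1: v += (v shifted right by one lane, zero-filled)
    std::array<int64_t, 4> s1{0, v[0], v[1], v[2]};
    for (int i = 0; i < 4; ++i) v[i] += s1[i];
    // step 2: v += (v shifted right by two lanes, zero-filled)
    std::array<int64_t, 4> s2{0, 0, v[0], v[1]};
    for (int i = 0; i < 4; ++i) v[i] += s2[i];
    return v;
}
```

Across blocks, the last lane's total is carried into the next block as a broadcast add, so the dependency chain is per-block rather than per-element.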




** Query Performance **

- Point EQ filter (single hit): SIMD is 70–80× faster than scalar.

- GT filter (high selectivity, e.g., greater than the last value, so only a 
few matches): SIMD is 20–25× faster.

- GT filter (low selectivity, millions of results): SIMD is ~1.2× faster.

- BETWEEN filter (small interval, tens of results): SIMD is ~2.5× faster.

All SIMD results were validated against scalar decoding (equal=true).
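The selectivity gap comes from skipping work early: a selective predicate can reject a whole block from its min/max before touching individual values, while a low-selectivity scan still has to materialize almost everything. A hypothetical scalar sketch of a BETWEEN filter with that block-level pruning (not the repository's actual API):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// BETWEEN filter over one decoded block with min/max pruning: if the block's
// value range cannot intersect [lo, hi], the whole block is skipped without
// scanning. Returns indices of matching values.
std::vector<size_t> filter_between(const std::vector<int64_t>& vals,
                                   int64_t lo, int64_t hi) {
    std::vector<size_t> hits;
    if (vals.empty()) return hits;
    int64_t mn = vals[0], mx = vals[0];
    for (int64_t v : vals) {
        mn = std::min(mn, v);
        mx = std::max(mx, v);
    }
    if (mx < lo || mn > hi) return hits;  // whole block pruned
    for (size_t i = 0; i < vals.size(); ++i)
        if (vals[i] >= lo && vals[i] <= hi) hits.push_back(i);
    return hits;
}
```

In the SIMD path the inner scan additionally becomes a vector compare plus movemask, which is where the point-query speedups come from.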




** Summary ** 

- SIMD decoding kernel reaches ~1.0 G vals/s.

- The speedup strongly depends on query selectivity: point/range queries 
benefit the most, while full scans are limited by memory bandwidth.




** Next Steps **

Recently I came across an excellent paper published at ICDE 2025: “Exploring 
SIMD Vectorization in Aggregation Pipelines for Encoded IoT Data”. It 
presents systematic optimizations such as layout-aware vectorization, 
operator fusion, and pruning rules. Inspired by it, I plan to enhance our 
C++ implementation further and to explore applying these techniques to 
future Python-side TsFile analysis.




Contributions to the repo [1] are very welcome; please feel free to open 
issues or submit PRs.




Best,

Colin.




[1] https://github.com/ColinLeeo/SIMD_TS2DIFF
