Hi all, I would like to share some recent benchmark results from the SIMD_TS2DIFF repository [1].
This repository implements SIMD-accelerated decoding for the TS_2DIFF encoding format used in TsFile CPP.

** Benchmark Setup **

- Data size: 5,000,000 values
- Platform: C++ (AVX2, via simde)
- Tested cases:
  - STABLE sequence (diff ∈ [1, 100])
  - UNSTABLE sequence (diff ∈ [-50, 100])
- Queries: EQ, GT, and BETWEEN filters

** Decoding Throughput **

- STABLE:
  - Scalar: 6.35 ms, 787 M vals/s
  - SIMD:   4.80 ms, 1043 M vals/s
- UNSTABLE:
  - Scalar: 6.84 ms, 731 M vals/s
  - SIMD:   4.76 ms, 1051 M vals/s

This is a ~1.3–1.4× speedup on full-block decoding.

** Query Performance **

- Point EQ filter (single hit): SIMD is 70–80× faster than scalar.
- GT filter (high selectivity, e.g., greater than the last value): SIMD is 20–25× faster.
- GT filter (low selectivity, millions of results): SIMD is ~1.2× faster.
- BETWEEN filter (small interval, tens of results): SIMD is ~2.5× faster.

All SIMD results were validated against scalar decoding (equal = true).

** Summary **

- The SIMD decoding kernel reaches ~1.0 G vals/s.
- The speedup depends strongly on query selectivity: point and narrow range queries benefit the most, while full scans are limited by memory bandwidth.

** Next Steps **

Recently I came across an excellent paper published at ICDE 2025: "Exploring SIMD Vectorization in Aggregation Pipelines for Encoded IoT Data". It provides systematic optimizations such as layout-aware vectorization, operator fusion, and pruning rules. Inspired by it, I plan to enhance our C++ implementation further and to consider applying these techniques to future Python-side TsFile analysis.

Contributions to the repo [1] are very welcome; please feel free to open issues or submit PRs.

Best,
Colin

[1] https://github.com/ColinLeeo/SIMD_TS2DIFF
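P.S. For anyone wondering why the speedup tracks selectivity so closely: the vectorized filter evaluates the predicate over a whole block of values at once and only materializes matches when the block's match mask is non-zero, so queries with sparse hits skip nearly all per-value output work, while low-selectivity scans still have to write out millions of results and become memory-bound. A minimal portable sketch of that pattern (illustrative only; an actual AVX2/simde kernel would use a vector compare plus a movemask for the inner loop):

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Filter "v > bound" one 4-lane block at a time. The block-level mask
// lets us skip the output loop entirely when nothing in the block
// matches, which is why highly selective queries benefit the most.
std::vector<size_t> filter_gt(const std::vector<int64_t>& v, int64_t bound) {
    std::vector<size_t> hits;
    const size_t B = 4;  // lanes per "vector"
    size_t i = 0;
    for (; i + B <= v.size(); i += B) {
        unsigned mask = 0;
        for (size_t k = 0; k < B; ++k)  // one vector compare in SIMD
            mask |= unsigned(v[i + k] > bound) << k;
        if (mask == 0) continue;        // whole block rejected: no output work
        for (size_t k = 0; k < B; ++k)  // materialize only matching lanes
            if (mask & (1u << k)) hits.push_back(i + k);
    }
    for (; i < v.size(); ++i)           // scalar tail
        if (v[i] > bound) hits.push_back(i);
    return hits;
}
```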