[PR] [Parquet] Add SIMD-accelerated byte-stream-split decoding [arrow-go]

via GitHub Mon, 02 Feb 2026 07:20:10 -0800


daniel-adam-tfs opened a new pull request, #654:
URL: https://github.com/apache/arrow-go/pull/654


   ### Rationale for this change
   The byte-stream-split encoding is commonly used in Parquet for 
floating-point data, because grouping the corresponding bytes of each value 
together improves compression ratios. However, the existing Go implementation 
uses a simple scalar loop, which is slow on large datasets. By leveraging SIMD 
instructions (AVX2 on x86 and NEON on ARM), we can significantly accelerate the 
decoding process and improve overall Parquet read performance.
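
   For readers unfamiliar with the encoding, here is a minimal scalar sketch of what the decoder has to do: the encoded buffer holds one byte stream per byte position, and decoding interleaves those streams back into contiguous values. The function names `decodeByteStreamSplit` and `encodeByteStreamSplit` are hypothetical and are not the arrow-go API; this is an illustration of the layout, not the code in this PR.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// decodeByteStreamSplit is a hypothetical helper (not the arrow-go API).
// data holds `width` streams of numValues bytes each; stream k contains
// byte k of every value. The result holds numValues contiguous values.
func decodeByteStreamSplit(data []byte, numValues, width int) []byte {
	out := make([]byte, numValues*width)
	for stream := 0; stream < width; stream++ {
		src := data[stream*numValues : (stream+1)*numValues]
		for i, b := range src {
			out[i*width+stream] = b
		}
	}
	return out
}

// encodeByteStreamSplit is the inverse transform, included only so the
// example can round-trip some values.
func encodeByteStreamSplit(values []byte, width int) []byte {
	n := len(values) / width
	out := make([]byte, len(values))
	for i := 0; i < n; i++ {
		for k := 0; k < width; k++ {
			out[k*n+i] = values[i*width+k]
		}
	}
	return out
}

func main() {
	floats := []float32{1.5, -2.25, 3.0}
	raw := make([]byte, 4*len(floats))
	for i, f := range floats {
		binary.LittleEndian.PutUint32(raw[4*i:], math.Float32bits(f))
	}

	encoded := encodeByteStreamSplit(raw, 4)
	decoded := decodeByteStreamSplit(encoded, len(floats), 4)

	// Print the round-tripped values.
	for i := range floats {
		bits := binary.LittleEndian.Uint32(decoded[4*i:])
		fmt.Println(math.Float32frombits(bits))
	}
}
```

   The inner loop writes with a stride of `width` bytes, which is the access pattern a vectorized kernel would replace with in-register byte interleaving.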
    
   ### What changes are included in this PR?
   Optimized implementation of the byte-stream-split decoding algorithm.
   
   Added SIMD-accelerated implementations:
   
   - AVX2 implementation for the amd64 architecture, using 256-bit vectors and 
processing 32 values per block
   - NEON implementation for the arm64 architecture, using 128-bit vectors and 
processing 16 values per block
   - Both use a 2-stage byte-unpacking hierarchy and follow the same algorithm 
structure
   - Runtime CPU feature detection with automatic dispatch to the best 
available implementation (SIMD, with a scalar fallback)
   - Proper build tags and file suffixes for cross-platform compatibility
   - An optimized V2 scalar implementation, using unsafe pointer casting, as a 
fallback
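
   The dispatch-plus-fallback structure described in the list above could look roughly like the following sketch. Everything named here (`decodeFunc`, `decodeScalar`, `decodeBlocked`, `selectDecoder`, `blockSize`) is hypothetical. `decodeBlocked` is a pure-Go stand-in for a SIMD kernel: it handles full blocks of 32 values and leaves the remainder to a scalar tail loop, which is an assumption about how a blocked kernel deals with inputs that are not a multiple of the block size. The `runtime.GOARCH` check is only a placeholder; real feature detection would query the CPU (for example through `golang.org/x/sys/cpu`), and the PR description does not say which mechanism it uses.

```go
package main

import (
	"fmt"
	"runtime"
)

// decodeFunc is a hypothetical signature shared by all implementations.
type decodeFunc func(out, data []byte, numValues, width int)

// decodeScalar is the plain fallback loop.
func decodeScalar(out, data []byte, numValues, width int) {
	for stream := 0; stream < width; stream++ {
		for i := 0; i < numValues; i++ {
			out[i*width+stream] = data[stream*numValues+i]
		}
	}
}

// blockSize mirrors the 32-values-per-block figure quoted for AVX2.
const blockSize = 32

// decodeBlocked stands in for a SIMD kernel (assumption: full blocks are
// handled by the fast path and the remainder by a scalar tail).
func decodeBlocked(out, data []byte, numValues, width int) {
	base := 0
	for ; base+blockSize <= numValues; base += blockSize {
		for stream := 0; stream < width; stream++ {
			start := stream*numValues + base
			for i, b := range data[start : start+blockSize] {
				out[(base+i)*width+stream] = b
			}
		}
	}
	// Scalar tail for the values that do not fill a whole block.
	for i := base; i < numValues; i++ {
		for stream := 0; stream < width; stream++ {
			out[i*width+stream] = data[stream*numValues+i]
		}
	}
}

// selectDecoder picks an implementation once, based on a capability flag.
func selectDecoder(hasSIMD bool) decodeFunc {
	if hasSIMD {
		return decodeBlocked
	}
	return decodeScalar
}

func main() {
	// Placeholder for real CPU feature detection.
	hasSIMD := runtime.GOARCH == "amd64" || runtime.GOARCH == "arm64"
	decode := selectDecoder(hasSIMD)

	const n, width = 33, 4 // one full block plus a one-value tail
	data := make([]byte, n*width)
	for i := range data {
		data[i] = byte(i)
	}
	out := make([]byte, n*width)
	decode(out, data, n, width)
	fmt.Println(out[:8]) // first two decoded values
}
```

   Selecting the function once and calling it through a variable keeps the per-call cost to a single indirect call, and callers never need to know which implementation was chosen.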
   
   ### Are these changes tested?
   Yes. Various tests were added:
   
   - Correctness tests covering various input sizes (1, 2, 7, 8, 31, 32, 33, 
63, 64, 65, 127, 128, 129, 255, 256, 512, 1024) to validate all implementations 
(Reference, V2, AVX2, NEON)
   - Edge case tests including exact block boundaries, single values, all-zero 
data, and all-ones data
   - Benchmark suite with multiple data sizes (8, 64, 512, 4096, 32768, 262144 
values) comparing all implementations
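
   As an illustration of the correctness strategy described above, the sketch below checks an alternative decoder against a reference over the same list of input sizes, using mixed, all-zero, and all-ones data. It is not the PR's test code: `decodeRef`, `decodeWord`, `makeInput`, and `checkAll` are hypothetical names, both decoders are fixed to 4-byte values, and `decodeWord` is a simple word-at-a-time variant chosen only to have a second implementation to compare.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// decodeRef is a byte-at-a-time reference decoder for 4-byte values.
func decodeRef(out, data []byte, n int) {
	for i := 0; i < n; i++ {
		for k := 0; k < 4; k++ {
			out[4*i+k] = data[k*n+i]
		}
	}
}

// decodeWord assembles each value as a uint32 and stores it in one write.
func decodeWord(out, data []byte, n int) {
	s0, s1, s2, s3 := data[0:n], data[n:2*n], data[2*n:3*n], data[3*n:4*n]
	for i := 0; i < n; i++ {
		v := uint32(s0[i]) | uint32(s1[i])<<8 | uint32(s2[i])<<16 | uint32(s3[i])<<24
		binary.LittleEndian.PutUint32(out[4*i:], v)
	}
}

// makeInput builds an encoded buffer for n 4-byte values.
func makeInput(n int, pattern string) []byte {
	data := make([]byte, 4*n)
	for i := range data {
		switch pattern {
		case "zeros":
			data[i] = 0x00
		case "ones":
			data[i] = 0xFF
		default:
			data[i] = byte(i*31 + 7) // arbitrary non-uniform bytes
		}
	}
	return data
}

// checkAll compares both decoders across sizes that straddle block boundaries.
func checkAll() error {
	sizes := []int{1, 2, 7, 8, 31, 32, 33, 63, 64, 65, 127, 128, 129, 255, 256, 512, 1024}
	patterns := []string{"mixed", "zeros", "ones"}
	for _, n := range sizes {
		for _, p := range patterns {
			data := makeInput(n, p)
			want := make([]byte, 4*n)
			got := make([]byte, 4*n)
			decodeRef(want, data, n)
			decodeWord(got, data, n)
			if !bytes.Equal(want, got) {
				return fmt.Errorf("mismatch for n=%d pattern=%s", n, p)
			}
		}
	}
	return nil
}

func main() {
	if err := checkAll(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("all sizes agree")
}
```

   Sizes just below, at, and just above multiples of 16 and 32 are the ones most likely to expose off-by-one errors between a blocked fast path and its tail handling.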
   
   ### Are there any user-facing changes?
   No user-facing API changes. This is a performance optimization that 
maintains full backward compatibility. Users will automatically benefit from 
faster Parquet decoding when reading files with byte-stream-split encoded 
floating-point columns, with no code changes required.
   
   


