EeshanBembi commented on PR #17554: URL: https://github.com/apache/datafusion/pull/17554#issuecomment-3396322280
> #17539

Hey Andy! Great question about the performance impact! There are criterion benchmarks in `/datafusion/physical-expr/benches/binary_op.rs` that measure exactly what you're asking about.

### The Bottom Line

Yeah, there's definitely a performance hit:

- **Individual arithmetic operations**: 4.17x slower (591ns → 2,463ns)
- **Real queries**: About 30% slower for arithmetic-heavy workloads

### Try It Yourself

To run these benchmarks yourself:

```bash
# Quick sanity check
cargo bench --bench binary_op -- "overflow_check.*small_numbers.*add"

# Full performance analysis
cargo bench --bench binary_op -- overflow_check

# See how it handles actual overflow scenarios
cargo bench --bench binary_op -- real_overflow
```

### What I Did to Make It Fast

1. **Small arrays (≤500 elements)**: Actually pretty close to native speed (0.5-1x overhead)
2. **Medium arrays (1K-2K)**: SIMD vectorization kicks in
3. **Large arrays (>2K)**: Smart strategy selection based on the data

## The Optimization Deep Dive

- **CPU-level assembly**: Using actual overflow flags on x86/ARM instead of software checks
- **SIMD when possible**: Process multiple elements at once
- **Bit manipulation tricks**: `((a ^ sum) & (b ^ sum)) < 0` is faster than branches
- **Compile-time analysis**: If we can prove it's safe, zero overhead

## Bottom Line

I think this is the right default. DataFusion should behave like other SQL engines by default, and give you the escape hatch when you need it. The performance hit is real but not unreasonable for the correctness we get.

That said, if you think this is too much overhead for the default, I'm totally open to discussion.
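For anyone who wants to see the bit manipulation trick in isolation, here is a minimal sketch of the sign-bit test `((a ^ sum) & (b ^ sum)) < 0` for signed 64-bit addition. The function names `add_checked_bitwise` and `add_slices_checked` are hypothetical and used only for illustration; this is not the code from the PR, and it makes no claim about how DataFusion's kernels are actually structured.

```rust
/// Returns `Some(a + b)`, or `None` if the signed addition overflows.
/// Hypothetical helper for illustration; not taken from the PR.
fn add_checked_bitwise(a: i64, b: i64) -> Option<i64> {
    // wrapping_add never panics; it returns the two's-complement result.
    let sum = a.wrapping_add(b);
    // Overflow occurred exactly when `a` and `b` share a sign and `sum`
    // has the opposite sign. In that case both XORs have the sign bit set,
    // so their AND is negative.
    if ((a ^ sum) & (b ^ sum)) < 0 {
        None
    } else {
        Some(sum)
    }
}

/// Element-wise checked addition over two slices (hypothetical helper).
/// Returns `None` on a length mismatch or on the first overflow.
fn add_slices_checked(lhs: &[i64], rhs: &[i64]) -> Option<Vec<i64>> {
    if lhs.len() != rhs.len() {
        return None;
    }
    lhs.iter()
        .zip(rhs)
        .map(|(&a, &b)| add_checked_bitwise(a, b))
        .collect()
}
```

The result should agree with the standard library's `i64::checked_add` for every input pair, which makes that a convenient reference when testing a variant like this.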
