EeshanBembi commented on PR #17554:
URL: https://github.com/apache/datafusion/pull/17554#issuecomment-3396322280

   > #17539
   
   
   
   Hey Andy! Great question about the performance impact!
   
   There are criterion benchmarks in 
`/datafusion/physical-expr/benches/binary_op.rs` that measure exactly what 
you're asking about.
   
   ### The Bottom Line
   
   Yeah, there's definitely a performance hit:
   
   - **Individual arithmetic operations**: 4.17x slower (591ns → 2,463ns)
   - **Real queries**: About 30% slower for arithmetic-heavy workloads
   
   ### Try It Yourself
   
   To run these benchmarks yourself:
   
   ```bash
   # Quick sanity check
   cargo bench --bench binary_op -- "overflow_check.*small_numbers.*add"
   
   # Full performance analysis  
   cargo bench --bench binary_op -- overflow_check
   
   # See how it handles actual overflow scenarios
   cargo bench --bench binary_op -- real_overflow
   ```
   
   ### What I Did to Make It Fast

   The overhead depends on array size, so the implementation handles three size ranges differently:

   1. **Small arrays (≤500 elements)**: Actually pretty close to native speed 
(0.5-1x overhead)
   2. **Medium arrays (1K-2K)**: SIMD vectorization kicks in
   3. **Large arrays (>2K)**: Smart strategy selection based on the data
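
   For illustration, the size tiers above could be dispatched along these lines. This is a minimal sketch, not the code in the PR: the `OverflowStrategy` enum and `pick_strategy` function are hypothetical names, "2K" is assumed to mean 2,000, and since the list leaves 501-999 elements unspecified, the sketch arbitrarily groups that range with the medium tier.

   ```rust
   /// Hypothetical strategy tiers mirroring the three size ranges listed above.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   enum OverflowStrategy {
       /// Small arrays: plain per-element checked arithmetic.
       ScalarChecked,
       /// Medium arrays: a vectorizable, branch-free kernel.
       Vectorized,
       /// Large arrays: choose a kernel after inspecting the data.
       DataDependent,
   }

   /// Pick a tier from the array length alone.
   /// Thresholds are assumptions taken from the prose, not from the PR's code.
   fn pick_strategy(len: usize) -> OverflowStrategy {
       match len {
           0..=500 => OverflowStrategy::ScalarChecked,
           // 501..=999 is not covered by the list; grouped here by assumption.
           501..=2_000 => OverflowStrategy::Vectorized,
           _ => OverflowStrategy::DataDependent,
       }
   }
   ```

   Keeping the thresholds in one function makes them easy to retune against the benchmark results.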
   
   ### The Optimization Deep Dive
   
   - **CPU-level assembly**: Using the hardware overflow flags on x86/ARM instead of software checks
   - **SIMD when possible**: Process multiple elements at once
   - **Bit manipulation tricks**: For signed addition, `((a ^ sum) & (b ^ sum)) < 0` (where `sum` is the wrapped result) detects overflow without a branch per element
   - **Compile-time analysis**: If we can prove an operation is safe, the check is skipped and there's zero overhead
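
   To make the bit trick concrete, here is a minimal sketch for `i64` addition. It is not the PR's implementation, and the function names `add_overflows_bitwise` and `checked_add_slices` are hypothetical. Overflow can only happen when both operands have the same sign and the wrapped sum has the opposite sign, which is exactly when the sign bit of `(a ^ sum) & (b ^ sum)` is set. The slice version ORs those bits together and tests them once at the end, so the loop body has no branch and is a reasonable candidate for auto-vectorization (whether the compiler actually vectorizes it is not guaranteed).

   ```rust
   /// Add two i64 values, reporting overflow via the sign-bit trick.
   /// Returns the wrapped sum and a flag that is true if overflow occurred.
   fn add_overflows_bitwise(a: i64, b: i64) -> (i64, bool) {
       let sum = a.wrapping_add(b);
       // Sign bit is set only if `a` and `b` both differ in sign from `sum`.
       (sum, ((a ^ sum) & (b ^ sum)) < 0)
   }

   /// Element-wise addition of two equal-length slices.
   /// Returns `None` if any element overflowed, otherwise the sums.
   fn checked_add_slices(a: &[i64], b: &[i64]) -> Option<Vec<i64>> {
       assert_eq!(a.len(), b.len(), "inputs must have the same length");
       let mut out = Vec::with_capacity(a.len());
       let mut overflow_bits: i64 = 0;
       for (&x, &y) in a.iter().zip(b) {
           let sum = x.wrapping_add(y);
           // Accumulate overflow indicators instead of branching per element.
           overflow_bits |= (x ^ sum) & (y ^ sum);
           out.push(sum);
       }
       if overflow_bits < 0 {
           None
       } else {
           Some(out)
       }
   }
   ```

   The scalar function should agree with the standard library's `i64::overflowing_add`, which is a handy way to cross-check it. The trade-off in the slice version is that it always processes the whole input before reporting an error, instead of stopping at the first overflow.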
   
   ### Bottom Line
   
   I think this is the right default. DataFusion should behave like other SQL 
engines by default, and give you the escape hatch when you need it. The 
performance hit is real but not unreasonable for the correctness we get.
   
   That said, if you think this is too much overhead for the default, I'm 
totally open to discussion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

