[GitHub] [arrow-rs] sunchao commented on issue #3620: Implement short-circuiting for filter evaluation

via GitHub Fri, 27 Jan 2023 11:38:29 -0800


sunchao commented on issue #3620:
URL: https://github.com/apache/arrow-rs/issues/3620#issuecomment-1406991366


   Thanks @tustvold!
   
   Yes, I think it's a good idea to start with a PoC in DataFusion only. I'll 
try to see if we can get some good numbers with the approach using some 
synthetic benchmarks :)
   
   One question: how do you detect whether a given code change would break 
SIMD? Is there a convenient way of doing that?
   
   I'll take a look at the lazy materialization on the Parquet side and see 
how it can interact with this feature.
   
   > I think it would be really cool to support this, but my experience 
fighting LLVM over null masks, the speed of the filter kernels, and the reality 
that a lot of queries end up bottlenecked on sorting or decoding, makes me 
think there may be mileage in the naive approach. I'm not expert on query 
engines though, so happy to defer to others 😄
   
   Agreed. My feeling is also that many queries are actually bottlenecked 
elsewhere, such as on joins or aggregations. This just caught my attention 
while I was looking at DataFusion and `arrow-rs`.


