sunchao commented on issue #3620:
URL: https://github.com/apache/arrow-rs/issues/3620#issuecomment-1406991366
Thanks @tustvold! Yes, I think it's a good idea to start with a PoC in DataFusion only. I'll try to see whether we can get some good numbers with this approach using some synthetic benchmarks :)

One question: how do you detect whether a given code change would break SIMD? Is there a convenient way of doing that?

I'll take a look at the lazy materialization on the Parquet side and see how it could interact with this feature.

> I think it would be really cool to support this, but my experience fighting LLVM over null masks, the speed of the filter kernels, and the reality that a lot of queries end up bottlenecked on sorting or decoding, makes me think there may be mileage in the naive approach. I'm not expert on query engines though, so happy to defer to others 😄

Agreed. My feeling is also that many queries are actually bottlenecked elsewhere, such as in joins or aggregations. This just caught my attention while I was looking at DataFusion and `arrow-rs`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
