tustvold commented on PR #2677: URL: https://github.com/apache/arrow-datafusion/pull/2677#issuecomment-1169847007
Not entirely sure. I can't reproduce it on my local machine, only on a remote server where I can't easily run perf because of [debian](https://michcioperz.com/post/slow-perf-script/) shenanigans. My immediate guess is that it has something to do with string_required being a very large column chunk, so we are seeing the downside of separating IO and compute, namely the added latency whilst it reads the data into memory, but I'm not too certain. Moving the source parquet file to tmpfs doesn't appear to help. Additionally, string_optional is half the size and yet does not exhibit the same behaviour at all. More investigation needed :sweat_smile:
