tustvold commented on PR #2677:
URL: 
https://github.com/apache/arrow-datafusion/pull/2677#issuecomment-1169847007

   Not entirely sure. I can't reproduce it on my local machine, only on a 
remote server where I can't easily run perf because of 
[debian](https://michcioperz.com/post/slow-perf-script/) shenanigans.
   
   My immediate guess is that it has something to do with `string_required` 
being a very large column chunk, so we are seeing the downside of separating 
IO from compute, namely the added latency whilst the data is read into memory, 
but I'm not too certain. Moving the source Parquet file to tmpfs doesn't appear 
to help. Additionally, `string_optional` is only half the size and yet does not 
exhibit the same behaviour. More investigation needed :sweat_smile: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

