Ted-Jiang commented on PR #2677:
URL: https://github.com/apache/arrow-datafusion/pull/2677#issuecomment-1170705539

   > Finally got profiles (by switching the VM to fedora), and it certainly
   > fits with my hypothesis above
   >
   > ![image](https://user-images.githubusercontent.com/1781103/176465394-b57873aa-6e4c-410b-b202-de43e0b13bff.png)
   >
   > On the left we have master, and right this branch. The CPU activity under
   > `parquet_query_s` demarcates each benchmark iteration, within this you have
   > two row groups being read. We can clearly see that with this PR there is a
   > noticeable delay as it fetches the bytes into memory before starting
   > decoding the data, whereas master interleaves the IO and decoding. There is
   > a trade-off here, the approach of master is faster for this particular
   > benchmark, but comes at the cost of stalling out worker threads on IO that
   > could have been doing other work during decode.
   >
   > There are some ways we could potentially improve this, e.g. interleaving
   > IO at the page instead of column chunk, but this is unlikely to help with
   > object storage and may actually perform worse. I'm not sure if this is
   > something worth optimising, but would appreciate other people's thoughts
   
   @tustvold Thanks a lot for sharing 👍.
   I am not clear about `whereas master interleaves the IO and decoding`. I
   think master uses blocking IO, so decoding must wait for the IO to finish.
   This patch uses async functions to interleave the work and reduce the time
   spent blocked on IO.
   If I am missing something, please tell me 😂
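
   To make the two schedules being compared concrete, here is a toy sketch of one reading of the quoted description. It is not DataFusion or parquet code: `CHUNKS`, `decode`, `read_blocking`, `fetch_async`, `scan_interleaved` and `scan_prefetch` are hypothetical names, and the "IO" is simulated in memory. It only records the order in which IO and decode steps happen under each approach.

```python
import asyncio

# Hypothetical column chunks standing in for bytes stored in a parquet file.
CHUNKS = {"a": b"\x01\x02", "b": b"\x03\x04"}


def decode(raw):
    """Stand-in for CPU-bound decoding of one column chunk."""
    return list(raw)


def read_blocking(name, events):
    """Simulated blocking read: the calling thread can do nothing else meanwhile."""
    events.append(f"io:{name}")
    return CHUNKS[name]


def scan_interleaved(names):
    """Alternate a blocking read and a decode for each chunk (assumed 'master' shape)."""
    events, out = [], {}
    for name in names:
        raw = read_blocking(name, events)
        events.append(f"decode:{name}")
        out[name] = decode(raw)
    return out, events


async def fetch_async(name, events):
    """Simulated async read: awaiting hands the thread back to the event loop."""
    events.append(f"io:{name}")
    await asyncio.sleep(0)  # yields once, standing in for waiting on disk or network
    return CHUNKS[name]


async def scan_prefetch(names):
    """Fetch every chunk first, then decode them all (assumed shape of this PR)."""
    events = []
    raws = await asyncio.gather(*(fetch_async(n, events) for n in names))
    out = {}
    for name, raw in zip(names, raws):
        events.append(f"decode:{name}")
        out[name] = decode(raw)
    return out, events


if __name__ == "__main__":
    print(scan_interleaved(["a", "b"])[1])
    print(asyncio.run(scan_prefetch(["a", "b"]))[1])
```

   In this model both schedules decode the same data. The first one starts decoding sooner but holds its thread during every read, while the second delays decoding until all bytes have arrived and leaves the thread free for other tasks while it waits.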


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.