tustvold commented on PR #2677:
URL: 
https://github.com/apache/arrow-datafusion/pull/2677#issuecomment-1170267958

   So here is where we stand with regards to this PR:
   
   ## Pros
   
   * Fewer range requests will be made to object storage, reducing latency and 
monetary costs
   * Threads will not be blocked on network IO
   * Does not make use of `futures::executor::block_on` or `tokio::task::spawn_blocking`
   * Will integrate well with future work to reduce bytes fetched from object 
storage  - https://github.com/apache/arrow-rs/issues/1705
   * Fits with the longer-term vision of morsel-driven IO within DataFusion - 
#2504 
   
   ## Cons
   
   * Slightly higher memory usage for some queries, as it buffers encoded column 
chunks instead of reading pages on demand
   * Queries to **local** files with column chunks containing large numbers of 
pages may be slower
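
To make the trade-off between fewer range requests and higher memory usage concrete, here is a minimal sketch. It is not code from this PR: `ColumnChunk`, `page_ranges`, and `chunk_range` are hypothetical names invented for illustration, and the layout assumes pages are stored back to back within a chunk.

```rust
use std::ops::Range;

/// Hypothetical description of one encoded column chunk in a Parquet file.
/// The field names are illustrative, not the parquet crate's metadata types.
struct ColumnChunk {
    /// Byte offset of the chunk's first page within the file.
    offset: u64,
    /// Encoded size in bytes of each page, in file order (assumed contiguous).
    page_sizes: Vec<u64>,
}

/// Page-on-demand strategy: one byte range (one request) per page.
fn page_ranges(chunk: &ColumnChunk) -> Vec<Range<u64>> {
    let mut start = chunk.offset;
    chunk
        .page_sizes
        .iter()
        .map(|size| {
            let range = start..start + size;
            start = range.end;
            range
        })
        .collect()
}

/// Buffered strategy: a single byte range covering the whole column chunk.
fn chunk_range(chunk: &ColumnChunk) -> Range<u64> {
    let total: u64 = chunk.page_sizes.iter().sum();
    chunk.offset..chunk.offset + total
}
```

For a chunk with many pages, `page_ranges` yields one request per page while `chunk_range` always yields one, at the cost of holding the whole encoded chunk in memory rather than a single page at a time.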
   
   ## Conclusion
   
   I therefore think that, on balance, this PR represents a step forward, with the 
only regression mitigated by using smaller row groups.

