tustvold commented on PR #2677: URL: https://github.com/apache/arrow-datafusion/pull/2677#issuecomment-1170267958
So here is where we stand with regard to this PR:

## Pros

* Fewer range requests will be made to object storage, reducing latency and monetary costs
* Threads will not be blocked on network IO
* Does not make use of `futures::block_on` or `tokio::spawn_blocking`
* Will integrate well with future work to reduce bytes fetched from object storage - https://github.com/apache/arrow-rs/issues/1705
* Fits with the longer-term vision of morsel-driven IO within DataFusion - #2504

## Cons

* Slightly higher memory usage for some queries, as it buffers encoded column chunks instead of reading pages on demand
* Queries to **local** files with column chunks containing large numbers of pages may be slower

## Conclusion

I therefore think that, on balance, this PR represents a step forward, with the only regression mitigated by using smaller row groups.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org