tustvold commented on PR #2677:
URL: 
https://github.com/apache/arrow-datafusion/pull/2677#issuecomment-1150132924

   I think this is now ready for review. I've created 
https://github.com/apache/arrow-datafusion/pull/2711, which uses currently 
unreleased functionality in arrow-rs to do byte range fetches to object storage.
   
   This PR does represent a 10-20% performance regression in the parquet SQL 
benchmarks when operating on local files. This largely results from moving away 
from spawn_blocking, and from the corresponding scheduler implications 
documented in https://github.com/apache/arrow-rs/issues/1473.
   
   However, I am inclined to think this is fine for a couple of reasons:
   
   * The new scheduler, work on which is currently blocked by this PR, was 
created specifically to address this scheduling disparity
   * The difference becomes inconsequential for any non-trivial queries
   * The ongoing work by @Ted-Jiang will help to reduce the IO costs of parquet
   * I think this lays a solid foundation on which we can iterate


