[GitHub] [arrow-datafusion] tustvold commented on pull request #2677: Switch to object_store crate (#2489)

GitBox Wed, 08 Jun 2022 09:25:17 -0700


tustvold commented on PR #2677:
URL: https://github.com/apache/arrow-datafusion/pull/2677#issuecomment-1150132924


   I think this is now ready for review. I've created 
https://github.com/apache/arrow-datafusion/pull/2711, which uses currently 
unreleased functionality in arrow-rs to do byte-range fetches to object storage.
   
   This PR does represent a 10-20% performance regression in the parquet SQL 
benchmarks when operating on local files. This largely results from moving away 
from spawn_blocking and from the corresponding scheduler implications documented in 
https://github.com/apache/arrow-rs/issues/1473. 
   
   However, I am inclined to think this is fine for a couple of reasons:
   
   * The work on the new scheduler, which is currently blocked by this PR, was 
started specifically to address this scheduling disparity
   * The difference becomes inconsequential for any non-trivial queries
   * The ongoing work by @Ted-Jiang will help to reduce the IO costs of parquet
   * I think this lays a solid foundation on which we can iterate


