tustvold commented on issue #2230:
URL: https://github.com/apache/arrow-rs/issues/2230#issuecomment-1200151568
Ok, so if I understand the issue correctly:

* You have a catalog service that identifies the files to scan for a query, along with some metadata
* These files are stored in parquet somewhere and can be fetched to memory

This is very similar to the problem IOx has, which it historically solved by not using DataFusion's parquet support and instead using the `parquet` crate directly. In particular it would:

* Fetch files to `Bytes` in memory or to a file on disk
* Use the `parquet` crate directly to scan these "files" from a custom `ExecutionPlan` (see the sketch at the end of this comment)

A while back I created some tickets related to making DataFusion's abstractions more flexible, but I've not yet had time to finish that work:

* https://github.com/apache/arrow-datafusion/issues/2291
* https://github.com/apache/arrow-datafusion/issues/2293

To me the issue here is that DataFusion's `ParquetExec` is very tightly coupled to both where the data is located and how it is fetched. There are two solutions to this in my mind:

* Continue the work to make DataFusion's abstractions more usable
* Accelerate plans to lift the object_store logic out of DataFusion and into `parquet`, so that it can be reused by custom `ExecutionPlan`s

What do you think? FYI @crepererum @alamb

> What do you think about modifying ObjectStore get operations to replace Path with ObjectMeta and adding custom_attributes: Option<Box<[u8]>> to ObjectMeta?

I think this is at odds with the crate's objective of providing a consistent abstraction across object stores, and so I would be extremely reluctant to change the API in this way.
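For concreteness, here is a minimal sketch of the "fetch to `Bytes`, then scan with the `parquet` crate" approach described above. This is illustrative only, not IOx's actual code: the function name is made up, and it assumes the `bytes` crate and the `parquet` crate with its `arrow` feature enabled.

```rust
use bytes::Bytes;
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

/// Scan a parquet "file" that has already been fetched into memory,
/// e.g. by a catalog-aware service (hypothetical helper, for illustration).
fn scan_in_memory(data: Bytes) -> Result<(), Box<dyn std::error::Error>> {
    // `Bytes` implements `ChunkReader`, so the in-memory buffer can be
    // decoded exactly as if it were a file on disk
    let reader = ParquetRecordBatchReaderBuilder::try_new(data)?
        .with_batch_size(1024)
        .build()?;

    // Iterate the decoded Arrow RecordBatches; a custom ExecutionPlan
    // would yield these as its output stream instead of printing
    for batch in reader {
        let batch = batch?;
        println!("read {} rows", batch.num_rows());
    }
    Ok(())
}
```

How the bytes get fetched in the first place is then entirely up to the caller, which is what makes lifting the object_store logic out of DataFusion attractive.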
