wjones127 opened a new issue, #2489:
URL: https://github.com/apache/arrow-datafusion/issues/2489

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
In another issue, @alamb and @tustvold [suggested](https://github.com/apache/arrow-datafusion/issues/2445#issuecomment-1119804996) we might want to use the [IOx ObjectStore implementation](https://github.com/influxdata/influxdb_iox/blob/main/object_store).
   
A few nice points about the IOx implementation:

* It has some nice path utilities, including [a CloudPath struct](https://github.com/influxdata/influxdb_iox/blob/main/object_store/src/path/cloud.rs). That seems nicer than the current API, which uses `&str` paths.
* The repo includes implementations for S3, GCS, and Azure Blob Storage. There is no HDFS support yet.
* It implements `put()` for writing, but there doesn't seem to be streaming write support (multi-part upload).
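
To illustrate why a structured path type can be nicer than `&str`, here is a minimal Rust sketch of a segment-based path. The `ObjectPath` name and its methods are hypothetical and are not the IOx `CloudPath` API; the sketch only shows that parsing into segments once normalizes stray slashes and lets invalid segments be rejected up front.

```rust
/// Hypothetical segment-based object path (NOT the IOx `CloudPath` API).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ObjectPath {
    // One entry per `/`-delimited segment; entries are never empty.
    segments: Vec<String>,
}

impl ObjectPath {
    /// Parse a `/`-delimited string, dropping empty segments so that
    /// "data//2022/" and "data/2022" normalize to the same path.
    pub fn parse(s: &str) -> Self {
        let segments = s
            .split('/')
            .filter(|seg| !seg.is_empty())
            .map(str::to_owned)
            .collect();
        ObjectPath { segments }
    }

    /// Append a single segment. Returns `None` if the segment is empty or
    /// contains a delimiter, so callers cannot sneak in extra levels.
    pub fn child(&self, segment: &str) -> Option<Self> {
        if segment.is_empty() || segment.contains('/') {
            return None;
        }
        let mut segments = self.segments.clone();
        segments.push(segment.to_owned());
        Some(ObjectPath { segments })
    }

    /// Render back to the `/`-delimited form that cloud stores expect.
    pub fn to_raw(&self) -> String {
        self.segments.join("/")
    }
}

fn main() {
    // Stray slashes normalize away at parse time.
    let a = ObjectPath::parse("data//2022/");
    let b = ObjectPath::parse("data/2022");
    assert_eq!(a, b);

    let file = b.child("part-0.parquet").expect("valid segment");
    println!("{}", file.to_raw()); // prints "data/2022/part-0.parquet"

    // A segment containing '/' is rejected instead of silently nesting.
    assert!(b.child("x/y").is_none());
}
```

With raw `&str` paths, each call site has to remember these normalization and validation rules itself; a dedicated type enforces them in one place.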
   
There are a few differences between the two APIs:

Current API: https://github.com/apache/arrow-datafusion/blob/dfdeb42d7d646cffcf3cff26beefcecffc6cbe62/data-access/src/object_store/mod.rs#L77

IOx API: https://github.com/influxdata/influxdb_iox/blob/94e9ac610acfb94870154d976f66a4d4111b5668/object_store/src/lib.rs#L74
   
* The IOx `list()` implementation matches prefixes by path segment rather than by raw string: "Prefixes are evaluated on a path segment basis, i.e. `foo/bar/` is a prefix of `foo/bar/x` but not of `foo/bar_baz/x`."
* IOx doesn't have a synchronous read implementation.
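
The quoted prefix rule can be sketched in a few lines of Rust. This is a hypothetical illustration of the semantics, not the IOx code. Treating empty segments (leading, trailing, or doubled `/`) as ignorable is an assumption, and the naive string prefix used for contrast is not necessarily how the current DataFusion `list()` behaves.

```rust
/// Split a `/`-delimited path into its non-empty segments.
fn segments(path: &str) -> Vec<&str> {
    path.split('/').filter(|s| !s.is_empty()).collect()
}

/// Naive raw-string prefix, shown only for contrast (an assumed
/// `&str`-style interpretation, not a claim about DataFusion's code).
fn is_string_prefix(prefix: &str, path: &str) -> bool {
    path.starts_with(prefix)
}

/// Segment-based prefix: every segment of `prefix` must equal the
/// corresponding leading segment of `path`.
fn is_segment_prefix(prefix: &str, path: &str) -> bool {
    let p = segments(prefix);
    let full = segments(path);
    p.len() <= full.len() && p.iter().zip(full.iter()).all(|(a, b)| a == b)
}

fn main() {
    // The two examples from the quoted IOx doc comment.
    assert!(is_segment_prefix("foo/bar/", "foo/bar/x"));
    assert!(!is_segment_prefix("foo/bar/", "foo/bar_baz/x"));

    // Without a trailing slash, a raw string prefix also matches the
    // sibling "bar_baz" directory; the segment rule does not.
    assert!(is_string_prefix("foo/bar", "foo/bar_baz/x"));
    assert!(!is_segment_prefix("foo/bar", "foo/bar_baz/x"));
}
```

Matching whole segments is what keeps a listing of `foo/bar` from pulling in objects under `foo/bar_baz/`.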
   
This change would, of course, also affect other repos:
   
   * https://github.com/datafusion-contrib/datafusion-objectstore-s3
   * https://github.com/datafusion-contrib/datafusion-objectstore-hdfs
   * https://github.com/datafusion-contrib/datafusion-objectstore-azure
   
   
From what I've seen, we could reasonably switch to using the IOx ObjectStore directly. But if there's a good reason to keep the existing API, we could instead reuse useful parts of the IOx implementation behind it.
   
   cc @matthewmturner @kyotoYaho @roeap 

