chitralverma opened a new issue, #176:
URL: https://github.com/apache/arrow-rs-object-store/issues/176

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   Currently, projects that use `object_store` (datafusion, delta-rs, pola-rs, etc.) have to create a `dyn ObjectStore` manually: parse the provided URL, check its scheme, and pass in the backend-specific options.
   
   It would be great to have this capability directly provided by the crate.
   
   **Describe the solution you'd like**
   My proposal is to standardize this implementation and bring it into this crate, exposed through a simple function like the one below. This would make things significantly simpler for developers using the crate.
   
   ```rust
   // Adapted from the implementation in delta-rs:
   // https://github.com/delta-io/delta-rs/blob/c8371b38fdf22802f0f91b4ddc2a47da6be97c68/rust/src/storage/mod.rs#L85-L100

   use std::collections::HashMap;
   use std::sync::Arc;

   use object_store::{path::Path, ObjectStore, Result};
   use serde::{Deserialize, Serialize};
   use url::Url;

   /// Backend-specific configuration (credentials, region, ...).
   #[derive(Clone, Debug, Serialize, Deserialize)]
   pub struct StorageOptions(pub HashMap<String, String>);

   #[derive(Debug, Clone)]
   pub struct AObjectStore {
       storage: Arc<dyn ObjectStore>,
       location: Url,
       options: StorageOptions,
   }

   /// Try creating a new instance of [`AObjectStore`]
   pub fn get_object_store(
       location: Url,
       options: impl Into<StorageOptions> + Clone,
   ) -> Result<AObjectStore> {
       // path component of the URL, used as the prefix inside the store
       let prefix = Path::from(location.path());

       // parse URL to a kind (s3/ aws/ ... )
       let kind = ObjectStoreKind::parse_url(&location)?;

       // instantiate the backend-specific object store
       let storage = kind.into_impl( .... );

       // this is a free function, so construct the struct by name (not `Self`)
       Ok(AObjectStore {
           storage,
           location,
           options: options.into(),
       })
   }
   ```
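   To make `ObjectStoreKind::parse_url` a bit more concrete, here is a minimal sketch of the scheme-to-backend mapping it could perform. It is illustrative only: the variant names and the accepted schemes are assumptions, and it takes a `&str` instead of a `url::Url` so that it needs nothing beyond the standard library.

   ```rust
   /// Hypothetical set of backends; variant names are illustrative only.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   pub enum ObjectStoreKind {
       Local,
       InMemory,
       AmazonS3,
       GoogleCloudStorage,
       MicrosoftAzure,
       Http,
   }

   impl ObjectStoreKind {
       /// Map a URL's scheme to a backend kind (assumed scheme list).
       /// Takes `&str` rather than `url::Url` to stay std-only.
       pub fn parse_url(url: &str) -> Result<Self, String> {
           // the scheme is everything before the first "://"
           let (scheme, _rest) = url
               .split_once("://")
               .ok_or_else(|| format!("missing scheme in URL: {url}"))?;
           // schemes are case-insensitive, so normalize before matching
           match scheme.to_ascii_lowercase().as_str() {
               "file" => Ok(Self::Local),
               "memory" => Ok(Self::InMemory),
               "s3" | "s3a" => Ok(Self::AmazonS3),
               "gs" => Ok(Self::GoogleCloudStorage),
               "az" | "adl" | "azure" | "abfs" | "abfss" => Ok(Self::MicrosoftAzure),
               "http" | "https" => Ok(Self::Http),
               other => Err(format!("unsupported scheme: {other}")),
           }
       }
   }
   ```

   Keeping this mapping in one place is what would let every downstream project drop its own copy of the `match` on the scheme.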
   
   Any new storage backends that come up in the future can be added to `ObjectStoreKind` along with a small implementation in `into_impl`. Users of the crate would then only have to bump the crate version.
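   The extension point could look roughly like the sketch below. `Store` is a hypothetical stand-in for `dyn ObjectStore`, the enum is trimmed to two variants, and the `region` option key is made up for illustration. The point is only that supporting a new backend means one new variant and one new match arm, while callers stay unchanged.

   ```rust
   use std::collections::HashMap;

   /// Hypothetical stand-in for `dyn ObjectStore`, for illustration only.
   pub trait Store: std::fmt::Debug {
       fn name(&self) -> &'static str;
       /// Backend-specific setting; `None` for backends without one.
       fn region(&self) -> Option<&str> {
           None
       }
   }

   #[derive(Debug)]
   struct InMemoryStore;

   #[derive(Debug)]
   struct S3Store {
       region: Option<String>,
   }

   impl Store for InMemoryStore {
       fn name(&self) -> &'static str {
           "memory"
       }
   }

   impl Store for S3Store {
       fn name(&self) -> &'static str {
           "s3"
       }
       fn region(&self) -> Option<&str> {
           self.region.as_deref()
       }
   }

   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   pub enum ObjectStoreKind {
       InMemory,
       AmazonS3,
   }

   impl ObjectStoreKind {
       /// Build the concrete backend from the parsed kind and the user options.
       /// Adding a backend = add a variant above and one arm here.
       pub fn into_impl(self, options: &HashMap<String, String>) -> Box<dyn Store> {
           match self {
               Self::InMemory => Box::new(InMemoryStore),
               Self::AmazonS3 => Box::new(S3Store {
                   // hypothetical option key, copied out of the user-supplied map
                   region: options.get("region").cloned(),
               }),
           }
       }
   }
   ```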
   
   **Describe alternatives you've considered**
   Without this, each library using `object_store` has to implement its own URL parsing.
   
   
   Examples:
   - See the datafusion object store registry [here](https://github.com/apache/arrow-datafusion/blob/52fa2285b43ad6712e9b8bf6c05b4b8ff93f44f9/datafusion/execution/src/object_store.rs#L186-L217).
   - See the delta-rs implementation [here](https://github.com/delta-io/delta-rs/blob/c8371b38fdf22802f0f91b4ddc2a47da6be97c68/rust/src/storage/config.rs#L138-L196).
   
   Also, without this, each time this crate adds a new backend, its users have to bump the version and implement support for that backend themselves.
   
   **Additional context**
   This idea is already implemented by:
   
   - PyArrow FileSystem API [`fs.FileSystem.from_uri(uri)`](https://arrow.apache.org/docs/python/generated/pyarrow.fs.FileSystem.html#pyarrow.fs.FileSystem.from_uri)
   - Hadoop FileSystem API [`org.apache.hadoop.fs.FileSystem.get(uri, conf)`](https://hadoop.apache.org/docs/r3.0.0/api/org/apache/hadoop/fs/FileSystem.html#get-java.net.URI-org.apache.hadoop.conf.Configuration-)
   - Fsspec [`fsspec.filesystem(protocol, **storage_options)`](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.filesystem)

