alamb commented on a change in pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#discussion_r803127610



##########
File path: datafusion/src/datasource/object_store/mod.rs
##########
@@ -219,25 +220,42 @@ impl ObjectStoreRegistry {
 
     /// Get a suitable store for the URI based on it's scheme. For example:
     /// - URI with scheme `file://` or no schema will return the default LocalFS store
-    /// - URI with scheme `s3://` will return the S3 store if it's registered
-    /// Returns a tuple with the store and the path of the file in that store
-    /// (URI=scheme://path).
+    /// - URI with scheme `s3://host:port` will return the S3 store if it's registered
+    /// Returns a tuple with the store and the self-described uri of the file in that store
     pub fn get_by_uri<'a>(
         &self,
         uri: &'a str,
     ) -> Result<(Arc<dyn ObjectStore>, &'a str)> {
-        if let Some((scheme, path)) = uri.split_once("://") {
-            let stores = self.object_stores.read();
-            let store = stores
-                .get(&*scheme.to_lowercase())
-                .map(Clone::clone)
-                .ok_or_else(|| {
-                    DataFusionError::Internal(format!(
-                        "No suitable object store found for {}",
-                        scheme
-                    ))
-                })?;
-            Ok((store, path))
+        // We do not support the remote object store on Windows OS

Review comment:
       I think the distinction I am trying to draw is that the current Object 
Store API is mapped by scheme, and it would be up to the object store 
implementation to figure out how to handle host/port information.
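
A minimal sketch of what "mapped by scheme" means, assuming a simplified registry. `Store`, `NamedStore`, and `SchemeRegistry` are hypothetical stand-ins for illustration, not the DataFusion types; the lookup key is only the lowercased scheme, and the full URI (host and port included) is handed back for the store to interpret:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for an object store trait (not the DataFusion trait).
pub trait Store {
    fn name(&self) -> &str;
}

/// Hypothetical store that only carries a label.
pub struct NamedStore(pub &'static str);

impl Store for NamedStore {
    fn name(&self) -> &str {
        self.0
    }
}

/// Hypothetical registry keyed by scheme only; host and port play no part in lookup.
#[derive(Default)]
pub struct SchemeRegistry {
    stores: HashMap<String, Arc<dyn Store>>,
}

impl SchemeRegistry {
    pub fn register(&mut self, scheme: &str, store: Arc<dyn Store>) {
        self.stores.insert(scheme.to_lowercase(), store);
    }

    /// Look up the store by scheme and return it together with the unchanged URI.
    pub fn get_by_uri<'a>(&self, uri: &'a str) -> Result<(Arc<dyn Store>, &'a str), String> {
        let (scheme, _rest) = uri
            .split_once("://")
            .ok_or_else(|| format!("URI has no scheme: {}", uri))?;
        let store = self
            .stores
            .get(&scheme.to_lowercase())
            .cloned()
            .ok_or_else(|| format!("No suitable object store found for {}", scheme))?;
        Ok((store, uri))
    }
}
```

With this shape, `hdfs://server1:8000/a` and `hdfs://server2:8290/b` both resolve to the one store registered under `hdfs`.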
   
   So rather than having one `HDFSObjectStore` instance for `server1:8000` and 
a second `HDFSObjectStore` instance for `server2:8290`, there would be a single 
`HDFSObjectStore` that would need to know how to dispatch appropriately to the 
different server hosts / ports.
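
One way the single-store dispatch could look, as a rough sketch only. `HdfsLikeStore`, `Connection`, `split_authority`, and `connection_for` are hypothetical names invented for this illustration; a real implementation would hold actual client handles rather than a string:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a per-server client handle.
#[derive(Debug)]
pub struct Connection {
    pub authority: String,
}

/// Hypothetical single store, registered once for the whole scheme.
#[derive(Default)]
pub struct HdfsLikeStore {
    // Connections are created lazily and cached by "host:port".
    connections: Mutex<HashMap<String, Arc<Connection>>>,
}

/// Split `scheme://host:port/path` into ("host:port", "/path").
pub fn split_authority(uri: &str) -> Option<(&str, &str)> {
    let (_scheme, rest) = uri.split_once("://")?;
    match rest.find('/') {
        Some(idx) => Some((&rest[..idx], &rest[idx..])),
        None => Some((rest, "/")),
    }
}

impl HdfsLikeStore {
    /// Pick (or create) the connection for the URI's host:port and return it with the path.
    pub fn connection_for(&self, uri: &str) -> Option<(Arc<Connection>, String)> {
        let (authority, path) = split_authority(uri)?;
        let mut conns = self.connections.lock().unwrap();
        let conn = conns
            .entry(authority.to_string())
            .or_insert_with(|| {
                Arc::new(Connection {
                    authority: authority.to_string(),
                })
            })
            .clone();
        Some((conn, path.to_string()))
    }

    /// Number of distinct host:port pairs seen so far.
    pub fn connection_count(&self) -> usize {
        self.connections.lock().unwrap().len()
    }
}
```

The registry still sees one store per scheme; the host/port fan-out stays an internal detail of that store.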
   
   The same basic pattern holds for file systems (for example, there is a 
single `LocalFileSystem` instance even though the local file system might have 
different disks mounted to `/data` and `/data2`).
   
   I think it would also hold for S3 and other types of object stores (where, 
depending on the region, you need to send requests to a different endpoint).
   



