alamb commented on code in PR #9150:
URL: https://github.com/apache/arrow-datafusion/pull/9150#discussion_r1482986894


##########
datafusion-cli/src/catalog.rs:
##########
@@ -151,10 +154,35 @@ impl SchemaProvider for DynamicFileSchemaProvider {
         // if the inner schema provider didn't have a table by
         // that name, try to treat it as a listing table
         let state = self.state.upgrade()?.read().clone();
-        let config = 
ListingTableConfig::new(ListingTableUrl::parse(name).ok()?)
+        let table_url = ListingTableUrl::parse(name).ok()?;
+
+        // Assure the `http` store for this url is registered if this
+        // is an `http(s)` listing
+        // TODO: support for other types, e.g. `s3`, may need to be added
+        match table_url.scheme() {
+            "http" | "https" => {
+                let url: &Url = table_url.as_ref();
+                match state.runtime_env().object_store_registry.get_store(url) 
{
+                    Ok(_) => {}
+                    Err(_) => {

Review Comment:
   I agree this is confusing but follows the existing pattern in 
datafusion-cli, which is basically to defer registering the object store 
instance until something actually tries to access it (so, for example, we don't 
get S3 config errors b/c the config isn't setup if the user isn't actually 
querying S3)
   
   As written this code will support http from select, which is great, but it 
doesn't support other url types (like `s3://...` etc. 
   
   Instead of creating an HttpBuilder directly, would it be possible to call
   
   
https://github.com/apache/arrow-datafusion/blob/10ae9343368a893012aa80b66c02d45b4f461f9f/datafusion-cli/src/exec.rs#L305-L339
   
   So all the types of object stores are covered? 
   
   It would be great to add additional comments explaining the deferred 
registration logic, but i would be happy to do that as a follow on PR as well
   



##########
docs/source/user-guide/cli.md:
##########
@@ -194,6 +194,19 @@ DataFusion CLI v16.0.0
 2 rows in set. Query took 0.007 seconds.
 ```
 
+You can also query directly from the remote location via HTTP(S) without

Review Comment:
   🎉 
   
   I left a suggestion where hopefully this could be generalized to "you can 
also query directly from any remote location ..."



##########
datafusion-cli/src/catalog.rs:
##########
@@ -166,3 +194,50 @@ impl SchemaProvider for DynamicFileSchemaProvider {
         self.inner.table_exist(name)
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use datafusion::prelude::SessionContext;
+
+    #[tokio::test]
+    async fn query_http_location_test() -> Result<()> {

Review Comment:
   I agree that this is sufficient coverage --i don't think it would be good to 
require internet access to run the tests successfully either.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to