alamb commented on code in PR #9150:
URL: https://github.com/apache/arrow-datafusion/pull/9150#discussion_r1482986894
##########
datafusion-cli/src/catalog.rs:
##########
@@ -151,10 +154,35 @@ impl SchemaProvider for DynamicFileSchemaProvider {
// if the inner schema provider didn't have a table by
// that name, try to treat it as a listing table
let state = self.state.upgrade()?.read().clone();
- let config =
ListingTableConfig::new(ListingTableUrl::parse(name).ok()?)
+ let table_url = ListingTableUrl::parse(name).ok()?;
+
+ // Assure the `http` store for this url is registered if this
+ // is an `http(s)` listing
+ // TODO: support for other types, e.g. `s3`, may need to be added
+ match table_url.scheme() {
+ "http" | "https" => {
+ let url: &Url = table_url.as_ref();
+ match state.runtime_env().object_store_registry.get_store(url)
{
+ Ok(_) => {}
+ Err(_) => {
Review Comment:
I agree this is confusing but follows the existing pattern in
datafusion-cli, which is basically to defer registering the object store
instance until something actually tries to access it (so, for example, we don't
get S3 config errors b/c the config isn't setup if the user isn't actually
querying S3)
As written this code will support http from select, which is great, but it
doesn't support other url types (like `s3://...` etc.
Instead of creating an HttpBuilder directly, would it be possible to call
https://github.com/apache/arrow-datafusion/blob/10ae9343368a893012aa80b66c02d45b4f461f9f/datafusion-cli/src/exec.rs#L305-L339
So all the types of object stores are covered?
It would be great to add additional comments explaining the deferred
registration logic, but i would be happy to do that as a follow on PR as well
##########
docs/source/user-guide/cli.md:
##########
@@ -194,6 +194,19 @@ DataFusion CLI v16.0.0
2 rows in set. Query took 0.007 seconds.
```
+You can also query directly from the remote location via HTTP(S) without
Review Comment:
🎉
I left a suggestion where hopefully this could be generalized to "you can
also query directly from any remote location ..."
##########
datafusion-cli/src/catalog.rs:
##########
@@ -166,3 +194,50 @@ impl SchemaProvider for DynamicFileSchemaProvider {
self.inner.table_exist(name)
}
}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+ use datafusion::prelude::SessionContext;
+
+ #[tokio::test]
+ async fn query_http_location_test() -> Result<()> {
Review Comment:
I agree that this is sufficient coverage --i don't think it would be good to
require internet access to run the tests successfully either.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]