cfraz89 opened a new pull request, #4788:
URL: https://github.com/apache/arrow-datafusion/pull/4788
- Append trailing slash to table paths if they are directories
# Which issue does this PR close?
Similar to #4204 (inability to use an object store listing for a table
schema). However, this PR addresses tables generated by
`ListingSchemaProvider`.
# Rationale for this change
I would like to set up an object store (S3) where each directory maps to a
single table/schema, whose contents are all of the files (Parquet) inside
the directory. The schema provider is registered like this:
```
Arc::new(ListingSchemaProvider::new(
    // `format!` already returns a `String`, so no `String::from` is needed
    format!("s3://{bucket_name}"),
    "".into(),
    Arc::new(ListingTableFactory::default()),
    s3.clone(),
    String::from("PARQUET"),
    false,
));
```
Then, if there is a folder in the bucket, such as `userdata`, querying the
`userdata` table causes the S3 client to return a 404. The provider creates
ListingTables with paths set to the raw directory names, e.g. `userdata`,
and in `datafusion/core/src/datasource/listing/url.rs:149` we have:
```
// If the prefix is a file, use a head request, otherwise list
let is_dir = self.url.as_str().ends_with('/');
```
Since the paths don't end with `/`, the directories are treated as files, and
a `head` request is issued against them instead of a listing.
This PR remedies this scenario, allowing the query to succeed.
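
To make the failure mode concrete, here is a minimal, self-contained sketch of how a trailing-slash check like the one quoted above picks between the two request types. `RequestKind` and `request_kind` are hypothetical names used only for illustration; they are not DataFusion APIs.

```rust
/// Hypothetical stand-in for the two ways a table path can be resolved.
#[derive(Debug, PartialEq)]
enum RequestKind {
    /// Single-object lookup; fails if no object has exactly that key.
    Head,
    /// Prefix listing; returns the objects under the directory.
    List,
}

/// Mirrors the `ends_with('/')` check: only a trailing slash marks a directory.
fn request_kind(url: &str) -> RequestKind {
    if url.ends_with('/') {
        RequestKind::List
    } else {
        RequestKind::Head
    }
}
```

Under these assumptions, `request_kind("s3://bucket/userdata")` yields `Head`, which corresponds to the 404 case, while `request_kind("s3://bucket/userdata/")` yields `List`.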
# What changes are included in this PR?
ListingSchemaProvider is altered to track whether the table paths it has
listed are directories or files. If they are directories, it creates the
ListingTables with a '/' appended to the stringified table path, allowing the
ListingTable to successfully list its contents.
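
The idea can be sketched roughly as follows. This is an illustration rather than the code in this PR: `table_url` is a hypothetical helper, and the `is_dir` flag stands in for whatever the provider records while listing the store.

```rust
/// Hypothetical helper: build the path string handed to a ListingTable.
fn table_url(authority: &str, table_path: &str, is_dir: bool) -> String {
    let mut url = format!("{authority}/{table_path}");
    // Directories get a trailing '/' so they are listed rather than
    // fetched as a single object; avoid doubling an existing slash.
    if is_dir && !url.ends_with('/') {
        url.push('/');
    }
    url
}
```

With this sketch, a directory named `userdata` becomes `s3://bucket/userdata/`, while a plain file path is passed through unchanged.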
# Are these changes tested?
Some light unit tests added.
# Are there any user-facing changes?
No