m09526 opened a new issue, #15964: URL: https://github.com/apache/datafusion/issues/15964
### Describe the bug

When attempting to read a table of files spanning multiple S3 buckets in a single query, the query fails, stating that files can't be found:

```
Object Store error: Object at location test.parquet not found: Client error with status 404 Not Found: No Body
```

The same error appears when the DataFrame is explained.

### To Reproduce

1. With an AWS account, create two buckets and place a test Parquet file in each.
2. Create an `AmazonS3` `ObjectStore` instance for each bucket and register them via `SessionContext::register_object_store` with the S3 URIs.
3. Build a DataFusion query with `SessionContext::read_parquet()` that reads both files.

Attempting to execute the query, or to show the explained query, produces the error above.

### Expected behavior

The query should execute and produce the same result as if the files were in the same S3 bucket.

### Additional context

Amazon S3 `ObjectStore` instances are specific to the S3 bucket they read from; e.g. `s3://bucket1/blah.parquet` and `s3://bucket2/blah2.parquet` require two DIFFERENT `ObjectStore` instances.

The bug occurs in `ListingTable`, which assumes all files in a table can be accessed via a single `ObjectStore`. This can be seen in numerous places, such as the private `list_files_for_scan` function and the `scan` function: `ListingTable` obtains its `ObjectStore` by simply querying for the store associated with the first file in its list. The same assumption is present in `FileScanConfig`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
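The reproduction steps above can be sketched as follows. This is a minimal sketch, not code from the issue: it assumes the `datafusion`, `object_store`, `url`, and `tokio` crates, credentials available via the environment, and placeholder bucket/file names (`bucket1/test.parquet`, `bucket2/test2.parquet`):

```rust
use std::sync::Arc;

use datafusion::error::{DataFusionError, Result};
use datafusion::prelude::*;
use object_store::aws::AmazonS3Builder;
use url::Url;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();

    // An AmazonS3 ObjectStore is tied to exactly one bucket, so a
    // separate store must be built and registered for each bucket.
    for bucket in ["bucket1", "bucket2"] {
        let store = AmazonS3Builder::from_env()
            .with_bucket_name(bucket)
            .build()
            .map_err(DataFusionError::ObjectStore)?;
        let url = Url::parse(&format!("s3://{bucket}"))
            .expect("bucket name forms a valid URL");
        ctx.register_object_store(&url, Arc::new(store));
    }

    // Reading files from both buckets in one call is where the bug
    // surfaces: ListingTable looks up only the store for the first
    // file's URL, so reads against the second bucket return 404.
    let df = ctx
        .read_parquet(
            vec![
                "s3://bucket1/test.parquet".to_string(),
                "s3://bucket2/test2.parquet".to_string(),
            ],
            ParquetReadOptions::default(),
        )
        .await?;

    df.show().await?;
    Ok(())
}
```

Running this (or calling `df.explain(false, false)?.show().await?` instead of `df.show()`) reproduces the `Object Store error … 404 Not Found` described above, since only one of the two registered stores is actually consulted.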