m09526 opened a new issue, #15964:
URL: https://github.com/apache/datafusion/issues/15964

   ### Describe the bug
   
   When a single query reads a table whose files span multiple S3 buckets, the query fails with an error stating that a file cannot be found:
   `Object Store error: Object at location test.parquet not found: Client error with status 404 Not Found: No Body`
   
   The same error appears when the DataFrame's plan is explained.
   
   ### To Reproduce
   
   With an AWS account, create two buckets and place a test Parquet file in each.
   Create an `AmazonS3` ObjectStore instance for each bucket and register each one via `SessionContext::register_object_store` with the corresponding S3 URI.
   Create a DataFusion query with `SessionContext::read_parquet()` that reads from both files.
   Executing the query, or displaying its explained plan, produces the error.
   
   ### Expected behavior
   
   The query should execute and produce the same result as if the files were in the same S3 bucket.
   
   ### Additional context
   
   Amazon S3 ObjectStore instances are specific to the S3 bucket they read from. For example, `s3://bucket1/blah.parquet` and `s3://bucket2/blah2.parquet` require two DIFFERENT ObjectStore instances.
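
   To make the bucket-per-store point concrete, here is a minimal standard-library sketch. It is not DataFusion code: `store_key` is a hypothetical helper, and it assumes that a store is looked up by the scheme and bucket portion of the URI.

```rust
/// Hypothetical helper (not part of DataFusion or object_store):
/// returns the "scheme://bucket" prefix that would identify a store.
fn store_key(uri: &str) -> Option<String> {
    // Split "s3://bucket/key" into the scheme and everything after "://".
    let (scheme, rest) = uri.split_once("://")?;
    // The bucket is the first path segment; reject an empty bucket name.
    let bucket = rest.split('/').next().filter(|b| !b.is_empty())?;
    Some(format!("{}://{}", scheme.to_ascii_lowercase(), bucket))
}
```

   Under that assumption, `store_key("s3://bucket1/blah.parquet")` and `store_key("s3://bucket2/blah2.parquet")` yield different keys, so the two files cannot share one store instance.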
   
   The bug occurs in ListingTable, which assumes that all files in a table can be accessed via a single ObjectStore. This can be seen in several places, such as the private list_files_for_scan function and the scan function. ListingTable obtains the ObjectStore by querying for the store associated with the first file in its list.
   
   The same assumption is present in FileScanConfig.
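
   The failure mode described above can be modelled with a small standard-library sketch. Everything in it is hypothetical (`BucketStore`, `Registry`, `split_uri`, and both scan functions); it does not reproduce ListingTable's actual code, and the per-store grouping is only one possible direction, not a proposed patch.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for a store bound to exactly one bucket.
struct BucketStore {
    objects: Vec<String>,
}

impl BucketStore {
    /// Succeeds only if `key` exists in this bucket.
    fn head(&self, key: &str) -> Result<(), String> {
        if self.objects.iter().any(|o| o == key) {
            Ok(())
        } else {
            Err(format!("Object at location {} not found", key))
        }
    }
}

/// Hypothetical registry keyed by "scheme://bucket".
type Registry = HashMap<String, BucketStore>;

/// Splits "s3://bucket/key" into ("s3://bucket", "key").
fn split_uri(uri: &str) -> Option<(String, String)> {
    let (scheme, rest) = uri.split_once("://")?;
    let (bucket, key) = rest.split_once('/')?;
    Some((format!("{}://{}", scheme, bucket), key.to_string()))
}

/// Models the reported behaviour: the store is resolved once, from the
/// first URI, and then used for every file.
fn scan_with_first_store(registry: &Registry, uris: &[&str]) -> Result<(), String> {
    let first = uris.first().ok_or("no files")?;
    let (store_key, _) = split_uri(first).ok_or("bad URI")?;
    let store = registry.get(&store_key).ok_or("no store registered")?;
    for uri in uris {
        let (_, key) = split_uri(uri).ok_or("bad URI")?;
        store.head(&key)?; // fails for keys that live in another bucket
    }
    Ok(())
}

/// One possible alternative: group object keys by store, then resolve
/// each group's store separately.
fn scan_per_store(registry: &Registry, uris: &[&str]) -> Result<(), String> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for uri in uris {
        let (store_key, key) = split_uri(uri).ok_or("bad URI")?;
        groups.entry(store_key).or_default().push(key);
    }
    for (store_key, keys) in &groups {
        let store = registry.get(store_key).ok_or("no store registered")?;
        for key in keys {
            store.head(key)?;
        }
    }
    Ok(())
}
```

   With `a.parquet` in `s3://bucket1` and `b.parquet` in `s3://bucket2`, `scan_with_first_store` looks up `b.parquet` in the first bucket's store and returns a not-found error, while `scan_per_store` succeeds.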
   



