elliot14A opened a new issue, #4269:
URL: https://github.com/apache/arrow-datafusion/issues/4269

   **Describe the bug**
   I am not able to use the `infer_schema` function of `datafusion::datasource::listing::ListingOptions` with S3 URLs or HTTP endpoints, although it works fine with local file paths.
   
   **To Reproduce**
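   ```rust
   use std::sync::Arc;

   use datafusion::datasource::file_format::parquet::ParquetFormat;
   use datafusion::datasource::listing::{ListingOptions, ListingTableUrl};
   use datafusion::prelude::SessionContext;

   #[tokio::main]
   async fn main() -> anyhow::Result<()> {
       let ctx = SessionContext::new();
       let url = "s3://roapi-test/blogs-flattened.parquet";
       // let url = "https://s3.eu-central-1.wasabisys.com/roapi-test/blogs_flattened.parquet";
       // let url = "./test_data/blogs.parquet";

       let options = ListingOptions::new(Arc::new(ParquetFormat::default()));
       let table_url = ListingTableUrl::parse(url)?;
       // Fails for the S3 and HTTP URLs above; succeeds for the local path.
       let s = options.infer_schema(&ctx.state(), &table_url).await?;
       println!("{}", s);
       Ok(())
   }
   ```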
   This is the error returned when I run the above code:
   ```
   Error: Internal error: No suitable object store found for s3://roapi-test/blogs-flattened.parquet. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
   ```
   
   **Expected behavior**
   It should infer the schema of a file on S3 or over HTTP, just as it does for local files.
   
   **Additional context**
   I did some debugging and found that the error is raised in `datafusion/core/src/datasource/object_store.rs`, in this function:
   ```rust
   pub fn get_by_url(&self, url: impl AsRef<Url>) -> Result<Arc<dyn ObjectStore>> {
       let url = url.as_ref();
       // First check whether can get object store from registry
       let s = &url[url::Position::BeforeScheme..url::Position::BeforePath];
       let store = self.object_stores.get(s).map(|o| o.value().clone());

       match store {
           Some(store) => Ok(store),
           None => match &self.provider {
               Some(provider) => {
                   let store = provider.get_by_url(url)?;
                   let key =
                       &url[url::Position::BeforeScheme..url::Position::BeforePath];
                   self.object_stores.insert(key.to_owned(), store.clone());
                   Ok(store)
               }
               None => Err(DataFusionError::Internal(format!(
                   "No suitable object store found for {}",
                   url
               ))),
           },
       }
   }
   ```
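
   To see why only the local path works, it may help to look at the key this lookup uses. Below is a simplified, std-only sketch that approximates `&url[url::Position::BeforeScheme..url::Position::BeforePath]` (the scheme, `://`, and the authority) for the three URLs in the reproduction. The `store_key` helper is hypothetical and is not DataFusion's code; it assumes well-formed absolute URLs and ignores query/fragment edge cases. It also assumes the local path is turned into a `file://` URL; the absolute path shown for it is made up.

   ```rust
   /// Hypothetical helper approximating the registry key computed in `get_by_url`:
   /// everything before the path, i.e. "scheme://authority".
   /// Simplified: assumes a well-formed absolute URL.
   fn store_key(url: &str) -> Option<&str> {
       // The authority starts right after the "://" separator.
       let authority_start = url.find("://")? + 3;
       // The path starts at the first '/' after the authority (or the URL ends).
       let path_start = url[authority_start..]
           .find('/')
           .map(|i| authority_start + i)
           .unwrap_or(url.len());
       Some(&url[..path_start])
   }

   fn main() {
       let urls = [
           "s3://roapi-test/blogs-flattened.parquet",
           "https://s3.eu-central-1.wasabisys.com/roapi-test/blogs_flattened.parquet",
           // Assumed form of the local path after parsing (hypothetical directory).
           "file:///home/user/test_data/blogs.parquet",
       ];
       for url in urls {
           println!("{} -> {:?}", url, store_key(url));
       }
   }
   ```

   If the default registry only holds a `file://` entry, which would be consistent with local paths working, then the `s3://roapi-test` and `https://s3.eu-central-1.wasabisys.com` keys have nothing to match.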
   The `self.object_stores` `DashMap` does not contain an entry for the `s3://bucket_name` URL, so the lookup returns this error. The comments mention that it returns an S3 store, so how should I register this S3 URL?
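
   A minimal model of the failure and of what registering a store would change: a map keyed by `scheme://authority` where a miss is an error. `Registry`, `FakeStore`, and `register` are hypothetical names for illustration only, not DataFusion's API. Pre-registering `file://` is an assumption meant to mirror why local paths work. In DataFusion itself, stores appear to be registered through the runtime environment's `register_object_store` method, whose parameters have changed between releases, so the exact call should be checked against the version in use.

   ```rust
   use std::collections::HashMap;

   /// Hypothetical stand-in for an object store (the real trait is
   /// `object_store::ObjectStore`, which this sketch does not use).
   #[derive(Debug, Clone, PartialEq)]
   struct FakeStore {
       name: String,
   }

   /// Simplified model of the registry lookup: stores keyed by "scheme://authority".
   struct Registry {
       stores: HashMap<String, FakeStore>,
   }

   impl Registry {
       /// Assumed default: only a local store registered under "file://".
       fn new() -> Self {
           let mut stores = HashMap::new();
           stores.insert("file://".to_string(), FakeStore { name: "local".to_string() });
           Registry { stores }
       }

       /// Registers a store under the key built from a scheme and host.
       fn register(&mut self, scheme: &str, host: &str, store: FakeStore) {
           self.stores.insert(format!("{}://{}", scheme, host), store);
       }

       /// A missing key is an error, like the `None` branch in `get_by_url`.
       fn get(&self, key: &str) -> Result<&FakeStore, String> {
           self.stores
               .get(key)
               .ok_or_else(|| format!("No suitable object store found for {}", key))
       }
   }

   fn main() {
       let mut registry = Registry::new();
       // Local files resolve because "file://" is pre-registered in this model.
       println!("{:?}", registry.get("file://"));
       // Nothing is registered for the bucket yet, so this reproduces the error.
       println!("{:?}", registry.get("s3://roapi-test"));
       // After registering a store for scheme "s3" and host "roapi-test", the lookup succeeds.
       registry.register("s3", "roapi-test", FakeStore { name: "s3".to_string() });
       println!("{:?}", registry.get("s3://roapi-test"));
   }
   ```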
   
   
   Any help is appreciated!

