elliot14A opened a new issue, #4269:
URL: https://github.com/apache/arrow-datafusion/issues/4269
**Describe the bug**
I am not able to use the `infer_schema` function of
`datafusion::datasource::listing::ListingOptions` with S3 URLs or HTTP
endpoints, although it works fine with local file paths.
**To Reproduce**
```
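use std::sync::Arc;

use datafusion::datasource::file_format::parquet::ParquetFormat;
use datafusion::datasource::listing::{ListingOptions, ListingTableUrl};
use datafusion::prelude::SessionContext;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let ctx = SessionContext::new();
    // Fails: S3 URL
    let url = "s3://roapi-test/blogs-flattened.parquet";
    // Fails as well: HTTP endpoint
    // let url = "https://s3.eu-central-1.wasabisys.com/roapi-test/blogs_flattened.parquet";
    // Works: local path
    // let url = "./test_data/blogs.parquet";
    let options = ListingOptions::new(Arc::new(ParquetFormat::default()));
    let table_url = ListingTableUrl::parse(url)?;
    let s = options.infer_schema(&ctx.state(), &table_url).await?;
    println!("{}", s);
    Ok(())
}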
```
Running the code above returns this error:
```
Error: Internal error: No suitable object store found for
s3://roapi-test/blogs-flattened.parquet. This was likely caused by a bug in
DataFusion's code and we would welcome that you file an bug report in our issue
tracker
```
**Expected behavior**
It should infer the schema of a file on S3 or behind an HTTP endpoint, just as it does for local files.
**Additional context**
I did some debugging and found that the error is raised in
`datafusion/core/src/datasource/object_store.rs`, in this function:
```
pub fn get_by_url(&self, url: impl AsRef<Url>) -> Result<Arc<dyn ObjectStore>> {
    let url = url.as_ref();
    // First check whether can get object store from registry
    let s = &url[url::Position::BeforeScheme..url::Position::BeforePath];
    let store = self.object_stores.get(s).map(|o| o.value().clone());
    match store {
        Some(store) => Ok(store),
        None => match &self.provider {
            Some(provider) => {
                let store = provider.get_by_url(url)?;
                let key = &url[url::Position::BeforeScheme..url::Position::BeforePath];
                self.object_stores.insert(key.to_owned(), store.clone());
                Ok(store)
            }
            None => Err(DataFusionError::Internal(format!(
                "No suitable object store found for {}",
                url
            ))),
        },
    }
}
```
The `self.object_stores` dash map does not contain an entry for the
`s3://bucket_name` URL, and no provider is set, so the lookup falls through to
the error. The comments mention that it returns an S3 store, so how should I
register this S3 URL?
Any help is appreciated!
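
To make the failing lookup concrete, here is a minimal stdlib-only sketch of how I understand the registry to behave. It is a simplified model, not DataFusion's actual code: `registry_key` and the toy `Registry` type are hypothetical stand-ins, store values are plain strings, and the assumption that only a `file://` entry exists by default is mine.

```rust
use std::collections::HashMap;

/// Returns the "scheme://host" prefix of a URL, i.e. everything before the path.
/// Hypothetical helper: a simplified stand-in for
/// `&url[url::Position::BeforeScheme..url::Position::BeforePath]`.
fn registry_key(url: &str) -> Option<&str> {
    let scheme_end = url.find("://")?;
    let authority_start = scheme_end + 3;
    // The authority ends at the first '/' after "scheme://", or at the end of the string.
    let rest = &url[authority_start..];
    let authority_len = rest.find('/').unwrap_or(rest.len());
    Some(&url[..authority_start + authority_len])
}

/// Toy registry mapping "scheme://host" keys to a store name.
struct Registry {
    stores: HashMap<String, String>,
}

impl Registry {
    fn new() -> Self {
        let mut stores = HashMap::new();
        // Assumption: only a local file store is registered by default.
        stores.insert("file://".to_string(), "local".to_string());
        Registry { stores }
    }

    /// Registers a store under the key "scheme://host".
    fn register(&mut self, scheme: &str, host: &str, store: &str) {
        self.stores
            .insert(format!("{}://{}", scheme, host), store.to_string());
    }

    /// Looks up the store for a URL; fails if no matching key was registered.
    fn get_by_url(&self, url: &str) -> Result<&str, String> {
        registry_key(url)
            .and_then(|key| self.stores.get(key))
            .map(|s| s.as_str())
            .ok_or_else(|| format!("No suitable object store found for {}", url))
    }
}
```

In this model, `get_by_url("s3://roapi-test/blogs-flattened.parquet")` fails until something calls `register("s3", "roapi-test", ...)`, because the key `s3://roapi-test` is simply absent from the map. DataFusion's `RuntimeEnv` has a `register_object_store` method that looks like the real counterpart, but its exact signature depends on the DataFusion version.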