milenkovicm commented on issue #6:
URL: https://github.com/apache/arrow-ballista/issues/6#issuecomment-1234425565
I had a quick look into this issue, and from what I can see, nothing is
missing on the DataFusion side to support this functionality (apart from some
hard work :)).
The team did a great job adding object store support to DataFusion:
```rust
use std::sync::Arc;

use datafusion::{
    datasource::listing::{ListingTable, ListingTableConfig, ListingTableUrl},
    prelude::SessionContext,
};
use object_store::aws::AmazonS3Builder;

#[tokio::main]
async fn main() {
    let ctx = SessionContext::new();

    // Build an S3 client for a local S3-compatible endpoint, allowing plain HTTP.
    let s3 = AmazonS3Builder::new()
        .with_region("us-east-1")
        .with_bucket_name("testbucket")
        .with_access_key_id("MINIO")
        .with_secret_access_key("MINIO/MINIO")
        .with_endpoint("http://localhost:9000")
        .with_allow_http(true)
        .build()
        .unwrap();
    let s3 = Arc::new(s3);

    // Register the store under the `s3` scheme and `localhost:9000` host.
    ctx.runtime_env()
        .register_object_store("s3", "localhost:9000", s3);

    // Infer the file format and schema from the files under the given path.
    let url = ListingTableUrl::parse("s3://localhost:9000/testpath/").unwrap();
    let config = ListingTableConfig::new(url)
        .infer(&ctx.state())
        .await
        .unwrap();

    let table = ListingTable::try_new(config).unwrap();
    ctx.register_table("test", Arc::new(table)).unwrap();

    ctx.sql("SELECT * FROM test")
        .await
        .unwrap()
        .show()
        .await
        .unwrap();
}
```
I gave it a quick try with Ballista `standalone`, changing the code a bit to
expose `RuntimeEnv` on the client, scheduler, and executor, and registering the
store on each of them manually. In the end, it produced the correct result.
Currently, getting to a `RuntimeEnv` is not a "walk in the park": a few hacks
here and there were needed, but it would not be hard to make it easier. It
would then be possible to load object store providers from configuration files.
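
To make the configuration-file idea a bit more concrete, here is a rough, std-only sketch of what parsing such a file could look like. Everything in it is hypothetical: the `StoreEntry` type, the `parse_store_config` function, and the `scheme://host key=value ...` line format are made up for illustration and are not part of DataFusion or Ballista.

```rust
use std::collections::HashMap;

/// Hypothetical description of one object store, as it might appear in a config file.
#[derive(Debug, Clone, PartialEq)]
pub struct StoreEntry {
    pub scheme: String,
    pub host: String,
    pub options: HashMap<String, String>,
}

/// Parses lines of the (made-up) form `scheme://host key=value key=value ...`.
/// Blank lines and lines starting with `#` are skipped.
pub fn parse_store_config(text: &str) -> Result<Vec<StoreEntry>, String> {
    let mut entries = Vec::new();
    for (idx, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let mut parts = line.split_whitespace();
        let target = parts
            .next()
            .ok_or_else(|| format!("line {}: empty", idx + 1))?;
        // The first token names the scheme and host the store is registered under.
        let (scheme, host) = target
            .split_once("://")
            .ok_or_else(|| format!("line {}: expected scheme://host", idx + 1))?;
        // The remaining tokens are free-form store options.
        let mut options = HashMap::new();
        for pair in parts {
            let (key, value) = pair.split_once('=').ok_or_else(|| {
                format!("line {}: expected key=value, got `{}`", idx + 1, pair)
            })?;
            options.insert(key.to_string(), value.to_string());
        }
        entries.push(StoreEntry {
            scheme: scheme.to_string(),
            host: host.to_string(),
            options,
        });
    }
    Ok(entries)
}
```

Under this assumption, the client, scheduler, and executor would each read the same file at startup, build a store from every entry, and pass `entry.scheme` and `entry.host` to `register_object_store` on their own `RuntimeEnv`.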
Alternatively, `register_object_store` could be provided directly on the
`BallistaContext`, and the object store configuration could then somehow be
magically serialized and handled by the other actors in the system.
`AmazonS3Builder` would probably need to be modified so it can be serialized.
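
One way to picture the serialization route, as a sketch only: keep the builder options in a plain data struct that can be encoded on the client and decoded on the scheduler and executors, each of which then rebuilds the store locally. The `S3Settings` type and its `encode`/`decode` methods below are hypothetical, and the `key=value` text encoding is just a std-only stand-in for whatever wire format would really be used.

```rust
use std::collections::HashMap;

/// Hypothetical plain-data mirror of the builder options used in the snippet above.
#[derive(Debug, Clone, PartialEq)]
pub struct S3Settings {
    pub region: String,
    pub bucket_name: String,
    pub access_key_id: String,
    pub secret_access_key: String,
    pub endpoint: String,
    pub allow_http: bool,
}

impl S3Settings {
    /// Encodes the settings as `key=value` lines.
    /// Values must not contain newlines in this simplified format.
    pub fn encode(&self) -> String {
        format!(
            "region={}\nbucket_name={}\naccess_key_id={}\nsecret_access_key={}\nendpoint={}\nallow_http={}",
            self.region,
            self.bucket_name,
            self.access_key_id,
            self.secret_access_key,
            self.endpoint,
            self.allow_http
        )
    }

    /// Decodes the output of `encode`; fails on missing or malformed fields.
    pub fn decode(text: &str) -> Result<Self, String> {
        let mut fields: HashMap<&str, &str> = HashMap::new();
        for line in text.lines() {
            // Split on the first `=` only, so values may themselves contain `=`.
            let (key, value) = line
                .split_once('=')
                .ok_or_else(|| format!("malformed line: `{}`", line))?;
            fields.insert(key, value);
        }
        let get = |key: &str| -> Result<String, String> {
            fields
                .get(key)
                .map(|v| v.to_string())
                .ok_or_else(|| format!("missing field: {}", key))
        };
        Ok(S3Settings {
            region: get("region")?,
            bucket_name: get("bucket_name")?,
            access_key_id: get("access_key_id")?,
            secret_access_key: get("secret_access_key")?,
            endpoint: get("endpoint")?,
            allow_http: get("allow_http")?
                .parse::<bool>()
                .map_err(|e| e.to_string())?,
        })
    }
}
```

On the receiving side, the decoded fields would be fed back into the same `AmazonS3Builder` calls shown earlier (`with_region`, `with_bucket_name`, and so on) before registering the resulting store.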