milenkovicm commented on issue #6:
URL: https://github.com/apache/arrow-ballista/issues/6#issuecomment-1234425565

   Had a quick look into this issue and, from what I can see, there is 
nothing missing on the DataFusion side to support this functionality (apart 
from some hard work :)).
   
   The team did a great job adding object store support to DataFusion:
   
   ```rust
   use std::sync::Arc;
   use datafusion::{
       datasource::listing::{ListingTable, ListingTableConfig, ListingTableUrl},
       prelude::SessionContext,
   };
   use log::info;
   use object_store::aws::AmazonS3Builder;
   
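       // The statements below use `.await`, so they need to run inside an
       // async context (e.g. a `#[tokio::main]` function).
       let ctx = SessionContext::new();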
   
       let s3 = AmazonS3Builder::new()
           .with_region("us-east-1")
           .with_bucket_name("testbucket")
           .with_access_key_id("MINIO")
           .with_secret_access_key("MINIO/MINIO")
           .with_endpoint("http://localhost:9000")
           .with_allow_http(true)
           .build()
           .unwrap();
   
       let s3 = Arc::new(s3);
   
       ctx.runtime_env()
           .register_object_store("s3", "localhost:9000", s3);
   
       let url = ListingTableUrl::parse("s3://localhost:9000/testpath/").unwrap();
   
       let config = ListingTableConfig::new(url)
           .infer(&ctx.state())
           .await
           .unwrap();
   
       let table = ListingTable::try_new(config).unwrap();
       ctx.register_table("test", Arc::new(table)).unwrap();
   
       ctx.sql("SELECT * FROM test")
           .await
           .unwrap()
           .show()
           .await
           .unwrap();
   ```
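
The snippet above registers the store under a scheme (`s3`) and a host (`localhost:9000`), and the table URL uses the same pair. As a rough illustration of why the two must match, here is a minimal, std-only sketch of a registry keyed by scheme and host. `StoreRegistry` and `FakeStore` are hypothetical names made up for this example; this is a simplified model of the idea, not DataFusion's actual implementation.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for a real object store client; only a label is kept here.
#[derive(Debug, PartialEq)]
pub struct FakeStore(pub &'static str);

/// Hypothetical, simplified registry keyed by (scheme, host).
#[derive(Default)]
pub struct StoreRegistry {
    stores: HashMap<(String, String), Arc<FakeStore>>,
}

impl StoreRegistry {
    /// Associates a store with a scheme/host pair such as ("s3", "localhost:9000").
    pub fn register(&mut self, scheme: &str, host: &str, store: Arc<FakeStore>) {
        self.stores
            .insert((scheme.to_string(), host.to_string()), store);
    }

    /// Splits `scheme://host/path` and returns the matching store plus the path.
    /// Returns `None` if the URL is malformed or no store is registered.
    pub fn resolve(&self, url: &str) -> Option<(Arc<FakeStore>, String)> {
        let (scheme, rest) = url.split_once("://")?;
        let (host, path) = rest.split_once('/').unwrap_or((rest, ""));
        let store = self.stores.get(&(scheme.to_string(), host.to_string()))?;
        Some((Arc::clone(store), path.to_string()))
    }
}
```

In this model, a URL whose host differs from the registered one simply fails to resolve, which is the kind of mismatch to look for if a table path cannot be opened.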
   
   I gave it a quick try with Ballista `standalone`, changing the code a bit 
to expose `RuntimeEnv` on the client, scheduler, and executor, and registering 
the store on each of them manually. In the end, it produced the correct 
result. Currently, getting to a `RuntimeEnv` is no walk in the park; a few 
hacks here and there were needed, but it would not be hard to make it easier. 
It would then be possible to load object store providers from configuration 
files.
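
To make the configuration-file idea a bit more concrete, here is a minimal, std-only sketch that parses `key = value` text into a settings struct. The struct, the function, and the file format are all assumptions for illustration (nothing like this exists in Ballista as far as this comment goes); the field names simply mirror the builder calls used earlier. Credentials are left out on purpose, since they would more plausibly come from the environment.

```rust
use std::collections::HashMap;

/// Hypothetical settings for one S3-compatible store.
#[derive(Debug, PartialEq)]
pub struct S3StoreSettings {
    pub region: String,
    pub bucket: String,
    pub endpoint: String,
    pub allow_http: bool,
}

/// Parses `key = value` lines; blank lines and `#` comments are skipped.
/// Returns an error message if a line is malformed or a key is missing.
pub fn parse_settings(text: &str) -> Result<S3StoreSettings, String> {
    let mut map: HashMap<String, String> = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Split on the first `=` only, so values may contain `=` themselves.
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("expected `key = value`, got `{}`", line))?;
        map.insert(key.trim().to_string(), value.trim().to_string());
    }
    let mut take = |key: &str| {
        map.remove(key)
            .ok_or_else(|| format!("missing `{}`", key))
    };
    Ok(S3StoreSettings {
        region: take("region")?,
        bucket: take("bucket")?,
        endpoint: take("endpoint")?,
        allow_http: take("allow_http")? == "true",
    })
}
```

Each process (client, scheduler, executor) could read the same file at startup and feed the resulting values into `AmazonS3Builder` before calling `register_object_store`.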
   
   Alternatively, `register_object_store` could be provided directly on the 
`BallistaContext`, with the object store configuration then serialized and 
shipped to the other actors in the system (scheduler and executors), which 
would rebuild the store from it. `AmazonS3Builder` would probably need to be 
modified so that it can be serialized.
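
As a sketch of what "serialized and handled on other actors" might look like, the example below ships the options needed to rebuild a store rather than the store itself. `RegisterStoreRequest` and its tab-separated wire format are hypothetical and std-only, chosen to keep the example self-contained; a real implementation would more likely extend Ballista's existing protobuf messages.

```rust
use std::collections::BTreeMap;

/// Hypothetical message a client could send to the scheduler and executors:
/// the options needed to rebuild a store, not the store itself.
#[derive(Debug, Clone, PartialEq)]
pub struct RegisterStoreRequest {
    pub scheme: String,
    pub host: String,
    pub options: BTreeMap<String, String>,
}

impl RegisterStoreRequest {
    /// Encodes as tab-separated lines: `scheme<TAB>host` first, then one
    /// `key<TAB>value` line per option.
    /// Assumes keys and values contain no tabs or newlines.
    pub fn encode(&self) -> String {
        let mut out = format!("{}\t{}\n", self.scheme, self.host);
        for (key, value) in &self.options {
            out.push_str(&format!("{}\t{}\n", key, value));
        }
        out
    }

    /// Decodes the format produced by `encode`; returns `None` on malformed input.
    pub fn decode(text: &str) -> Option<Self> {
        let mut lines = text.lines();
        let (scheme, host) = lines.next()?.split_once('\t')?;
        let mut options = BTreeMap::new();
        for line in lines {
            let (key, value) = line.split_once('\t')?;
            options.insert(key.to_string(), value.to_string());
        }
        Some(Self {
            scheme: scheme.to_string(),
            host: host.to_string(),
            options,
        })
    }
}
```

On the receiving side, each actor would decode the request, build the store from `options`, and register it under `scheme` and `host`, so all parties resolve the same URLs to equivalent stores.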
   
   

