LantaoJin opened a new pull request, #73:
URL: https://github.com/apache/datafusion-java/pull/73

   ## Which issue does this PR close?
   
   - Closes #70.
   
   ## Rationale for this change
   
   `SessionContext.registerParquet(name, "s3://bucket/...")` (and the 
read/register counterparts for CSV / NDJSON / Arrow / Avro) accept arbitrary 
path strings, but until this PR there was no Java surface to attach an 
`object_store::ObjectStore` to a URL scheme + bucket. As a result, `s3://`, 
`gs://`, and remote `https://` paths fail with *"No suitable object store found 
for ..."* unless the embedder falls back on process-level environment variables. 
Even then, a multi-tenant JVM cannot give two contexts different credentials 
in the same process.
   
   DataFusion's Rust `RuntimeEnv::register_object_store(url, store)` already 
solves this on the Rust side; the gap was purely in the Java surface above 
the JNI boundary. This PR closes it with a typed, per-context registration API:
   
   ```java
   SessionContext ctx = SessionContext.builder()
       .registerObjectStore(ObjectStoreOptions.s3()
           .bucket("my-bucket").region("us-east-1")
           .accessKeyId("...").secretAccessKey("...").build())
       .registerObjectStore(ObjectStoreOptions.s3()
           .bucket("other").endpoint("https://minio.internal:9000")
           .allowHttp(true).build())
       .build();
   
   ctx.registerParquet("orders", "s3://my-bucket/orders/");
   ```
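
To illustrate why per-context registration matters for multi-tenant JVMs, here is a minimal, self-contained sketch of the lookup idea: each context owns its own map from URL scheme + bucket to a store, and an unregistered path fails. The class `ObjectStoreRegistrySketch`, its methods, and the string labels standing in for stores are all hypothetical and written only for this illustration; they are not the API added by this PR, nor DataFusion's actual registry implementation.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of per-context object store lookup (not the datafusion-java API).
final class ObjectStoreRegistrySketch {
    // Stores are keyed by "scheme://authority", e.g. "s3://my-bucket".
    // A plain String label stands in for a real object store instance.
    private final Map<String, String> stores = new HashMap<>();

    // Derive the registry key from a URL: the scheme plus the bucket (URI authority).
    static String keyOf(String url) {
        URI uri = URI.create(url);
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    // Attach a store to a scheme + bucket for this context only.
    void register(String url, String storeLabel) {
        stores.put(keyOf(url), storeLabel);
    }

    // Resolve the store for a path; fail if nothing was registered for its key.
    String resolve(String path) {
        String store = stores.get(keyOf(path));
        if (store == null) {
            throw new IllegalStateException("No suitable object store found for " + path);
        }
        return store;
    }

    public static void main(String[] args) {
        // Two independent "contexts" in one process, same bucket, different credentials.
        ObjectStoreRegistrySketch tenantA = new ObjectStoreRegistrySketch();
        ObjectStoreRegistrySketch tenantB = new ObjectStoreRegistrySketch();
        tenantA.register("s3://my-bucket", "store-with-credentials-A");
        tenantB.register("s3://my-bucket", "store-with-credentials-B");

        System.out.println(tenantA.resolve("s3://my-bucket/orders/"));
        System.out.println(tenantB.resolve("s3://my-bucket/orders/"));
    }
}
```

Because each registry instance holds its own map, the same `s3://my-bucket` key can resolve to different stores in different contexts, which process-level environment variables cannot express.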
   
   ## What changes are included in this PR?
   
   - **Proto:** new `proto/object_store_options.proto`
   - **Java API:** new `org.apache.datafusion.ObjectStoreOptions`
   - **Native:** new `native/src/object_store.rs`
   - **Cargo features:** `object-store-{aws,gcp,http}`, with `default = 
["object-store-aws", "object-store-gcp", "object-store-http"]`
   - **Build wiring:** `proto/object_store_options.proto`
   
   ## Are these changes tested?
   
   23 new tests across `ObjectStoreOptionsTest` and 
`SessionContextObjectStoreTest`
   
   ## Are there any user-facing changes?
   
   Yes, additive only, no breaking changes:
   
   - New public class `org.apache.datafusion.ObjectStoreOptions` (sealed) with 
three concrete subtypes `S3` / `Gcs` / `Http` and per-backend builders. Static 
factories: `ObjectStoreOptions.s3()`, `.gcs()`, `.http(listingUrl)`.
   - New public method 
`SessionContextBuilder.registerObjectStore(ObjectStoreOptions)`.
   - Generated protobuf classes `ObjectStoreRegistration`, `S3Options`, 
`GcsOptions`, `HttpOptions` under `org.apache.datafusion.protobuf` (consistent 
with existing options bundles).
   - New `repeated ObjectStoreRegistration object_stores = 8` field on 
`SessionOptions` (forward compatible under proto3: older builds simply ignore it).
   
   Existing APIs are unchanged. The `make` build enables the 
`object-store-{aws,gcp,http}` Cargo features by default; downstream Rust builds 
that strip a feature fail with a clear runtime error when a caller registers 
the missing backend.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

