LantaoJin opened a new pull request, #73:
URL: https://github.com/apache/datafusion-java/pull/73
## Which issue does this PR close?
- Closes #70.
## Rationale for this change
`SessionContext.registerParquet(name, "s3://bucket/...")` (and the
read/register counterparts for CSV / NDJSON / Arrow / Avro) accept arbitrary
path strings, but until this PR there was no Java surface to attach an
`object_store::ObjectStore` to a URL scheme + bucket. As a result, `s3://`,
`gs://`, and remote `https://` paths fail with *"No suitable object store found
for ..."* unless the embedder relies on process-level environment variables.
Even then, a multi-tenant JVM cannot give two contexts different
credentials in the same process.
DataFusion's Rust `RuntimeEnv::register_object_store(url, store)` already
solves this on the native side; the gap was purely in the Java surface above
the JNI boundary. This PR closes it with a typed, per-context registration API:
```java
SessionContext ctx = SessionContext.builder()
    .registerObjectStore(ObjectStoreOptions.s3()
        .bucket("my-bucket").region("us-east-1")
        .accessKeyId("...").secretAccessKey("...").build())
    .registerObjectStore(ObjectStoreOptions.s3()
        .bucket("other").endpoint("https://minio.internal:9000")
        .allowHttp(true).build())
    .build();

ctx.registerParquet("orders", "s3://my-bucket/orders/");
```
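To make the per-context lookup concrete, here is a toy model in plain Java. It is only a sketch and not the code in this PR: `ObjectStoreRegistrySketch`, `Registry`, and the string labels standing in for real store handles are all hypothetical, and the assumption that stores are keyed by scheme plus authority (bucket) is taken from the description above rather than from the native implementation.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Toy model of per-context object store lookup; NOT the datafusion-java implementation. */
final class ObjectStoreRegistrySketch {

    /** Hypothetical per-context registry keyed by "scheme://authority". */
    static final class Registry {
        // The value is a plain label standing in for a real store handle and its credentials.
        private final Map<String, String> stores = new HashMap<>();

        /** Associates a store label with a base URL such as "s3://my-bucket". */
        void register(String baseUrl, String storeLabel) {
            stores.put(key(URI.create(baseUrl)), storeLabel);
        }

        /** Returns the store registered for the path's scheme and bucket, or fails. */
        String resolve(String path) {
            String store = stores.get(key(URI.create(path)));
            if (store == null) {
                throw new IllegalStateException("No suitable object store found for " + path);
            }
            return store;
        }

        // Only the scheme and authority take part in the lookup; the object path is ignored.
        private static String key(URI uri) {
            return uri.getScheme() + "://" + uri.getAuthority();
        }
    }

    public static void main(String[] args) {
        // Two registries model two contexts in one JVM with different credentials.
        Registry tenantA = new Registry();
        tenantA.register("s3://my-bucket", "credentials-A");

        Registry tenantB = new Registry();
        tenantB.register("s3://my-bucket", "credentials-B");

        System.out.println(tenantA.resolve("s3://my-bucket/orders/"));
        System.out.println(tenantB.resolve("s3://my-bucket/orders/"));

        // A context with no registration for the bucket fails at lookup time.
        try {
            new Registry().resolve("s3://my-bucket/orders/");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because each registry instance is independent, the same bucket URL can resolve to different credentials in different contexts, which process-level environment variables cannot express.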
## What changes are included in this PR?
- **Proto:** new `proto/object_store_options.proto`
- **Java API:** new `org.apache.datafusion.ObjectStoreOptions`
- **Native:** new `native/src/object_store.rs`
- **Cargo features:** `object-store-{aws,gcp,http}`, with `default =
["object-store-aws", "object-store-gcp", "object-store-http"]`
- **Build wiring:** `proto/object_store_options.proto`
## Are these changes tested?
23 new tests across `ObjectStoreOptionsTest` and
`SessionContextObjectStoreTest`
## Are there any user-facing changes?
Yes. The changes are additive only, with no breaking changes:
- New public class `org.apache.datafusion.ObjectStoreOptions` (sealed) with
three concrete subtypes `S3` / `Gcs` / `Http` and per-backend builders. Static
factories: `ObjectStoreOptions.s3()`, `.gcs()`, `.http(listingUrl)`.
- New public method
`SessionContextBuilder.registerObjectStore(ObjectStoreOptions)`.
- Generated protobuf classes `ObjectStoreRegistration`, `S3Options`,
`GcsOptions`, `HttpOptions` under `org.apache.datafusion.protobuf` (consistent
with existing options bundles).
- New `repeated ObjectStoreRegistration object_stores = 8` field on
`SessionOptions` (proto3 forward compatible: older builds simply ignore it).
Existing APIs are unchanged. The `make` build enables the
`object-store-{aws,gcp,http}` Cargo features by default; downstream Rust builds
that disable a feature get a clear runtime error if a caller registers the
missing backend.