LantaoJin opened a new pull request, #80: URL: https://github.com/apache/datafusion-java/pull/80
## Which issue does this PR close?

- Closes #75.

## Rationale for this change

`SessionContext.fromProto(byte[])` accepts only DataFusion's *own* `LogicalPlanNode` proto. [Substrait](https://substrait.io/) is the cross-engine logical-plan standard that DataFusion already supports through the [`datafusion-substrait`](https://crates.io/crates/datafusion-substrait) crate, but it has had no Java-side entry point. Embedders that compile plans elsewhere (Calcite via [Isthmus](https://github.com/substrait-io/substrait-java), custom planners, federation hubs, integrations with other engines) had to round-trip through SQL to use the Java binding. That round-trip is lossy: source-side optimisations baked into the Substrait plan are discarded, and SQL is not always expressive enough to round-trip cleanly when plans reference extensions or function variants that have no surface SQL form.

## What changes are included in this PR?

This PR adds a single new entry point that mirrors the shape of the existing `fromProto` but consumes Substrait `Plan` bytes instead. The implementation is small (~50 LOC of JNI plus ~25 LOC on the Java side); most of the diff is the test that round-trips a hand-built Substrait plan through the JNI bridge.

New public Java API on `SessionContext`:

```java
public DataFrame fromSubstrait(byte[] planBytes);
```

`planBytes` is a serialised `substrait.proto.Plan`. The plan is translated against this context's catalog, so any tables it references must already be registered. The returned `DataFrame` is lazy and composes with the rest of the API.

The feature is **default-off**, so a plain `cargo build` (and therefore `make`, `make test`, and anyone who doesn't need Substrait) stays hermetic, with no new build prerequisites.
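Because `fromSubstrait` takes the same `byte[]` shape as `fromProto`, an embedder that receives plans in either format can hand the bytes straight to the matching entry point instead of detouring through SQL. The sketch below is hypothetical: `PlanRouter`, `PlanFormat`, and `PlanSource` are illustrative names, and `PlanSource` is only a stand-in for `SessionContext` (with a type parameter in place of `DataFrame`) so the example compiles without the datafusion-java jar.

```java
/**
 * Hypothetical sketch: route serialised plans to the matching entry point.
 * PlanSource stands in for datafusion-java's SessionContext; only the two
 * method shapes (fromProto / fromSubstrait, both taking byte[]) come from the PR.
 */
public class PlanRouter {
    /** Wire formats an embedder might receive plans in (illustrative). */
    public enum PlanFormat { DATAFUSION_PROTO, SUBSTRAIT }

    /** Stand-in for SessionContext; R stands in for DataFrame. */
    public interface PlanSource<R> {
        R fromProto(byte[] planBytes);      // DataFusion LogicalPlanNode bytes
        R fromSubstrait(byte[] planBytes);  // serialised substrait.proto.Plan bytes
    }

    /** Dispatch on format; the plan bytes are passed through unchanged, no SQL involved. */
    public static <R> R load(PlanSource<R> ctx, PlanFormat format, byte[] planBytes) {
        switch (format) {
            case DATAFUSION_PROTO:
                return ctx.fromProto(planBytes);
            case SUBSTRAIT:
                return ctx.fromSubstrait(planBytes);
            default:
                throw new IllegalArgumentException("unknown plan format: " + format);
        }
    }
}
```

With the real binding, a thin adapter over `SessionContext` would implement `PlanSource`; any tables the Substrait plan references still have to be registered on that context before the call.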
Substrait support is opt-in:

| Invocation | Substrait support | Build prerequisites |
|---|---|---|
| `cargo build` (default) | off (stub handler) | none |
| `cargo build --features substrait` | on | `protoc` on PATH |
| `cargo build --features substrait,protoc` | on (vendored `protoc`) | `cmake` on PATH |

The Java surface is the same either way: `SessionContext.fromSubstrait(...)` is always present. If the feature was compiled out, calls throw a clear error from the JVM: "datafusion-jni was built without the `substrait` Cargo feature; rebuild with `--features substrait`". `SessionContextSubstraitTest` detects this case and skips itself via JUnit's `Assumptions.assumeFalse(...)`, so `make test` stays green either way. This is intentionally different from PR #60's Avro handling, which is always on.

## Are these changes tested?

Yes, 7 new tests in `SessionContextSubstraitTest`.

## Are there any user-facing changes?

Yes, purely additive. New public API:

- `SessionContext.fromSubstrait(byte[]) → DataFrame`

No API removals, no deprecations, and no behavior change for existing callers. The default `cargo build` does **not** pull in `datafusion-substrait` and adds no new build prerequisites; `SessionContext.fromSubstrait(...)` is present but throws "feature not enabled" at runtime. Users who need Substrait rebuild with `--features substrait` (and either install `protoc` or also enable the `protoc` helper feature). The native binary is unchanged in size unless the feature is opted into.

The new test-scope dependency `io.substrait:core:0.81.0` is added to the parent POM's `dependencyManagement` (with version property `substrait.java.version`) and to `core/pom.xml` in `test` scope only; it does not enter the runtime classpath of the published artifact.
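Since the Java method is always present but throws when the native library was built without the `substrait` feature, callers that want to degrade gracefully need to recognise that specific failure. A minimal sketch, assuming the error surfaces as a `RuntimeException` (the PR does not name the exception type) whose message contains the phrases checked below; `SubstraitSupport` and its methods are hypothetical, and `fromSubstrait` is passed in as a `Function` so the example runs without datafusion-java.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical helper: treat "built without the substrait feature" as absent support. */
public final class SubstraitSupport {
    private SubstraitSupport() {}

    /**
     * True if the throwable, or any cause in its chain, carries the feature-disabled
     * message. Assumption: the message contains both phrases below; matching loosely
     * in case the backticks in the PR's quoted message are Markdown, not literal text.
     */
    public static boolean isFeatureDisabled(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (msg != null && msg.contains("built without the") && msg.contains("substrait")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Calls fromSubstrait (supplied as a function) and returns empty when the native
     * library lacks Substrait support; every other failure is rethrown unchanged.
     */
    public static <R> Optional<R> tryFromSubstrait(Function<byte[], R> fromSubstrait, byte[] planBytes) {
        try {
            return Optional.ofNullable(fromSubstrait.apply(planBytes));
        } catch (RuntimeException e) {
            if (isFeatureDisabled(e)) {
                return Optional.empty();
            }
            throw e;
        }
    }
}
```

In a JUnit 5 test, the result of such a check could be passed to `Assumptions.assumeFalse(...)` so the test is skipped rather than failed; the PR text does not show exactly how `SessionContextSubstraitTest` performs its detection.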
