andygrove opened a new pull request, #13:
URL: https://github.com/apache/datafusion-java/pull/13

   Stacked on #9. Closes #8.
   
   > Note: This PR is stacked on #9 and includes its commits in the diff. The 3 
new commits unique to this PR live at the tip of the branch (the eight commits 
since `proto-build`). Once #9 merges, this PR's diff will narrow automatically.
   
   Adds the wiring to make the generated `datafusion.LogicalPlanNode` classes 
from #9 executable. A JVM caller constructs a `LogicalPlanNode` with the 
generated builders, hands its serialized bytes to 
`SessionContext.fromProto(byte[])`, and gets back a `DataFrame` that streams 
Arrow batches via the existing `DataFrame.collect()` path.
   
   To make plans that reference parquet files practical, this PR also ships 
`SessionContext.tableSchema(String)` (Arrow schema of a registered table, 
transferred via Arrow IPC) and `org.apache.datafusion.proto.SchemaConverter` 
(Arrow Schema ↔ `datafusion_common.Schema` proto), so the caller can populate 
`ListingTableScanNode.schema` without hand-coding it.
   
   ## What's in this PR (on top of #9)
   
   - `native/Cargo.toml`: `datafusion-proto = "53.1.0"`, `prost = "0.14"` 
(the version `datafusion-proto 53.1.0` requires).
   - `native/src/proto.rs`: two JNI methods — `createDataFrameFromProto` 
(decode + `try_into_logical_plan` + `execute_logical_plan`) and 
`tableSchemaIpc` (writes the schema via `arrow::ipc::StreamWriter` and returns 
the bytes).
   - `SessionContext.fromProto(byte[])` and 
`SessionContext.tableSchema(String)`.
   - `SchemaConverter` — pure Java, supports Bool / signed+unsigned Int 8..64 / 
Float32/64 / Utf8 / Utf8View / LargeUtf8 / Date32 / Decimal128 plus 
field/schema metadata; anything else raises `UnsupportedOperationException` 
with a message naming the type.
   - Tests: `SchemaConverterTest` (3 tests, no DataFusion), 
`SessionContextProtoTest` (smoke test with `Projection(literal 1) over 
EmptyRelation`, a `tableSchema` check against lineitem, and an integration test that builds a 
`ListingTableScanNode` and compares its output with the equivalent SQL query).
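
The fail-loudly behaviour described for `SchemaConverter` can be sketched roughly as follows. This is a self-contained illustration, not the PR's code: `ArrowKind`, `ProtoKind`, and `TypeMappingSketch` are hypothetical stand-ins for the real Arrow and `datafusion_common` proto classes, and only the supported/unsupported split and the exception type come from the description above.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for a few Arrow types; not the real Arrow Java API.
enum ArrowKind { BOOL, INT32, UINT64, FLOAT64, UTF8, DATE32, LIST, TIMESTAMP }

// Hypothetical stand-in for the proto-side type tags.
enum ProtoKind { BOOL, INT32, UINT64, FLOAT64, UTF8, DATE32 }

final class TypeMappingSketch {
    // Explicit allow-list: a type converts only if it was added here on purpose.
    private static final Map<ArrowKind, ProtoKind> SUPPORTED = new EnumMap<>(ArrowKind.class);

    static {
        SUPPORTED.put(ArrowKind.BOOL, ProtoKind.BOOL);
        SUPPORTED.put(ArrowKind.INT32, ProtoKind.INT32);
        SUPPORTED.put(ArrowKind.UINT64, ProtoKind.UINT64);
        SUPPORTED.put(ArrowKind.FLOAT64, ProtoKind.FLOAT64);
        SUPPORTED.put(ArrowKind.UTF8, ProtoKind.UTF8);
        SUPPORTED.put(ArrowKind.DATE32, ProtoKind.DATE32);
    }

    private TypeMappingSketch() {}

    static ProtoKind toProto(ArrowKind kind) {
        ProtoKind mapped = SUPPORTED.get(kind);
        if (mapped == null) {
            // Name the offending type so the caller knows what needs extending.
            throw new UnsupportedOperationException("Unsupported Arrow type: " + kind);
        }
        return mapped;
    }
}
```

With an allow-list like this, adding coverage for a new type is a one-line change, and anything not yet listed fails at conversion time instead of producing a silently wrong schema.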
   
   ## Not in this PR
   
   - Physical-plan submission (`PhysicalPlanNode`).
   - Custom `LogicalExtensionCodec` for JVM-defined UDFs.
   - JVM-side fluent plan builder.
   - Nested types and temporal types beyond `Date32` in `SchemaConverter` (these 
raise a clear exception until extended).
   
   ## Design note
   
   `datafusion-proto`'s plan deserializer is portable: it reconstructs a fresh 
`TableProvider` (here, `ListingTable`) from the proto's `paths` + format + 
schema. It does NOT look up `tableName` against the SessionContext's registered 
tables — that field is purely a label for query-plan display. The integration 
test happens to call `registerParquet` first only so 
`tableSchema("lineitem")` can fetch the schema; the proto plan itself would 
also execute on a fresh context that never registered `lineitem`.
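
The portability point can be illustrated with a toy model. Everything below is hypothetical (`ScanNodeSketch`, `ContextSketch`, and the file paths are invented for illustration and are not part of `datafusion-proto` or this PR); it only shows the shape of the behaviour: decoding reads the node's own paths and schema and never consults the context's registry.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a serialized scan node: everything needed to
// rebuild the table travels inside the plan itself.
final class ScanNodeSketch {
    final String tableName;     // label only, used when displaying the plan
    final List<String> paths;   // files to read
    final List<String> columns; // stand-in for the embedded schema

    ScanNodeSketch(String tableName, List<String> paths, List<String> columns) {
        this.tableName = tableName;
        this.paths = paths;
        this.columns = columns;
    }
}

// Hypothetical stand-in for a session: it owns a table registry, but the
// decode step below never reads it.
final class ContextSketch {
    private final Map<String, List<String>> registry = new HashMap<>();

    void register(String name, List<String> paths) {
        registry.put(name, paths);
    }

    boolean isRegistered(String name) {
        return registry.containsKey(name);
    }

    // Rebuilds a provider description from the node alone.
    String decode(ScanNodeSketch node) {
        return "ListingTable(paths=" + node.paths + ", columns=" + node.columns + ")";
    }
}
```

In this model, a context that never registered `lineitem` and one that registered it under a different path both decode the same node to the same provider, which mirrors why the proto plan would run on a fresh context.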


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

