andygrove opened a new pull request, #65:
URL: https://github.com/apache/datafusion-java/pull/65
## Which issue does this PR close?
- Closes #63.
## Rationale for this change
Java users have no way to expose custom in-process tables (JDBC scans,
in-memory collections, custom file formats, etc.) to DataFusion. This adds a
minimal `DataSource` interface and the JNI wiring to register it on a
`SessionContext`.
The implementation mirrors the existing scalar-UDF JNI pattern.
## What changes are included in this PR?
- New public `DataSource` interface in `org.apache.datafusion` with
`Schema schema()` and `ArrowReader scan(BufferAllocator)`.
- `SessionContext.registerDataSource(name, source)` registers a Java-backed
table; schema is captured at registration time.
- `JniBridge.invokeDataSourceScan` exports the user's `ArrowReader` through
the Arrow C Data Interface (zero-copy).
- Native: `JavaDataSource: TableProvider` + `JavaScanExec: ExecutionPlan` in
`native/src/data_source.rs`, plus the JNI entry point.
- Shared `jthrowable_to_string` helper lifted into `native/src/jni_util.rs`
so the UDF and data-source paths share Java-exception formatting.
- v1 scope: single partition, no projection or filter pushdown into Java
(DataFusion projects/filters on top), no `deregisterTable`. Multi-partition,
pushdown, and deregistration are listed as follow-ups in the user guide.
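
For readers who want a feel for the contract before opening the diff, here is a rough, stdlib-only sketch. It is not the code in this PR: `RowSource`, `Registry`, and `ListSource` are hypothetical stand-ins for `DataSource`, `SessionContext`, and a user implementation, with a list of column names in place of `Schema` and an `Iterator` of rows in place of `ArrowReader`. It only models three behaviors described here: the schema is read once at registration, every query triggers a fresh `scan()`, and projection happens on the engine side rather than being pushed into the source. The null check mirrors the `IllegalStateException` behavior covered by the tests; the message text is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Stand-in model only: none of these types are the real DataFusion or Arrow classes.
final class DataSourceContractSketch {

    // Hypothetical stand-in for DataSource: column names replace Schema,
    // an Iterator of rows replaces ArrowReader.
    interface RowSource {
        List<String> schema();
        Iterator<Object[]> scan();
    }

    // Hypothetical stand-in for SessionContext.
    static final class Registry {
        private final Map<String, RowSource> sources = new HashMap<>();
        private final Map<String, List<String>> schemas = new HashMap<>();

        void registerDataSource(String name, RowSource source) {
            // The schema is captured once here, not on every scan.
            schemas.put(name, List.copyOf(source.schema()));
            sources.put(name, source);
        }

        // Runs one full scan, then projects a single column on the engine side.
        List<Object> selectColumn(String table, String column) {
            int idx = schemas.get(table).indexOf(column);
            if (idx < 0) {
                throw new IllegalArgumentException("unknown column: " + column);
            }
            Iterator<Object[]> rows = sources.get(table).scan();
            if (rows == null) {
                throw new IllegalStateException("scan() returned null for table " + table);
            }
            List<Object> out = new ArrayList<>();
            while (rows.hasNext()) {
                out.add(rows.next()[idx]);
            }
            return out;
        }
    }

    // In-memory source that counts calls so the contract is observable.
    static final class ListSource implements RowSource {
        private final List<String> columns;
        private final List<Object[]> rows;
        int schemaCalls = 0;
        int scanCalls = 0;

        ListSource(List<String> columns, List<Object[]> rows) {
            this.columns = columns;
            this.rows = rows;
        }

        @Override
        public List<String> schema() {
            schemaCalls++;
            return columns;
        }

        @Override
        public Iterator<Object[]> scan() {
            scanCalls++;
            return rows.iterator(); // a fresh iterator for every scan
        }
    }

    public static void main(String[] args) {
        ListSource people = new ListSource(
                List.of("id", "name"),
                List.of(new Object[] {1, "ann"}, new Object[] {2, "bob"}));
        Registry ctx = new Registry();
        ctx.registerDataSource("people", people);

        System.out.println(ctx.selectColumn("people", "name"));
        System.out.println(ctx.selectColumn("people", "id"));
        System.out.println("schema calls: " + people.schemaCalls
                + ", scan calls: " + people.scanCalls);
    }
}
```

In this model, two queries against the same table produce two `scan()` calls and one `schema()` call, which is why an implementation should hand back a fresh reader on every scan rather than a shared one.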
## Are these changes tested?
Yes, eight integration tests in
`core/src/test/java/org/apache/datafusion/DataSourceTest.java`:
- `SELECT *` happy path
- `UNION ALL` over the same registered table (multi-scan)
- Empty stream
- Column projection through DataFusion
- Two registered tables joinable in one query
- Schema-mismatch surfaces a readable error
- `scan()` throwing propagates the Java exception class and message
- `scan()` returning null is rejected with `IllegalStateException`
## Are there any user-facing changes?
Yes, a new public `DataSource` interface and
`SessionContext.registerDataSource` method, plus a new user-guide page at
`docs/source/user-guide/data-source.md` covering the API, contract,
threading, errors, and v1 limitations.