[PR] feat(udf)!: switch ScalarFunction.evaluate to ColumnarValue API (closes #62) [datafusion-java]

via GitHub Mon, 18 May 2026 07:48:27 -0700


andygrove opened a new pull request, #64:
URL: https://github.com/apache/datafusion-java/pull/64


   ## Which issue does this PR close?
   
   Closes #62.
   
   ## Rationale for this change
   
   DataFusion's Rust `ScalarUDFImpl::invoke_with_args` speaks `ColumnarValue` 
(`Array` or `Scalar`) rather than raw Arrow arrays. The Java binding previously 
materialised every scalar arg to a length-N array before crossing the JNI 
boundary, which lost the scalar-vs-array distinction and forced nullary UDFs to 
learn the batch row count by some out-of-band channel (the workaround proposed 
in PR #57).
   
   Aligning the Java API with the Rust enum eliminates the workaround: a 
nullary UDF can return `ColumnarValue.scalar(...)` and the framework broadcasts 
it, and a UDF that takes literals sees them as Scalars without per-row 
duplication.
   
   ## What changes are included in this PR?
   
   - New `ColumnarValue` sealed interface (`Array`/`Scalar` records, factory 
enforcing length-1 invariant on scalars).
   - New `ScalarFunctionArgs` record bundling `List<ColumnarValue>` and 
`rowCount`.
   - `ScalarFunction.evaluate` is now `evaluate(BufferAllocator, 
ScalarFunctionArgs) -> ColumnarValue` (source-breaking).
   - `JniBridge.invokeScalarUdf` rewritten to ship two struct arrays (length-N 
Array args + length-1 Scalar args) plus a `byte[] argKinds` positional mask, 
returning a `byte` indicating the result variant. JNI signature is now 
`(Lorg/apache/datafusion/ScalarFunction;JJJJ[BJJI)B`.
   - Native `invoke_with_args` no longer materialises scalars; it partitions 
args by `ColumnarValue` variant and reconstructs the result from the returned 
kind byte via `ScalarValue::try_from_array`.
   - `AddOneExample` and `docs/source/user-guide/scalar-udf.md` updated; new 
"Returning a Scalar" section added to the user guide.
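
As a rough illustration of the shapes listed above, the following self-contained sketch models `ColumnarValue`, `ScalarFunctionArgs`, and a positional kind mask. It is not the PR's actual code: `Vec` is a hypothetical stand-in for Arrow's `FieldVector` so the example compiles without Arrow, `ArgKinds` is an invented helper, and the `0`/`1` byte encoding of the mask is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Arrow's FieldVector, so the sketch needs no dependencies.
record Vec(double[] values) {
    int getValueCount() { return values.length; }
}

// Sketch of the sealed interface described in the PR; the real definition may differ.
sealed interface ColumnarValue {
    Vec vector();

    record Array(Vec vector) implements ColumnarValue {}

    record Scalar(Vec vector) implements ColumnarValue {}

    static ColumnarValue array(Vec v) {
        return new Array(v);
    }

    // The factory enforces the length-1 invariant on scalars.
    static ColumnarValue scalar(Vec v) {
        if (v.getValueCount() != 1) {
            throw new IllegalArgumentException(
                "scalar must wrap a length-1 vector, got " + v.getValueCount());
        }
        return new Scalar(v);
    }
}

// Bundles the arguments with the batch row count.
record ScalarFunctionArgs(List<ColumnarValue> args, int rowCount) {}

// Hypothetical helper: builds a positional mask recording each argument's variant.
final class ArgKinds {
    static final byte ARRAY = 0;  // assumed encoding, not taken from the PR
    static final byte SCALAR = 1; // assumed encoding, not taken from the PR

    static byte[] of(List<ColumnarValue> args) {
        byte[] kinds = new byte[args.size()];
        for (int i = 0; i < kinds.length; i++) {
            kinds[i] = (args.get(i) instanceof ColumnarValue.Scalar) ? SCALAR : ARRAY;
        }
        return kinds;
    }
}

final class ColumnarValueSketch {
    public static void main(String[] argv) {
        ColumnarValue column = ColumnarValue.array(new Vec(new double[] {1, 2, 3}));
        ColumnarValue literal = ColumnarValue.scalar(new Vec(new double[] {10}));
        ScalarFunctionArgs args = new ScalarFunctionArgs(List.of(column, literal), 3);
        System.out.println(Arrays.toString(ArgKinds.of(args.args()))); // prints [0, 1]
    }
}
```

Keeping the mask positional means the original argument order can be recovered after the arguments are split into an Array group and a Scalar group.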
   
   ## How are these changes tested?
   
   `make test` — 135 tests pass (12 pre-existing skips). Existing 
`ScalarUdfTest` cases (`AddOne`, `Concat`, `Square`, error paths, volatility 
round-trip) adapted to the new signature, plus three new tests:
   
   - `nullaryScalarReturnUdf_overMultiRowQuery_broadcasts` — a nullary 
`java_pi` returns `ColumnarValue.scalar(...)` and the framework expands it 
across rows, replacing the rowCount workaround.
   - `scalarLiteralArg_arrivesAsScalarColumnarValue` — UDF asserts that a SQL 
literal arrives as `ColumnarValue.Scalar` (length 1), proving scalar-ness 
survives the FFI.
   - `udfReturningScalar_isBroadcastByFramework` — explicit scalar-return path 
test.
   
   Also covered by `cargo clippy --all-targets --workspace -- -D warnings` 
(clean) and `./mvnw spotless:check` (clean).
   
   ## Are there any user-facing changes?
   
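   Yes: this is a source-breaking signature change to `ScalarFunction.evaluate`. 
Implementations must accept a `ScalarFunctionArgs`, read each argument through 
its `ColumnarValue`, and wrap the result in a `ColumnarValue`: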
   
   Before:
   ```java
   public FieldVector evaluate(BufferAllocator allocator, List<FieldVector> args) {
       IntVector in = (IntVector) args.get(0);
       // ...
       return out;
   }
   ```
   
   After:
   ```java
   public ColumnarValue evaluate(BufferAllocator allocator, ScalarFunctionArgs args) {
       IntVector in = (IntVector) args.args().get(0).vector();
       // ...
       return ColumnarValue.array(out);
   }
   ```
   
   Nullary or broadcast-style UDFs can return `ColumnarValue.scalar(...)` over 
a length-1 vector.
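
To make the scalar-return path concrete, here is a minimal, dependency-free sketch of a nullary `java_pi`-style UDF. Everything in it is a simplified stand-in rather than the real API: `Vec` replaces Arrow's `FieldVector`, the `ScalarFunction` interface omits the `BufferAllocator` parameter, and `materialise` is a hypothetical helper that only imitates the broadcast the framework performs on the native side.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the PR's types, reduced to what this sketch needs.
record Vec(double[] values) {}

sealed interface ColumnarValue {
    Vec vector();

    record Array(Vec vector) implements ColumnarValue {}

    record Scalar(Vec vector) implements ColumnarValue {}
}

record ScalarFunctionArgs(List<ColumnarValue> args, int rowCount) {}

// Stand-in for ScalarFunction; the real evaluate also takes a BufferAllocator.
interface ScalarFunction {
    ColumnarValue evaluate(ScalarFunctionArgs args);
}

final class BroadcastSketch {
    // Nullary UDF: returns one value and never needs to consult rowCount.
    static final ScalarFunction JAVA_PI =
        args -> new ColumnarValue.Scalar(new Vec(new double[] {Math.PI}));

    // Illustrative only: expands a Scalar result to rowCount rows, passes an Array through.
    static double[] materialise(ColumnarValue result, int rowCount) {
        if (result instanceof ColumnarValue.Scalar s) {
            double[] out = new double[rowCount];
            Arrays.fill(out, s.vector().values()[0]);
            return out;
        }
        return result.vector().values();
    }

    public static void main(String[] argv) {
        ScalarFunctionArgs args = new ScalarFunctionArgs(List.of(), 3);
        double[] rows = materialise(JAVA_PI.evaluate(args), args.rowCount());
        System.out.println(rows.length); // prints 3
    }
}
```

The UDF itself produces a single value regardless of batch size; only the caller decides whether and when to expand it.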


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

