LantaoJin opened a new pull request, #57:
URL: https://github.com/apache/datafusion-java/pull/57
## Which issue does this PR close?
- Closes #56.
## Rationale for this change
`ScalarFunction.evaluate(BufferAllocator, List<FieldVector>)` (introduced in
#46) is the contract every Java-implemented scalar UDF must satisfy. It must
return a `FieldVector` whose `getValueCount()` matches the batch row count
DataFusion is driving through the operator tree.
For UDFs with at least one argument, the body can read
`args.get(0).getValueCount()` to learn how many rows to produce. For
**nullary** UDFs -- zero arguments, e.g. analogs of `random()`, `pi()`, `now()`
-- `args` is the empty list, and the body has no other channel to learn the row
count.
The native side already knows the value: `ScalarFunctionArgs::number_rows`
is read at `native/src/udf.rs:100` and used to materialise scalar argument
columns at `:106`. The Java bridge (`JniBridge.invokeScalarUdf`) receives it
but uses it only after the fact, to *validate* the returned vector's length.
It is never passed to `impl.evaluate(...)`.
The result: any nullary UDF that DataFusion does not constant-fold (anything
declared `Volatility.VOLATILE`, or `STABLE` calls in plans the optimizer cannot
fold) trips the post-hoc row-count validation as soon as it runs over a batch
with more than one row.
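The failure mode can be sketched without any DataFusion or Arrow dependency. Everything in the block below is a hypothetical stand-in, not the project's code: `double[]` plays the role of a `FieldVector`, `OldScalarFunction` mimics the pre-change two-argument contract (allocator omitted), `invoke` mimics the post-hoc length check in `JniBridge.invokeScalarUdf`, and the exception type and message are invented for illustration. The nullary UDF is assumed to return one row because it has nothing better to go on.

```java
import java.util.List;

// Hypothetical stand-ins only: double[] replaces FieldVector, and invoke()
// replaces JniBridge.invokeScalarUdf. None of these names exist in the repo.
final class OldContractSketch {
    /** Old shape of the contract: the body sees only its arguments. */
    interface OldScalarFunction {
        double[] evaluate(List<double[]> args);
    }

    /** Calls the UDF, then validates the result length after the fact. */
    static double[] invoke(OldScalarFunction impl, List<double[]> args, int expectedRowCount) {
        double[] result = impl.evaluate(args);
        if (result.length != expectedRowCount) {
            throw new IllegalStateException(
                "UDF returned " + result.length + " rows, expected " + expectedRowCount);
        }
        return result;
    }

    /** Nullary UDF: args is empty, so the body can only guess a length of one. */
    static final OldScalarFunction PI = args -> new double[] {Math.PI};

    public static void main(String[] unused) {
        // A one-row batch happens to pass validation.
        System.out.println(invoke(PI, List.of(), 1).length);
        // Any larger batch fails the post-hoc check.
        try {
            invoke(PI, List.of(), 3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```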
## What changes are included in this PR?
- `ScalarFunction.evaluate(BufferAllocator allocator, List<FieldVector>
args, int rowCount)` adds a third parameter carrying the per-batch row count.
This is a source-breaking signature change to a public interface. The repo is
pre-release; only five existing implementations needed an unused-parameter
update (four test UDFs in `ScalarUdfTest`, one in `examples/AddOneExample`).
- `JniBridge.invokeScalarUdf`
(`core/src/main/java/org/apache/datafusion/internal/JniBridge.java`) now
forwards the existing `expectedRowCount` parameter into `impl.evaluate(...)`.
Post-call validation against the same value is unchanged.
- No native-side change. The value was already on the wire.
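
As a rough illustration of the forwarding described in the list above, the following sketch models the new contract with plain Java types. It is not the code in this PR: `double[]` stands in for `FieldVector`, the allocator is omitted, and `ScalarFunctionSketch`, `invoke`, `PI`, and `ADD_ONE` are hypothetical names chosen for the example. The point is only that the bridge hands the row count to the body before validating against the same value.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only: double[] replaces FieldVector, and invoke()
// replaces JniBridge.invokeScalarUdf. None of these names exist in the repo.
final class RowCountForwardingSketch {
    /** New shape of the contract: the per-batch row count is a parameter. */
    interface ScalarFunctionSketch {
        double[] evaluate(List<double[]> args, int rowCount);
    }

    /** Forwards the expected row count, then validates against the same value. */
    static double[] invoke(ScalarFunctionSketch impl, List<double[]> args, int expectedRowCount) {
        double[] result = impl.evaluate(args, expectedRowCount);
        if (result.length != expectedRowCount) {
            throw new IllegalStateException(
                "UDF returned " + result.length + " rows, expected " + expectedRowCount);
        }
        return result;
    }

    /** Nullary UDF: sizes its output from rowCount because args is empty. */
    static final ScalarFunctionSketch PI = (args, rowCount) -> {
        double[] out = new double[rowCount];
        Arrays.fill(out, Math.PI);
        return out;
    };

    /** Unary UDF: may ignore rowCount and keep reading the length from its input. */
    static final ScalarFunctionSketch ADD_ONE = (args, rowCount) -> {
        double[] in = args.get(0);
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i] + 1.0;
        }
        return out;
    };

    public static void main(String[] unused) {
        System.out.println(invoke(PI, List.of(), 3).length);
        System.out.println(Arrays.toString(invoke(ADD_ONE, List.of(new double[] {1.0, 2.0}), 2)));
    }
}
```

In this model the existing unary implementation keeps its body unchanged and simply gains an unused parameter, which mirrors the update described for the five existing implementations.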
## Are these changes tested?
Yes.
## Are there any user-facing changes?
Yes, a source-breaking signature change to `ScalarFunction.evaluate`.
Implementations of the interface need to add an `int rowCount` parameter to
their `evaluate` override. Bodies that ignore the new parameter are otherwise
unchanged.
Before:
```java
public FieldVector evaluate(BufferAllocator allocator, List<FieldVector> args) {
    // ...
}
```
After:
```java
public FieldVector evaluate(BufferAllocator allocator, List<FieldVector> args, int rowCount) {
    // ...
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]