LantaoJin opened a new pull request, #57:
URL: https://github.com/apache/datafusion-java/pull/57

   ## Which issue does this PR close?
   
   - Closes #56.
   
   ## Rationale for this change
   
   `ScalarFunction.evaluate(BufferAllocator, List<FieldVector>)` (introduced in 
#46) is the contract every Java-implemented scalar UDF must satisfy. It must 
return a `FieldVector` whose `getValueCount()` matches the batch row count 
DataFusion is driving through the operator tree.
   
   For UDFs with at least one argument, the body can read 
`args.get(0).getValueCount()` to learn how many rows to produce. For 
**nullary** UDFs -- zero arguments, e.g. analogs of `random()`, `pi()`, `now()` 
-- `args` is the empty list, and the body has no other channel to learn the row 
count.
   
   The native side already knows the value: `ScalarFunctionArgs::number_rows` 
is read at `native/src/udf.rs:100` and used to materialise scalar arg columns 
at `:106`. The Java bridge (`JniBridge.invokeScalarUdf`) receives it but uses 
it only after the call, to *validate* the returned vector's length. It is never 
passed to `impl.evaluate(...)`.
   
   The result: any nullary UDF that DataFusion does not constant-fold (anything 
declared `Volatility.VOLATILE`, or `STABLE` calls in plans the optimizer cannot 
fold) trips the post-hoc row-count validation as soon as it runs over a batch 
with more than one row.
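   
   The failure mode can be sketched without any Arrow or DataFusion 
dependency. In the sketch below, `double[]` is a stand-in for `FieldVector` 
(array length plays the role of `getValueCount()`), the allocator is omitted, 
and `OldContractSketch`, `OldScalarFunction`, and `invoke` are hypothetical 
names, not the project's classes. It only illustrates why a nullary body has 
to guess under the two-argument contract.
   
   ```java
   import java.util.List;

   // Hypothetical stand-ins: double[] replaces FieldVector, and the array
   // length replaces getValueCount(). Not the project's real classes.
   final class OldContractSketch {
       // Two-argument-style contract: the body only sees its arguments.
       interface OldScalarFunction {
           double[] evaluate(List<double[]> args);
       }

       // Unary UDF: sizes its output from the first argument.
       static final OldScalarFunction ADD_ONE = args -> {
           double[] in = args.get(0);
           double[] out = new double[in.length];
           for (int i = 0; i < in.length; i++) {
               out[i] = in[i] + 1.0;
           }
           return out;
       };

       // Nullary UDF: args is empty, so the body can only guess a length.
       // This one guesses a single row.
       static final OldScalarFunction PI = args -> new double[] {Math.PI};

       // Assumed shape of the bridge: the row count is known here but is
       // only used to check the result after the call.
       static double[] invoke(OldScalarFunction impl, List<double[]> args,
                              int expectedRowCount) {
           double[] result = impl.evaluate(args);
           if (result.length != expectedRowCount) {
               throw new IllegalStateException("UDF returned " + result.length
                       + " rows, expected " + expectedRowCount);
           }
           return result;
       }
   }
   ```
   
   With these stand-ins, `invoke(ADD_ONE, ...)` over a three-row column 
passes the check, while `invoke(PI, List.of(), 3)` throws because the body 
produced one row.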
   
   ## What changes are included in this PR?
   
   - `ScalarFunction.evaluate(BufferAllocator allocator, List<FieldVector> 
args, int rowCount)` now takes a third parameter carrying the per-batch row 
count. This is a source-breaking signature change to a public interface. The 
repo is pre-release, and only five existing implementations needed an 
unused-parameter update (four test UDFs in `ScalarUdfTest`, one in 
`examples/AddOneExample`).
   - `JniBridge.invokeScalarUdf` 
(`core/src/main/java/org/apache/datafusion/internal/JniBridge.java`) now 
forwards the existing `expectedRowCount` parameter into `impl.evaluate(...)`. 
Post-call validation against the same value is unchanged.
   - No native-side change. The value was already on the wire.
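   
   The forwarding described in the list above might look roughly like the 
following. This is a dependency-free sketch, not the actual `JniBridge` code: 
`double[]` stands in for `FieldVector`, the allocator is omitted, and 
`BridgeForwardingSketch` and `ScalarFunctionSketch` are hypothetical names.
   
   ```java
   import java.util.Arrays;
   import java.util.List;

   // Hypothetical stand-ins: double[] replaces FieldVector. This is not the
   // real JniBridge, only the forwarding-plus-validation shape.
   final class BridgeForwardingSketch {
       // Stand-in for the three-argument contract (allocator omitted).
       interface ScalarFunctionSketch {
           double[] evaluate(List<double[]> args, int rowCount);
       }

       // Nullary UDF that can now size its output from rowCount.
       static final ScalarFunctionSketch PI = (args, rowCount) -> {
           double[] out = new double[rowCount];
           Arrays.fill(out, Math.PI);
           return out;
       };

       static double[] invokeScalarUdf(ScalarFunctionSketch impl,
                                       List<double[]> args,
                                       int expectedRowCount) {
           // Forward the row count into the UDF body.
           double[] result = impl.evaluate(args, expectedRowCount);
           // Post-call validation against the same value stays in place.
           if (result.length != expectedRowCount) {
               throw new IllegalStateException("UDF returned " + result.length
                       + " rows, expected " + expectedRowCount);
           }
           return result;
       }
   }
   ```
   
   Keeping the post-call check means a UDF that ignores `rowCount` and 
returns the wrong length is still rejected.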
   
   ## Are these changes tested?
   
   Yes. The existing test UDFs in `ScalarUdfTest` were updated to the new 
signature.
   
   ## Are there any user-facing changes?
   
   Yes, a source-breaking signature change to `ScalarFunction.evaluate`. 
Implementations of the interface need to add an `int rowCount` parameter to 
their `evaluate` override. Bodies that do not need the row count are otherwise 
unchanged.
   
   Before:
   
   ```java
   public FieldVector evaluate(BufferAllocator allocator, List<FieldVector> args) {
     // ...
   }
   ```
   
   After:
   
   ```java
   public FieldVector evaluate(BufferAllocator allocator, List<FieldVector> args, int rowCount) {
     // ...
   }
   ```
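   
   As an illustration of what UDF bodies could look like under the new 
signature, here is a dependency-free sketch. `double[]` again stands in for 
`FieldVector`, the allocator is omitted, and `NullaryUdfSketch`, `RandomUdf`, 
and `AddOneUdf` are hypothetical names. A real implementation would allocate 
an Arrow vector and set its value count to `rowCount`.
   
   ```java
   import java.util.List;
   import java.util.Random;

   // Hypothetical stand-ins: double[] replaces FieldVector.
   final class NullaryUdfSketch {
       // Stand-in for the three-argument contract (allocator omitted).
       interface ScalarFunctionSketch {
           double[] evaluate(List<double[]> args, int rowCount);
       }

       // Nullary, volatile-style UDF: one fresh value per row, sized by
       // rowCount because there are no arguments to inspect.
       static final class RandomUdf implements ScalarFunctionSketch {
           private final Random rng;

           RandomUdf(long seed) {
               this.rng = new Random(seed);
           }

           @Override
           public double[] evaluate(List<double[]> args, int rowCount) {
               double[] out = new double[rowCount];
               for (int i = 0; i < rowCount; i++) {
                   out[i] = rng.nextDouble();
               }
               return out;
           }
       }

       // Unary UDF: the body is the same as before and ignores rowCount.
       static final class AddOneUdf implements ScalarFunctionSketch {
           @Override
           public double[] evaluate(List<double[]> args, int rowCount) {
               double[] in = args.get(0);
               double[] out = new double[in.length];
               for (int i = 0; i < in.length; i++) {
                   out[i] = in[i] + 1.0;
               }
               return out;
           }
       }
   }
   ```
   
   In this sketch, `new RandomUdf(42L).evaluate(List.of(), 5)` yields five 
values, one per row, which is the case the two-argument contract could not 
express.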


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

