Re: [PR] feat: Arrow-direct codegen dispatcher for Spark expressions and Scala UDFs [datafusion-comet]

via GitHub Tue, 12 May 2026 08:01:41 -0700


mbutrovich commented on code in PR #4267:
URL: https://github.com/apache/datafusion-comet/pull/4267#discussion_r3227489004



##########
common/src/main/scala/org/apache/comet/udf/CometUDF.scala:
##########
@@ -27,11 +27,16 @@ import org.apache.arrow.vector.ValueVector
  *
  *   - Vector arguments arrive at the row count of the current batch.
  *   - Scalar (literal-folded) arguments arrive as length-1 vectors and must be read at index 0.
- *   - The returned vector's length must match the longest input.
+ *   - The returned vector's length must match `numRows`.
  *
- * Implementations must have a public no-arg constructor and must be stateless: a single instance
- * per class is cached and shared across native worker threads for the lifetime of the JVM.
+ * `numRows` mirrors DataFusion's `ScalarFunctionArgs.number_rows` and is the batch row count.
+ * UDFs that always have at least one batch-length input can read length from it and ignore
+ * `numRows`; UDFs that may be called with zero data columns (e.g. a zero-arg ScalaUDF through the
+ * codegen dispatcher) need `numRows` to know how many rows to produce.
+ *
+ * Implementations must have a public no-arg constructor and should be stateless: instances are
+ * cached per executor thread for the lifetime of the JVM.
  */
 trait CometUDF {
-  def evaluate(inputs: Array[ValueVector]): ValueVector
+  def evaluate(inputs: Array[ValueVector], numRows: Int): ValueVector

Review Comment:
   Yes, I am peeling this and the TaskContext change off into their own PRs.
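
   For readers following the `numRows` contract in the doc comment above, here is a minimal sketch of the length rules. It is not Comet's code: plain `Array[Int]` stands in for Arrow's `ValueVector` so the example runs without dependencies, and `NumRowsSketch`, `SketchUDF`, `valueAt`, `AddUDF`, and `ConstUDF` are hypothetical names introduced only for illustration.

```scala
// Stand-in for the contract in the doc comment: plain arrays replace Arrow's
// ValueVector. All names here are hypothetical, not Comet classes.
object NumRowsSketch {

  // Simplified analogue of the two-argument evaluate signature.
  trait SketchUDF {
    def evaluate(inputs: Array[Array[Int]], numRows: Int): Array[Int]
  }

  // Reads a batch-length input at row i, or a length-1 (scalar) input at index 0.
  def valueAt(input: Array[Int], i: Int): Int =
    if (input.length == 1) input(0) else input(i)

  // Two-argument UDF: either argument may arrive as a length-1 scalar.
  object AddUDF extends SketchUDF {
    def evaluate(inputs: Array[Array[Int]], numRows: Int): Array[Int] =
      Array.tabulate(numRows)(i => valueAt(inputs(0), i) + valueAt(inputs(1), i))
  }

  // Zero-argument UDF: with no inputs, numRows is the only source of the length.
  object ConstUDF extends SketchUDF {
    def evaluate(inputs: Array[Array[Int]], numRows: Int): Array[Int] =
      Array.fill(numRows)(42)
  }
}
```

   In this stand-in, `AddUDF` sizes its output from `numRows` even when both arguments are length-1 scalars, and `ConstUDF` has no input to take a length from at all, which is the zero-data-column case the doc comment describes.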



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


