timsaucer opened a new issue, #1572:
URL: https://github.com/apache/datafusion-python/issues/1572

   ## Background
   
   DataFusion 54 introduced first-class higher-order functions (HOFs) that take 
lambdas plus collection arguments and rewrite to a scalar `Expr` at planning 
time. PR #1561 exposed Python lambda syntax and the built-in HOFs 
(``array_transform``, ``array_filter``, ``array_any_match``, etc.) but did not 
expose the registration API for custom HOFs.
   
   ## Upstream signature
   
   ```rust
   pub fn register_higher_order_function(&self, func: Arc<dyn HigherOrderFunctionImpl>)
   pub fn deregister_higher_order_function(&self, name: &str) -> Option<Arc<dyn HigherOrderFunctionImpl>>
   ```
   
   The `HigherOrderFunctionImpl` trait requires `name()`, `args_count()` (min, 
max), and `invoke(&self, args: &[Expr]) -> Result<Expr>`.
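To make the shape of this contract concrete, here is a minimal pure-Python sketch. It does not use the `datafusion` package: `ToyExpr`, `HigherOrderFunction`, `ToyRegistry`, `plan_call`, and `ArrayWindow` are all hypothetical names invented for illustration, and only the three method names and the register/deregister pair are taken from the signatures above. The real binding, if it is added, may look quite different.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol, Sequence, Tuple


@dataclass(frozen=True)
class ToyExpr:
    """Hypothetical stand-in for a logical expression node (not datafusion.Expr)."""
    op: str
    args: Tuple["ToyExpr", ...] = ()


class HigherOrderFunction(Protocol):
    """Hypothetical Python mirror of the three trait methods named above."""

    def name(self) -> str: ...
    def args_count(self) -> Tuple[int, int]: ...
    def invoke(self, args: Sequence[ToyExpr]) -> ToyExpr: ...


class ToyRegistry:
    """Toy registry; models register/deregister only, not a real session context."""

    def __init__(self) -> None:
        self._funcs: Dict[str, HigherOrderFunction] = {}

    def register_higher_order_function(self, func: HigherOrderFunction) -> None:
        self._funcs[func.name()] = func

    def deregister_higher_order_function(self, name: str) -> Optional[HigherOrderFunction]:
        # Mirrors the Option<...> return: the removed function, or None if absent.
        return self._funcs.pop(name, None)

    def plan_call(self, name: str, args: Sequence[ToyExpr]) -> ToyExpr:
        # Assumed planner behaviour: check the arity bounds, then ask the HOF to rewrite.
        func = self._funcs[name]
        lo, hi = func.args_count()
        if not lo <= len(args) <= hi:
            raise ValueError(f"{name} expects {lo}..{hi} arguments, got {len(args)}")
        return func.invoke(args)


class ArrayWindow:
    """Hypothetical custom HOF; the rewrite it returns is a placeholder node."""

    def name(self) -> str:
        return "array_window"

    def args_count(self) -> Tuple[int, int]:
        return (3, 3)  # arr, size, lambda

    def invoke(self, args: Sequence[ToyExpr]) -> ToyExpr:
        return ToyExpr("window_rewrite", tuple(args))
```

In this sketch the registry is keyed by `name()`, so registering a second function with the same name silently replaces the first; whether the upstream API behaves that way is not stated in this issue.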
   
   ## User value
   
   Lets library authors add lambda-aware operators that are not in DataFusion's 
built-in set. Examples: ``array_window(arr, size, x -> ...)``, 
``array_partition_by(arr, x -> key(x))``, ``array_zip_with(a, b, (x, y) -> x * 
y)``, or domain-specific operators on JSON / graph / geo arrays. A HOF runs 
at logical-plan time, rewriting the lambda and its arguments into a standard 
`Expr` tree that the planner can then optimize. This makes it different from a 
ScalarUDF, which is evaluated on data at execution time.
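The plan-time rewrite can be pictured with a toy model in which the lambda is called once, on symbolic placeholders, to produce an expression tree rather than being called per row on values. This is only a sketch: `ToyExpr`, `column`, `render`, and the `elem_of` / `zip_map` / `mul` node names are hypothetical and are not part of DataFusion or datafusion-python.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ToyExpr:
    """Hypothetical expression node; `name` is only used by column nodes."""
    op: str
    args: Tuple["ToyExpr", ...] = ()
    name: str = ""

    def __mul__(self, other: "ToyExpr") -> "ToyExpr":
        # Multiplying two expressions builds a tree node; nothing is computed.
        return ToyExpr("mul", (self, other))


def column(name: str) -> ToyExpr:
    return ToyExpr("column", (), name)


def array_zip_with(a: ToyExpr, b: ToyExpr,
                   fn: Callable[[ToyExpr, ToyExpr], ToyExpr]) -> ToyExpr:
    """Toy 'invoke': expand the lambda at plan time into a plain expression tree."""
    x = ToyExpr("elem_of", (a,))  # placeholder for one element of `a`
    y = ToyExpr("elem_of", (b,))  # placeholder for one element of `b`
    body = fn(x, y)               # the lambda runs once, on placeholders
    return ToyExpr("zip_map", (a, b, body))


def render(e: ToyExpr) -> str:
    """Print a tree in function-call notation."""
    if e.op == "column":
        return e.name
    return f"{e.op}({', '.join(render(arg) for arg in e.args)})"


plan = array_zip_with(column("a"), column("b"), lambda x, y: x * y)
print(render(plan))  # zip_map(a, b, mul(elem_of(a), elem_of(b)))
```

Because the result is an ordinary tree of nodes, a planner can inspect and rewrite it like any other expression, which is the property that a function opaque to the planner would not offer.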
   
   ## Why deferred
   
   The effort estimate is medium-large (~350-550 LOC) and the authoring cost is 
high for end users: the ``invoke`` callback must return a DataFusion ``Expr`` 
tree from Python, which requires familiarity with the Expr grammar. Most 
array-lambda needs are already covered by the built-in HOFs that PR #1561 
ships, and there were no open user requests at the time of the audit. This 
issue is filed for tracking, so that the v54 HOF surface can be completed 
symmetrically once a concrete use case or extension-library ecosystem emerges.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

