gstvg commented on code in PR #1561:
URL: https://github.com/apache/datafusion-python/pull/1561#discussion_r3320747831


##########
docs/source/user-guide/common-operations/expressions.rst:
##########
@@ -145,6 +145,52 @@ This function returns a new array with the elements repeated.
 
 In this example, the `repeated_array` column will contain `[[1, 2, 3], [1, 2, 3]]`.
 
+Lambda functions
+----------------
+
+Some array functions take a *lambda function*: a small function that runs once
+per element. :py:func:`~datafusion.functions.array_transform` maps a lambda over
+every element, :py:func:`~datafusion.functions.array_filter` keeps the elements
+for which a predicate lambda is true, and
+:py:func:`~datafusion.functions.array_any_match` returns whether any element
+satisfies a predicate lambda. (Functions that take another function as an
+argument are sometimes called *higher-order* functions.)
+
+The simplest way to supply a lambda is a Python ``lambda``. Its parameter names
+become the lambda parameters, and its return value becomes the body.
+
+.. ipython:: python
+
+    from datafusion import SessionContext, col
+    from datafusion import functions as f
+
+    ctx = SessionContext()
+    df = ctx.from_pydict({"a": [[1, 2, 3], [4, 5]]})
+    df.select(f.array_transform(col("a"), lambda v: v * 2).alias("doubled"))
+    df.select(f.array_filter(col("a"), lambda v: v > 2).alias("big_only"))
+    df.select(f.array_any_match(col("a"), lambda v: v > 3).alias("has_big"))
+
+If you need explicit control over parameter names, build the lambda with
+:py:func:`~datafusion.functions.lambda_` and reference its parameters with
+:py:func:`~datafusion.functions.lambda_var`. The following is equivalent to the
+``array_transform`` call above.
+
+.. ipython:: python
+
+    from datafusion import lit
+
+    double_fn = f.lambda_(["v"], f.lambda_var("v") * lit(2))
+    df.select(f.array_transform(col("a"), double_fn).alias("doubled"))
+
+.. note::
+
+    Lambda expressions cannot yet be serialized: calling
+    :py:meth:`~datafusion.expr.Expr.to_bytes` or pickling an expression that
+    contains a lambda raises ``Lambda not implemented``. SQL lambda syntax
+    (``x -> x * 2``) is only parsed by dialects that support lambdas; set
+    ``datafusion.sql_parser.dialect`` to ``DuckDB`` to use it. The Python

Review Comment:
   Sure, I see now that DuckDB is the only dialect in PyDialect that supports 
lambdas. To clarify my comment: the new syntax is already stabilized and 
supported in both DuckDB and sqlparser-rs, alongside the old arrow syntax. 
   DuckDB v2.1 will only remove lambda support for the old syntax, and will 
likely use the arrow for JSON operators instead. My main point was to also 
document syntax+dialect pairs that will not break in the future, since 
sqlparser-rs may follow the new DuckDB syntax once that version is released, 
so that it can correctly parse JSON operators.
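
   For illustration only, the two spellings under discussion can be sketched 
as below. The table name `t`, the column `a`, and the helper 
`run_with_duckdb_dialect` are hypothetical, and which of the two forms parses 
depends on the sqlparser-rs version bundled with datafusion-python, which has 
not been verified here.

```python
# Old arrow syntax: the form DuckDB plans to stop treating as a lambda.
ARROW_SQL = "SELECT array_transform(a, x -> x * 2) AS doubled FROM t"

# Newer keyword syntax: the form expected to keep working.
KEYWORD_SQL = "SELECT array_transform(a, lambda x: x * 2) AS doubled FROM t"


def run_with_duckdb_dialect(query):
    """Hypothetical helper: run `query` with the SQL parser dialect set to DuckDB."""
    # Imported lazily so the query strings above can be inspected without datafusion.
    from datafusion import SessionConfig, SessionContext

    # The config key and value are the ones named in the note under review.
    config = SessionConfig().set("datafusion.sql_parser.dialect", "DuckDB")
    ctx = SessionContext(config)
    # Assumption: passing name="t" makes the data queryable as table `t`.
    ctx.from_pydict({"a": [[1, 2, 3], [4, 5]]}, name="t")
    return ctx.sql(query)
```

   Documenting the keyword form next to the arrow form would give readers a 
pair that should survive the DuckDB change.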



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

