gstvg commented on code in PR #1561:
URL:
https://github.com/apache/datafusion-python/pull/1561#discussion_r3320747831
##########
docs/source/user-guide/common-operations/expressions.rst:
##########
@@ -145,6 +145,52 @@ This function returns a new array with the elements
repeated.
In this example, the `repeated_array` column will contain `[[1, 2, 3], [1, 2,
3]]`.
+Lambda functions
+----------------
+
+Some array functions take a *lambda function*: a small function that runs once
+per element. :py:func:`~datafusion.functions.array_transform` maps a lambda over
+every element, :py:func:`~datafusion.functions.array_filter` keeps the elements
+for which a predicate lambda is true, and
+:py:func:`~datafusion.functions.array_any_match` returns whether any element
+satisfies a predicate lambda. (Functions that take another function as an
+argument are sometimes called *higher-order* functions.)
+
+The simplest way to supply a lambda is a Python ``lambda``. Its parameter names
+become the lambda parameters, and its return value becomes the body.
+
+.. ipython:: python
+
+ from datafusion import SessionContext, col
+ from datafusion import functions as f
+
+ ctx = SessionContext()
+ df = ctx.from_pydict({"a": [[1, 2, 3], [4, 5]]})
+ df.select(f.array_transform(col("a"), lambda v: v * 2).alias("doubled"))
+ df.select(f.array_filter(col("a"), lambda v: v > 2).alias("big_only"))
+ df.select(f.array_any_match(col("a"), lambda v: v > 3).alias("has_big"))
+
+If you need explicit control over parameter names, build the lambda with
+:py:func:`~datafusion.functions.lambda_` and reference its parameters with
+:py:func:`~datafusion.functions.lambda_var`. The following is equivalent to the
+``array_transform`` call above.
+
+.. ipython:: python
+
+ from datafusion import lit
+
+ double_fn = f.lambda_(["v"], f.lambda_var("v") * lit(2))
+ df.select(f.array_transform(col("a"), double_fn).alias("doubled"))
+
+.. note::
+
+ Lambda expressions cannot yet be serialized: calling
+ :py:meth:`~datafusion.expr.Expr.to_bytes` or pickling an expression that
+ contains a lambda raises ``Lambda not implemented``. SQL lambda syntax
+ (``x -> x * 2``) is only parsed by dialects that support lambdas; set
+ ``datafusion.sql_parser.dialect`` to ``DuckDB`` to use it. The Python
Review Comment:
Sure, I see now that DuckDB is the only dialect in PyDialect that supports
lambdas. Just to clarify my comment: the new syntax is already stabilized and
supported in DuckDB and sqlparser-rs, alongside the old arrow syntax.
DuckDB v2.1 will only remove lambda support for the old syntax, and will likely
use it for JSON operators. My main point was to also document syntax+dialect
pairs that will not break in the future, since sqlparser-rs may follow the
new DuckDB syntax when it gets released, so that it can correctly parse JSON
operators.
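
For anyone reading this thread without the PR checked out, here is a rough plain-Python sketch of the per-element semantics that the quoted docs describe for `array_transform`, `array_filter`, and `array_any_match`. It calls no datafusion or SQL APIs. The helper names `transform_rows`, `filter_rows`, and `any_match_rows` are hypothetical and exist only for illustration, so treat it as a model of the intended behavior rather than the library's implementation.

```python
from typing import Callable, List

# Hypothetical helpers: each models one array function over a column,
# represented here as a list of rows where every row is a list of ints.

def transform_rows(rows: List[List[int]], fn: Callable[[int], int]) -> List[List[int]]:
    """Apply fn to every element of every row (models array_transform)."""
    return [[fn(v) for v in row] for row in rows]

def filter_rows(rows: List[List[int]], pred: Callable[[int], bool]) -> List[List[int]]:
    """Keep only the elements for which pred is true (models array_filter)."""
    return [[v for v in row if pred(v)] for row in rows]

def any_match_rows(rows: List[List[int]], pred: Callable[[int], bool]) -> List[bool]:
    """Report, per row, whether any element satisfies pred (models array_any_match)."""
    return [any(pred(v) for v in row) for row in rows]

# Same input data as the quoted docs example.
rows = [[1, 2, 3], [4, 5]]

doubled = transform_rows(rows, lambda v: v * 2)
big_only = filter_rows(rows, lambda v: v > 2)
has_big = any_match_rows(rows, lambda v: v > 3)

print(doubled)   # [[2, 4, 6], [8, 10]]
print(big_only)  # [[3], [4, 5]]
print(has_big)   # [False, True]
```

Whichever SQL syntax+dialect pair ends up documented, the lambda body is evaluated once per element as modeled here; only the surface syntax differs.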
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]