gstvg commented on code in PR #1561:
URL: https://github.com/apache/datafusion-python/pull/1561#discussion_r3319711959
##########
docs/source/user-guide/common-operations/expressions.rst:
##########
@@ -145,6 +145,52 @@ This function returns a new array with the elements
repeated.
In this example, the `repeated_array` column will contain `[[1, 2, 3], [1, 2,
3]]`.
+Lambda functions
+----------------
+
+Some array functions take a *lambda function*: a small function that runs once
+per element. :py:func:`~datafusion.functions.array_transform` maps a lambda over
+every element, :py:func:`~datafusion.functions.array_filter` keeps the elements
+for which a predicate lambda is true, and
+:py:func:`~datafusion.functions.array_any_match` returns whether any element
+satisfies a predicate lambda. (Functions that take another function as an
+argument are sometimes called *higher-order* functions.)
+
+The simplest way to supply a lambda is a Python ``lambda``. Its parameter names
+become the lambda parameters, and its return value becomes the body.
+
+.. ipython:: python
+
+ from datafusion import SessionContext, col
+ from datafusion import functions as f
+
+ ctx = SessionContext()
+ df = ctx.from_pydict({"a": [[1, 2, 3], [4, 5]]})
+ df.select(f.array_transform(col("a"), lambda v: v * 2).alias("doubled"))
+ df.select(f.array_filter(col("a"), lambda v: v > 2).alias("big_only"))
+ df.select(f.array_any_match(col("a"), lambda v: v > 3).alias("has_big"))
+
+If you need explicit control over parameter names, build the lambda with
+:py:func:`~datafusion.functions.lambda_` and reference its parameters with
+:py:func:`~datafusion.functions.lambda_var`. The following is equivalent to the
+``array_transform`` call above.
+
+.. ipython:: python
+
+ from datafusion import lit
+
+ double_fn = f.lambda_(["v"], f.lambda_var("v") * lit(2))
+ df.select(f.array_transform(col("a"), double_fn).alias("doubled"))
+
+.. note::
+
+ Lambda expressions cannot yet be serialized: calling
+ :py:meth:`~datafusion.expr.Expr.to_bytes` or pickling an expression that
+ contains a lambda raises ``Lambda not implemented``. SQL lambda syntax
+ (``x -> x * 2``) is only parsed by dialects that support lambdas; set
+ ``datafusion.sql_parser.dialect`` to ``DuckDB`` to use it. The Python
Review Comment:
DuckDB will remove this syntax in v2.1. Perhaps also list the other dialects
that support this syntax ([Spark, Databricks, ClickHouse,
Snowflake](https://github.com/search?q=repo%3Aapache%2Fdatafusion-sqlparser-rs%20supports_lambda_functions&type=code)),
and/or mention the new DuckDB syntax (`lambda x: x * 2`)?
[duckdb#17235](https://github.com/duckdb/duckdb/pull/17235)
[duckdb#22682](https://github.com/duckdb/duckdb/pull/22682)
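For reference, here is a plain-Python model of the per-element semantics that the hunk describes for `array_transform`, `array_filter`, and `array_any_match`. It uses the same data as the ipython example. Treat it only as a sketch: the `model_*` helpers are hypothetical and not part of datafusion, they ignore nulls and Arrow typing, and they say nothing about how the PR actually implements lambdas.

```python
# Plain-Python reference model of the per-element semantics described in the
# docs hunk. The model_* helpers are HYPOTHETICAL and NOT part of the
# datafusion API; nulls and Arrow types are ignored for simplicity.
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def model_array_transform(arr: List[T], fn: Callable[[T], U]) -> List[U]:
    # Run the lambda once per element and keep every result, in order.
    return [fn(v) for v in arr]


def model_array_filter(arr: List[T], pred: Callable[[T], bool]) -> List[T]:
    # Keep only the elements for which the predicate lambda is true.
    return [v for v in arr if pred(v)]


def model_array_any_match(arr: List[T], pred: Callable[[T], bool]) -> bool:
    # True if at least one element satisfies the predicate lambda.
    return any(pred(v) for v in arr)


# Same rows as column "a" in the ipython example.
rows = [[1, 2, 3], [4, 5]]
doubled = [model_array_transform(r, lambda v: v * 2) for r in rows]
big_only = [model_array_filter(r, lambda v: v > 2) for r in rows]
has_big = [model_array_any_match(r, lambda v: v > 3) for r in rows]

print(doubled)   # [[2, 4, 6], [8, 10]]
print(big_only)  # [[3], [4, 5]]
print(has_big)   # [False, True]
```

These are the values the `doubled`, `big_only`, and `has_big` columns would be expected to hold if the functions behave as the docs describe.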
--
This is an automated message from the Apache Git Service.