Erwang Guyomarc'h created SPARK-33196:
-----------------------------------------
Summary: Expose filtered aggregation API
Key: SPARK-33196
URL: https://issues.apache.org/jira/browse/SPARK-33196
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Erwang Guyomarc'h
Spark currently supports filtered aggregation but does not expose API allowing
to use them when using the `spark.sql.functions` package.
It is possible to use them when writing directly SQL:
{code:scala}
scala> val df = spark.range(100)
scala> df.registerTempTable("df")
scala> spark.sql("select count(1) as classic_cnt, count(1) FILTER (WHERE id <
50) from df").show()
+-----------+-------------------------------------------------+
|classic_cnt|count(1) FILTER (WHERE (id < CAST(50 AS BIGINT)))|
+-----------+-------------------------------------------------+
| 100| 50|
+-----------+-------------------------------------------------+{code}
These aggregations are especially useful when filtering on overlapping datasets
(where a pivot would not work):
{code:sql}
SELECT
AVG(revenue) FILTER (WHERE age < 25),
AVG(revenue) FILTER (WHERE age < 35),
AVG(revenue) FILTER (WHERE age < 45)
FROM people;{code}
I did not find an issue tracking this, hence I am creating this one and I will
join a PR to illustrate a possible implementation.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]