[jira] [Created] (SPARK-33196) Expose filtered aggregation API

Erwang Guyomarc'h (Jira) Tue, 20 Oct 2020 10:15:16 -0700

Erwang Guyomarc'h created SPARK-33196:
-----------------------------------------


             Summary: Expose filtered aggregation API
                 Key: SPARK-33196
                 URL: https://issues.apache.org/jira/browse/SPARK-33196
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Erwang Guyomarc'h


Spark currently supports filtered aggregation but does not expose API allowing 
to use them when using the `spark.sql.functions` package.

It is possible to use them when writing directly SQL:
{code:scala}
scala> val df = spark.range(100)
scala> df.registerTempTable("df")
scala> spark.sql("select count(1) as classic_cnt, count(1) FILTER (WHERE id < 
50) from df").show()
+-----------+-------------------------------------------------+ 
|classic_cnt|count(1) FILTER (WHERE (id < CAST(50 AS BIGINT)))|
+-----------+-------------------------------------------------+
|        100|                                               50|
+-----------+-------------------------------------------------+{code}
These aggregations are especially useful when filtering on overlapping datasets 
(where a pivot would not work):
{code:sql}
SELECT 
 AVG(revenue) FILTER (WHERE age < 25),
 AVG(revenue) FILTER (WHERE age < 35),
 AVG(revenue) FILTER (WHERE age < 45)
FROM people;{code}

I did not find an issue tracking this, hence I am creating this one and I will 
join a PR to illustrate a possible implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Created] (SPARK-33196) Expose filtered aggregation API

Reply via email to