[ https://issues.apache.org/jira/browse/SPARK-53742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-53742:
-----------------------------------
    Labels: pull-request-available  (was: )

> Push down the filter used in the count_if function
> --------------------------------------------------
>
>                 Key: SPARK-53742
>                 URL: https://issues.apache.org/jira/browse/SPARK-53742
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.1
>            Reporter: Ji Jun Tang
>            Priority: Minor
>              Labels: pull-request-available
>
> By pushing the predicate of the count_if function down below the aggregate as a 
> Filter, we can reduce the volume of data that the aggregate needs to process.
>  
> {code:scala}
> spark.sql("create table t1(a int, b int, c int) using parquet")
> spark.sql("select count_if(a <> 1) from t1").explain("cost")
> {code}
> Current:
> {code:java}
> == Optimized Logical Plan ==
> Aggregate [count(if (NOT _common_expr_0#6) null else _common_expr_0#6) AS count_if((NOT (a = 1)))#4L], Statistics(sizeInBytes=16.0 B, rowCount=1)
> +- Project [NOT (a#0 = 1) AS _common_expr_0#6], Statistics(sizeInBytes=1.0 B)
>    +- Relation spark_catalog.default.t1[a#0,b#1,c#2] parquet, Statistics(sizeInBytes=0.0 B)
> {code}
> Expected:
> {code:java}
> == Optimized Logical Plan ==
> Aggregate [count(if (NOT _common_expr_2#22) null else _common_expr_2#22) AS count_if((NOT (a = 1)))#21L], Statistics(sizeInBytes=16.0 B, rowCount=1)
> +- Project [NOT (a#3 = 1) AS _common_expr_2#22], Statistics(sizeInBytes=1.0 B)
>    +- Filter (isnotnull(a#3) AND NOT (a#3 = 1)), Statistics(sizeInBytes=1.0 B)
>       +- Relation spark_catalog.default.t1[a#3,b#4,c#5] parquet, Statistics(sizeInBytes=0.0 B)
> {code}
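>
> The rewrite is safe because count_if(p) counts exactly the rows where p is true, 
> while a Filter on p likewise drops rows where p is false or null (a <> 1 evaluates 
> to null when a is null, so both forms skip those rows). As a point of comparison, 
> not part of the proposal, the same count can already be obtained today by writing 
> the predicate as a WHERE clause, which benefits from ordinary filter pushdown:
> {code:scala}
> spark.sql("select count(*) from t1 where a <> 1").explain("cost")
> {code}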
>  
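> A minimal sketch of what such an optimizer rule could look like (the rule name 
> PushDownCountIfPredicate is hypothetical, and the sketch ignores the 
> _common_expr_* Project visible in the plans above, which a real implementation 
> would have to look through):
> {code:scala}
> import org.apache.spark.sql.catalyst.expressions.{Alias, If, Literal, Not}
> import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, Count}
> import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Filter, LogicalPlan}
> import org.apache.spark.sql.catalyst.rules.Rule
>
> // Hypothetical rule, illustration only: rewrite a global aggregate whose sole
> // aggregate function is the expanded form of count_if(p), i.e.
> // count(if (NOT p) null else p), into count(1) over Filter(p, child).
> object PushDownCountIfPredicate extends Rule[LogicalPlan] {
>   def apply(plan: LogicalPlan): LogicalPlan = plan.transform {
>     // Only fires for an empty grouping with count_if as the only aggregate;
>     // otherwise the injected Filter would change the other aggregates' inputs.
>     case Aggregate(Nil,
>         Seq(alias @ Alias(ae @ AggregateExpression(
>           Count(Seq(If(Not(p1), Literal(null, _), p2))), _, false, None, _), name)),
>         child) if p1.semanticEquals(p2) =>
>       // Every row surviving Filter(p) satisfies the predicate, so counting
>       // 1 per row is equivalent to the original count_if(p).
>       val newAgg = ae.copy(aggregateFunction = Count(Seq(Literal(1))))
>       Aggregate(Nil, Seq(Alias(newAgg, name)(exprId = alias.exprId)), Filter(p1, child))
>   }
> }
> {code}
> The isnotnull(a#3) conjunct in the expected plan would not need to come from this 
> rule; the existing InferFiltersFromConstraints rule already infers it from NOT (a = 1).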


