[ 
https://issues.apache.org/jira/browse/SPARK-53947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuchuan Huang updated SPARK-53947:
----------------------------------
    Priority: Major  (was: Critical)

> Let approx_top_k handle NULLs
> -----------------------------
>
>                 Key: SPARK-53947
>                 URL: https://issues.apache.org/jira/browse/SPARK-53947
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.1.0
>            Reporter: Yuchuan Huang
>            Priority: Major
>              Labels: pull-request-available
>
> Spark's approx_top_k function uses the FrequentItemsSketch from Apache 
> DataSketches, which ignores NULL values on its own 
> (https://github.com/apache/datasketches-java/blob/main/src/main/java/org/apache/datasketches/frequencies/FrequentItemsSketch.java#L587).
> However, NULL values can be meaningful in some use cases, and users might 
> want to include NULL in the approx_top_k output. Therefore, this ticket aims 
> to add a nullCounter alongside the FrequentItemsSketch to count NULLs 
> in the approx_top_k aggregation.
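
The idea described above can be sketched roughly as follows. This is only an
illustration, not Spark's implementation: the class NullAwareTopK and its
methods are hypothetical names, and collections.Counter (exact counts) stands
in for the approximate FrequentItemsSketch. The point is that NULLs bypass the
frequency summary, are tallied in a separate counter, and are added back as
one more candidate when the top-k result is produced.

```python
from collections import Counter
from typing import Hashable, List, Optional, Tuple


class NullAwareTopK:
    """Hypothetical wrapper: a frequency summary plus a separate NULL counter."""

    def __init__(self) -> None:
        # Assumption: Counter is an exact stand-in for FrequentItemsSketch,
        # which would give approximate counts in bounded memory.
        self.items: Counter = Counter()
        self.null_count: int = 0  # plays the role of the proposed nullCounter

    def update(self, value: Optional[Hashable]) -> None:
        # NULLs never reach the summary; they are counted on the side.
        if value is None:
            self.null_count += 1
        else:
            self.items[value] += 1

    def merge(self, other: "NullAwareTopK") -> None:
        # Partial aggregates combine both the summary and the NULL counter.
        self.items.update(other.items)
        self.null_count += other.null_count

    def top_k(self, k: int) -> List[Tuple[Optional[Hashable], int]]:
        # Treat NULL as one more candidate, then rank by count (descending).
        candidates = list(self.items.items())
        if self.null_count > 0:
            candidates.append((None, self.null_count))
        candidates.sort(key=lambda pair: pair[1], reverse=True)
        return candidates[:k]


# Two partial aggregates, merged as a distributed aggregation would do.
left = NullAwareTopK()
for v in ["a", None, "b", None, "a"]:
    left.update(v)

right = NullAwareTopK()
for v in [None, None, "a", "c"]:
    right.update(v)

left.merge(right)
result = left.top_k(2)
print(result)
```

Keeping the NULL count outside the summary means the sketch itself needs no
changes; only the update, merge, and result-building steps have to be aware
of the extra counter.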



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

