[ 
https://issues.apache.org/jira/browse/SPARK-53947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuchuan Huang updated SPARK-53947:
----------------------------------
    Description: Spark uses the FrequentItemsSketch from Apache DataSketches in 
the approx_top_k function. The sketch does not handle NULL values on its own 
(https://github.com/apache/datasketches-java/blob/main/src/main/java/org/apache/datasketches/frequencies/FrequentItemsSketch.java#L587).
However, a NULL value can be meaningful in some use cases, and users might want 
to include NULL in the approx_top_k output. Therefore, this ticket aims to add 
a nullCounter alongside the FrequentItemsSketch to count NULLs in the 
approx_top_k aggregation.  (was: Spark uses FrequentItemsSketch of Apache 
DataSketches in the `approx_top_k` function, which does not consider NULL 
values by itself 
(https://github.com/apache/datasketches-java/blob/main/src/main/java/org/apache/datasketches/frequencies/FrequentItemsSketch.java#L587).
However, NULL value could be meaningful in some use cases and users might want 
to include NULL in the `approx_top_k` output. Therefore, this ticket aims to 
add a nullCounter associated with the FrequentItemsSketch in the `approx_top_k` 
aggregation. )

> Let approx_top_k handle NULLs
> -----------------------------
>
>                 Key: SPARK-53947
>                 URL: https://issues.apache.org/jira/browse/SPARK-53947
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.1.0
>            Reporter: Yuchuan Huang
>            Priority: Critical
>
> Spark uses the FrequentItemsSketch from Apache DataSketches in the 
> approx_top_k function. The sketch does not handle NULL values on its own 
> (https://github.com/apache/datasketches-java/blob/main/src/main/java/org/apache/datasketches/frequencies/FrequentItemsSketch.java#L587).
> However, a NULL value can be meaningful in some use cases, and users might 
> want to include NULL in the approx_top_k output. Therefore, this ticket aims 
> to add a nullCounter alongside the FrequentItemsSketch to count NULLs in the 
> approx_top_k aggregation. 
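
The idea in the description can be sketched roughly as follows. This is an illustration only, not Spark's implementation: the class NullAwareTopK and its methods are hypothetical names, and a plain HashMap of exact counts stands in for the FrequentItemsSketch so that the example runs without the DataSketches dependency. The only point it shows is a NULL counter kept next to the item counts, merged across partial aggregates, and reported in the top-k output.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; names and structure are assumptions, not Spark code.
final class NullAwareTopK {
    // Stand-in for FrequentItemsSketch<String>: exact counts instead of a sketch.
    private final Map<String, Long> counts = new HashMap<>();
    // Separate counter for NULL inputs, which never reach the item counts.
    private long nullCounter = 0L;

    // Route NULL inputs to the counter and everything else to the item counts.
    void update(String item) {
        if (item == null) {
            nullCounter++;
        } else {
            counts.merge(item, 1L, Long::sum);
        }
    }

    // Merging two partial aggregates adds both the item counts and the NULL counters.
    void merge(NullAwareTopK other) {
        other.counts.forEach((item, n) -> counts.merge(item, n, Long::sum));
        nullCounter += other.nullCounter;
    }

    // Returns up to k (item, count) rows, highest count first.
    // NULL is reported with a null key and competes like any other item.
    List<Map.Entry<String, Long>> topK(int k) {
        List<Map.Entry<String, Long>> rows = new ArrayList<>();
        counts.forEach((item, n) -> rows.add(new AbstractMap.SimpleEntry<>(item, n)));
        if (nullCounter > 0) {
            rows.add(new AbstractMap.SimpleEntry<>(null, nullCounter));
        }
        rows.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return new ArrayList<>(rows.subList(0, Math.min(k, rows.size())));
    }
}
```

With a real sketch, the NULL count would presumably also need to be carried through serialization of the aggregation state; that part is not shown here.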



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
