[jira] [Commented] (SPARK-48362) Add CollectSetWIthLimit

Holden Karau (Jira) Mon, 22 Dec 2025 13:04:06 -0800


    [ https://issues.apache.org/jira/browse/SPARK-48362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18047114#comment-18047114 ]


Holden Karau commented on SPARK-48362:
--------------------------------------

I'm going to poke at this a little bit.

> Add CollectSetWIthLimit
> -----------------------
>
>                 Key: SPARK-48362
>                 URL: https://issues.apache.org/jira/browse/SPARK-48362
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Holden Karau
>            Priority: Major
>
> See 
> [https://stackoverflow.com/questions/38730912/how-to-limit-functions-collect-set-in-spark-sql]
>  
> Some users want to collect a set, but if the number of distinct elements is 
> too large they may get a "Cannot grow BufferHolder" error from collecting 
> the full set and only then trimming it.
>  
> We should offer a collect set variant that stops adding elements once the 
> requested limit is reached, reducing the amount of memory used.
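
The idea in the description can be sketched outside Spark as a bounded aggregation buffer. This is only an illustration in plain Python, not Spark's implementation or API: the names BoundedSetBuffer and collect_set_with_limit, and the update/merge split, are hypothetical and chosen to mirror how a partial aggregate is typically built per partition and then combined.

```python
from typing import Hashable, Iterable, Set


class BoundedSetBuffer:
    """Hypothetical aggregation buffer holding at most `limit` distinct values."""

    def __init__(self, limit: int) -> None:
        if limit < 0:
            raise ValueError("limit must be non-negative")
        self.limit = limit
        self.values: Set[Hashable] = set()

    def update(self, value: Hashable) -> None:
        # Skip the insert once the buffer is full, so it never grows past `limit`.
        if len(self.values) < self.limit:
            self.values.add(value)

    def merge(self, other: "BoundedSetBuffer") -> None:
        # Combine a partial result from another partition, still respecting the cap.
        for value in other.values:
            if len(self.values) >= self.limit:
                break
            self.values.add(value)


def collect_set_with_limit(rows: Iterable[Hashable], limit: int) -> Set[Hashable]:
    """Collect up to `limit` distinct values without materializing the full set."""
    buf = BoundedSetBuffer(limit)
    for row in rows:
        buf.update(row)
    return buf.values
```

Which elements survive depends on arrival order, so the result is an arbitrary subset of the distinct values rather than a deterministic one. The contrast with collecting the whole set and trimming afterwards is that the buffer here never holds more than `limit` entries at any point.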



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

