[ 
https://issues.apache.org/jira/browse/SPARK-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-17691.
----------------------------------
    Resolution: Incomplete

> Add aggregate function to collect list with maximum number of elements
> ----------------------------------------------------------------------
>
>                 Key: SPARK-17691
>                 URL: https://issues.apache.org/jira/browse/SPARK-17691
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Assaf Mendelson
>            Priority: Minor
>              Labels: bulk-closed
>
> One of the aggregate functions we have today is the collect_list function. 
> This is a useful tool to do a "catch all" aggregation which doesn't really 
> fit anywhere else.
> The problem with collect_list is that it is unbounded. I would like to see a 
> means to do a collect_list where we limit the maximum number of elements.
> The inputs would be the maximum number of elements to keep and the method 
> of choosing them (pick any, pick the top N, pick the bottom N).
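
For illustration only, the requested semantics could be sketched in plain Python as below. This is not Spark code and not an existing Spark API: the class name `BoundedCollector`, its `limit` and `mode` parameters, and the mode names "any", "top", and "bottom" are all hypothetical, chosen to mirror the three selection methods in the description. The `update`/`merge` split is an assumption about how such a function would need to combine partial results from different partitions.

```python
import heapq
from typing import Any, Iterable, List


class BoundedCollector:
    """Hypothetical accumulator that never holds more than `limit` elements."""

    def __init__(self, limit: int, mode: str = "any") -> None:
        if limit < 0:
            raise ValueError("limit must be non-negative")
        if mode not in ("any", "top", "bottom"):
            raise ValueError("mode must be 'any', 'top', or 'bottom'")
        self.limit = limit
        self.mode = mode
        self.items: List[Any] = []

    def _trim(self, candidates: Iterable[Any]) -> List[Any]:
        # Reduce the candidates to at most `limit` elements per the mode.
        if self.mode == "top":
            return heapq.nlargest(self.limit, candidates)
        if self.mode == "bottom":
            return heapq.nsmallest(self.limit, candidates)
        return list(candidates)[: self.limit]  # "any": first ones seen

    def update(self, value: Any) -> None:
        # Fold one input value into the bounded buffer.
        self.items = self._trim(self.items + [value])

    def merge(self, other: "BoundedCollector") -> None:
        # Combine two partial results, as a distributed aggregate would.
        self.items = self._trim(self.items + other.items)

    def result(self) -> List[Any]:
        return list(self.items)


def bounded_collect(values: Iterable[Any], limit: int, mode: str = "any") -> List[Any]:
    """Collect at most `limit` values using the given selection mode."""
    collector = BoundedCollector(limit, mode)
    for value in values:
        collector.update(value)
    return collector.result()


if __name__ == "__main__":
    data = [5, 1, 9, 3, 7]
    print(bounded_collect(data, 3, "any"))
    print(bounded_collect(data, 3, "top"))
    print(bounded_collect(data, 3, "bottom"))
```

The point of the sketch is that the buffer is trimmed on every update and merge, so memory per group stays bounded by `limit`, which is the property the unbounded collect_list lacks. Trimming with `heapq.nlargest`/`heapq.nsmallest` on each update is simple rather than fast; a real implementation would more likely maintain a fixed-size heap.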



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
