[
https://issues.apache.org/jira/browse/SPARK-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-17691.
----------------------------------
Resolution: Incomplete
> Add aggregate function to collect list with maximum number of elements
> ----------------------------------------------------------------------
>
> Key: SPARK-17691
> URL: https://issues.apache.org/jira/browse/SPARK-17691
> Project: Spark
> Issue Type: New Feature
> Reporter: Assaf Mendelson
> Priority: Minor
> Labels: bulk-closed
>
> One of the aggregate functions we have today is collect_list. It is a
> useful "catch all" aggregation for cases that don't really fit anywhere
> else.
> The problem with collect_list is that it is unbounded: it keeps every
> element of a group. I would like a way to do a collect_list that limits
> the maximum number of elements.
> The inputs would be the maximum number of elements to keep and the method
> of choosing them (pick any, pick the top N, or pick the bottom N).
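
The requested semantics can be illustrated outside Spark with a small sketch in plain Python. This is not Spark code and not an existing Spark API: the function name collect_list_bounded, its parameters, and the method names "any", "top" and "bottom" are all hypothetical, chosen only to mirror the three selection methods described above. Each group's list never grows beyond max_elements + 1 entries while rows are consumed.

```python
import bisect
from collections import defaultdict


def collect_list_bounded(rows, max_elements, method="any"):
    """Group (key, value) pairs, keeping at most max_elements values per key.

    Hypothetical helper for illustration only, not part of Spark.
    method: "any"    keeps the first max_elements values seen,
            "top"    keeps the max_elements largest values,
            "bottom" keeps the max_elements smallest values.
    """
    if max_elements < 1:
        raise ValueError("max_elements must be at least 1")
    if method not in ("any", "top", "bottom"):
        raise ValueError("unknown method: %r" % (method,))

    groups = defaultdict(list)
    for key, value in rows:
        bucket = groups[key]
        if method == "any":
            # Keep whatever arrives first and ignore the rest.
            if len(bucket) < max_elements:
                bucket.append(value)
            continue
        # For "top" and "bottom" the bucket is kept sorted ascending.
        bisect.insort(bucket, value)
        if len(bucket) > max_elements:
            if method == "top":
                bucket.pop(0)   # drop the smallest value
            else:
                bucket.pop()    # drop the largest value
    return dict(groups)


rows = [("a", 5), ("a", 1), ("a", 9), ("a", 3), ("b", 2)]
any_two = collect_list_bounded(rows, 2, "any")
top_two = collect_list_bounded(rows, 2, "top")
bottom_two = collect_list_bounded(rows, 2, "bottom")
```

With "top" and "bottom" the kept values come back in ascending order, while "any" preserves arrival order. A real aggregate would also have to merge partial results across partitions, which this sketch does not attempt.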
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)