[ https://issues.apache.org/jira/browse/SPARK-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
holdenk closed SPARK-10223.
---------------------------
    Resolution: Won't Fix

I don't see this feature being particularly popular, especially since it's relatively easy to implement outside of Spark itself. If you disagree, feel free to re-open this.

> Add takeOrderedByKey function to extract top N records within each group
> ------------------------------------------------------------------------
>
>                 Key: SPARK-10223
>                 URL: https://issues.apache.org/jira/browse/SPARK-10223
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>            Reporter: Ritesh Agrawal
>            Priority: Minor
>
> Currently PySpark has a takeOrdered function that returns the top N records.
> However, you often want to extract the top N records within each group. This
> can be easily implemented using the combineByKey operation with a fixed-size
> heap to capture the top N within each group. A working solution can be found
> [here](https://ragrawal.wordpress.com/2015/08/25/pyspark-top-n-records-in-each-group/)
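
For readers following along, the approach the issue describes (combineByKey with a fixed-size heap per key) could be sketched roughly as below. This is not the code from the linked post and not a Spark API: the names `make_top_n_combiners` and `take_ordered_by_key` are hypothetical, and the sketch assumes an RDD of (key, value) pairs with comparable values, keeping the n largest values per key.

```python
import heapq


def make_top_n_combiners(n):
    """Build the three functions combineByKey expects, keeping the n largest values per key.

    Hypothetical helper for illustration; not part of PySpark.
    """
    if n < 1:
        raise ValueError("n must be at least 1")

    def create_combiner(value):
        # A one-element list is already a valid min-heap.
        return [value]

    def merge_value(heap, value):
        if len(heap) < n:
            heapq.heappush(heap, value)
        else:
            # Push the new value, then drop the smallest, so the heap never exceeds n.
            heapq.heappushpop(heap, value)
        return heap

    def merge_combiners(heap_a, heap_b):
        # Fold the second partition's heap into the first, one value at a time.
        for value in heap_b:
            merge_value(heap_a, value)
        return heap_a

    return create_combiner, merge_value, merge_combiners


def take_ordered_by_key(rdd, n):
    """Return an RDD of (key, [n largest values, descending]).

    Hypothetical function name; assumes `rdd` is a PySpark RDD of (key, value) pairs.
    """
    create, merge_val, merge_comb = make_top_n_combiners(n)
    return (rdd
            .combineByKey(create, merge_val, merge_comb)
            .mapValues(lambda heap: sorted(heap, reverse=True)))
```

Because each per-key heap holds at most n items, memory per key stays bounded regardless of group size, which is the main reason to prefer this over grouping all values and sorting them.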