Ritesh Agrawal created SPARK-10223:
--------------------------------------
Summary: Add takeOrderedByKey function to extract top N records within each group
Key: SPARK-10223
URL: https://issues.apache.org/jira/browse/SPARK-10223
Project: Spark
Issue Type: New Feature
Components: PySpark
Reporter: Ritesh Agrawal
Priority: Minor
Currently, PySpark has a takeOrdered function that returns the top N records. However,
you often want the top N records within each group instead. This is easy to
implement with the combineByKey operation and a fixed-size heap that holds the
top N records for each group. A working solution is described
[here](https://ragrawal.wordpress.com/2015/08/25/pyspark-top-n-records-in-each-group/)
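The combineByKey plus fixed-size heap idea can be sketched as follows. This is a minimal illustration, not the proposed takeOrderedByKey API and not the code from the linked post. The helper names `make_top_n_combiners` and `simulate_combine_by_key` are hypothetical. The first returns the three functions that `combineByKey` takes (createCombiner, mergeValue, mergeCombiners), each keeping at most n values per key in a `heapq` min-heap. The second mimics combineByKey locally so the logic can run without a Spark cluster: it combines within each partition first, then merges the partial heaps across partitions. The example assumes values are directly comparable, such as numbers or `(score, record)` tuples.

```python
import heapq


def make_top_n_combiners(n):
    """Hypothetical helper: return (createCombiner, mergeValue, mergeCombiners)
    that keep the n largest values per key in a min-heap."""
    if n < 1:
        raise ValueError("n must be at least 1")

    def create_combiner(value):
        # A one-element list is already a valid min-heap.
        return [value]

    def merge_value(heap, value):
        # Grow the heap until it holds n items; after that, push the new
        # value and drop the smallest, so the heap never exceeds n items.
        if len(heap) < n:
            heapq.heappush(heap, value)
        else:
            heapq.heappushpop(heap, value)
        return heap

    def merge_combiners(heap_a, heap_b):
        # Fold every value from one partial heap into the other.
        for value in heap_b:
            merge_value(heap_a, value)
        return heap_a

    return create_combiner, merge_value, merge_combiners


def simulate_combine_by_key(partitions, n):
    """Hypothetical local stand-in for combineByKey: combine inside each
    partition, then merge the per-partition heaps for each key."""
    create_combiner, merge_value, merge_combiners = make_top_n_combiners(n)
    merged = {}
    for partition in partitions:
        local = {}
        for key, value in partition:
            if key in local:
                merge_value(local[key], value)
            else:
                local[key] = create_combiner(value)
        for key, heap in local.items():
            if key in merged:
                merge_combiners(merged[key], heap)
            else:
                merged[key] = heap
    # Report each key's survivors from largest to smallest.
    return {key: sorted(heap, reverse=True) for key, heap in merged.items()}


# With a real PySpark pair RDD, the same functions would be passed as:
#   top_n = (rdd.combineByKey(*make_top_n_combiners(3))
#               .mapValues(lambda heap: sorted(heap, reverse=True)))

if __name__ == "__main__":
    partitions = [
        [("a", 5), ("b", 1), ("a", 9), ("a", 2)],
        [("a", 7), ("b", 4), ("a", 1), ("b", 8)],
    ]
    print(simulate_combine_by_key(partitions, 2))  # {'a': [9, 7], 'b': [8, 4]}
```

Each heap is capped at n entries, so memory per key stays bounded no matter how many records a group contains. Only the n survivors from each partition are shuffled and merged. Because a min-heap keeps its smallest element at the root, one comparison decides whether a new value should replace it.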
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)