[ 
https://issues.apache.org/jira/browse/SPARK-19428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852946#comment-15852946
 ] 

Luke Miner edited comment on SPARK-19428 at 2/5/17 3:45 AM:
------------------------------------------------------------

I did not know that a {{first}} function existed for {{GroupedData}}. 
It would be nice to include it in the {{GroupedData}} portion of the API docs. 
It doesn't seem to address the need to sort the {{GroupedData}} first, 
though.

[~srowen] I'm not clear on how you would find the n most recent timestamps per 
group, which are needed to perform the final join. The method I've used in the 
past is to loop through each id, filter by that id, sort the filtered dataframe 
on timestamp, limit to n rows, and then append the id-specific dataframes back 
together. But this is extremely slow.
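
For illustration only, here is a minimal plain-Python sketch of the "n most recent rows per id" logic, done in one pass over the data instead of one filter-and-sort pass per id. It is not Spark code and is not taken from this issue; the function name {{top_n_per_group}}, the row layout, and the {{id}}, {{timestamp}}, and {{value}} fields are assumptions made up for the example.

```python
import heapq
from collections import defaultdict


def top_n_per_group(rows, n):
    """Return the n most recent rows for each id, newest first.

    Hypothetical helper: rows are dicts with "id" and "timestamp" keys.
    """
    # Bucket the rows by id in a single pass over the data.
    groups = defaultdict(list)
    for row in rows:
        groups[row["id"]].append(row)

    # Keep only the n rows with the largest timestamps in each bucket.
    return {
        key: heapq.nlargest(n, members, key=lambda r: r["timestamp"])
        for key, members in groups.items()
    }


# Made-up sample data, just to exercise the function.
rows = [
    {"id": "a", "timestamp": 3, "value": 10},
    {"id": "a", "timestamp": 1, "value": 11},
    {"id": "a", "timestamp": 2, "value": 12},
    {"id": "b", "timestamp": 5, "value": 20},
    {"id": "b", "timestamp": 4, "value": 21},
]
result = top_n_per_group(rows, 2)
```

The per-id loop described above scans the full dataset once for every id, whereas this sketch groups once and then selects within each group. Whether the same holds for a distributed dataframe depends on how the engine plans the query.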



> Ability to select first row of groupby
> --------------------------------------
>
>                 Key: SPARK-19428
>                 URL: https://issues.apache.org/jira/browse/SPARK-19428
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Luke Miner
>            Priority: Minor
>
> It would be nice to be able to select the first row from {{GroupedData}}. 
> Pandas has something like this:
> {{df.groupby('group').first()}}
> It's especially handy if you can order the group as well.


