Hi Ben,

If the “Date” column (which seems to be just month names) is already in order - 
meaning you just want to pick the last item for each group, then this is fairly 
straightforward:

I assume Date is of type “character”.

Method 1:

DT[, .SD[.N], by=Group]
#    Group Value   Date
# 1:     1   yyy   July
# 2:     2  qqqq August
Method 2:

In this case, .SD is not optimised for speed yet. So, if this is slow, then you 
can overcome it by using .I in place of .SD as follows:

DT[DT[, .I[.N], by=Group]$V1]
#    Group Value   Date
# 1:     1   yyy   July
# 2:     2  qqqq August
Instead of subsetting entire data per group (.SD), we get the row number (.I) 
in DT for each group (in column V1) and then just subset those rows.

If the Date column is not necessarily sorted for each group, then we create an 
extra column:

Method 3:

DT[, idx := chmatch(Date, month.name)]
setkey(DT, Group, idx) # sort by group, idx
DT[DT[, .I[.N], by=Group]$V1]
#    Group Value   Date idx
# 1:     1   yyy   July   7
# 2:     2  qqqq August   8
Or if you use v1.9.3, you can use setorder instead of setkey which allows for 
ordering in ascending and descending order:

Method 4:

DT[, idx := chmatch(Date, month.name)]
setorder(DT, Group, -idx) # sort by group, and descending order on idx
Now we’ll need to pick the first element instead of the .Nth (last) element per 
group.

DT[DT[, .I[1L], by=Group]$V1]
#    Group Value   Date idx
# 1:     1   yyy   July   7
# 2:     2  qqqq August   8
And alternatively, if you don’t wish to add the extra column, you can use 
order(.) as follows:

Method 5:

DT[order(Group, -chmatch(Date, month.name))][, .SD[1L], by=Group]
If you want to use .I here, you’ll have to save the first part onto a variable, 
which essentially means you’ll use up twice the memory of your data set.. So, 
I’d prefer this least. But just to show all possible ways I could think of.

HTH


Arun

From: Arunkumar Srinivasan [email protected]
Reply: Arunkumar Srinivasan [email protected]
Date: July 19, 2014 at 12:51:04 AM
To: bgoldstein [email protected]
Cc: [email protected] 
[email protected]
Subject:  Re: [datatable-help] Subsetting By Row Function  

Hi Ben,

If the "Date" column (which seems to be just month names) is already in order - 
meaning you just want to pick the last item for each group, then this is fairly 
straightforward:

I assume `Date` is of type "character".

Method 1:
DT[, .SD[.N], by=Group]
#    Group Value   Date
# 1:     1   yyy   July
# 2:     2  qqqq August

Method 2:
In this case, `.SD` is not optimised for speed yet. So, if this is slow, then 
you can overcome it by using `.I` in place of `.SD` as follows:

DT[DT[, .I[.N], by=Group]$V1]
#    Group Value   Date
# 1:     1   yyy   July
# 2:     2  qqqq August

Instead of subsetting entire data per group (.SD), we get the row number (.I) 
in DT for each group (in column V1) and then just subset those rows.

---

If 


On Sat, Jul 19, 2014 at 12:40 AM, bgoldstein <[email protected]> wrote:
I am having trouble defining (and therefore searching) for this problem. I
have data like this:

Group Value Date
1         xxx   June
1         yyy   July
2         zzzz   May
2         qqqq  August
etc.


I want to subset the 'Value' of each 'Group' by the latest 'Date'. So my
output should be:

Group Value Date
1         yyy   July
2         qqqq  August
etc.

The doBy package has a firstobs() function that works but is quite slow.

What would be a data.table way to do this?

Thank you,

Ben



--
View this message in context: 
http://r.789695.n4.nabble.com/Subsetting-By-Row-Function-tp4694221.html
Sent from the datatable-help mailing list archive at Nabble.com.
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

Reply via email to