Hi Ben, If the “Date” column (which seems to be just month names) is already in order - meaning you just want to pick the last item for each group, then this is fairly straightforward:
I assume Date is of type “character”. Method 1: DT[, .SD[.N], by=Group] # Group Value Date # 1: 1 yyy July # 2: 2 qqqq August Method 2: In this case, .SD is not optimised for speed yet. So, if this is slow, then you can overcome it by using .I in place of .SD as follows: DT[DT[, .I[.N], by=Group]$V1] # Group Value Date # 1: 1 yyy July # 2: 2 qqqq August Instead of subsetting entire data per group (.SD), we get the row number (.I) in DT for each group (in column V1) and then just subset those rows. If the Date column is not necessarily sorted for each group, then we create an extra column: Method 3: DT[, idx := chmatch(Date, month.name)] setkey(DT, Group, idx) # sort by group, idx DT[DT[, .I[.N], by=Group]$V1] # Group Value Date idx # 1: 1 yyy July 7 # 2: 2 qqqq August 8 Or if you use v1.9.3, you can use setorder instead of setkey which allows for ordering in ascending and descending order: Method 4: DT[, idx := chmatch(Date, month.name)] setorder(DT, Group, -idx) # sort by group, and descending order on idx Now we’ll need to pick the first element instead of the .Nth (last) element per group. DT[DT[, .I[1L], by=Group]$V1] # Group Value Date idx # 1: 1 yyy July 7 # 2: 2 qqqq August 8 And alternatively, if you don’t wish to add the extra column, you can use order(.) as follows: Method 5: DT[order(Group, -chmatch(Date, month.name))][, .SD[1L], by=Group] If you want to use .I here, you’ll have to save the first part onto a variable, which essentially means you’ll use up twice the memory of your data set.. So, I’d prefer this least. But just to show all possible ways I could think of. HTH Arun From: Arunkumar Srinivasan [email protected] Reply: Arunkumar Srinivasan [email protected] Date: July 19, 2014 at 12:51:04 AM To: bgoldstein [email protected] Cc: [email protected] [email protected] Subject: Re: [datatable-help] Subsetting By Row Function Hi Ben, If the "Date" column (which seems to be just month names) is already in order - meaning you just want to pick the last item for each group, then this is fairly straightforward: I assume `Date` is of type "character". Method 1: DT[, .SD[.N], by=Group] # Group Value Date # 1: 1 yyy July # 2: 2 qqqq August Method 2: In this case, `.SD` is not optimised for speed yet. So, if this is slow, then you can overcome it by using `.I` in place of `.SD` as follows: DT[DT[, .I[.N], by=Group]$V1] # Group Value Date # 1: 1 yyy July # 2: 2 qqqq August Instead of subsetting entire data per group (.SD), we get the row number (.I) in DT for each group (in column V1) and then just subset those rows. --- If On Sat, Jul 19, 2014 at 12:40 AM, bgoldstein <[email protected]> wrote: I am having trouble defining (and therefore searching) for this problem. I have data like this: Group Value Date 1 xxx June 1 yyy July 2 zzzz May 2 qqqq August etc. I want to subset the 'Value' of each 'Group' by the latest 'Date'. So my output should be: Group Value Date 1 yyy July 2 qqqq August etc. The doBy package has a firstobs() function that works but is quite slow. What would be a data.table way to do this? Thank you, Ben -- View this message in context: http://r.789695.n4.nabble.com/Subsetting-By-Row-Function-tp4694221.html Sent from the datatable-help mailing list archive at Nabble.com. _______________________________________________ datatable-help mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
_______________________________________________ datatable-help mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
