Hello,

I have been learning to use data.table and studying the vignette located here...

https://rawgit.com/wiki/Rdatatable/data.table/vignettes/datatable-intro-vignette.html

Section 2f. shows how to subset a data.table to select the first n rows of 
each group's .SD.  That's really handy.

2. Aggregations
  f. Subset .SD for each group:    ans <- flights[, head(.SD, 2), by=month]

In a similar way, I can get the last row of each .SD using tail, nrow or 
dim (I don't think it matters much, but dim seems to be marginally faster*).

  ans <- flights[,.SD[dim(.SD)[1]], by=month]
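
For anyone who wants to try this without the flights data, here is a minimal 
sketch of the same three approaches on a made-up table.  The table dt and its 
columns are invented purely for illustration; they are not part of the vignette.

```r
library(data.table)

# Hypothetical toy table standing in for flights:
# three groups (months) of unequal size.
dt <- data.table(
  month = c(1L, 1L, 1L, 2L, 2L, 3L),
  delay = c(5, 10, 15, 20, 25, 30)
)

# Last row of each group's .SD, three ways.
last_tail <- dt[, tail(.SD, 1), by = month]        # tail() keeps the final row
last_dim  <- dt[, .SD[dim(.SD)[1]], by = month]    # index by dim()'s row count
last_nrow <- dt[, .SD[nrow(.SD)], by = month]      # index by nrow()

print(last_nrow)
```

On this toy table all three calls pick out the rows with delay 15, 25 and 30, 
one per month.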

I got to wondering if the number of rows in .SD might be exposed in each 
grouping iteration.  Is there an equivalent to .N for the subset data.table, 
.SD?  Something like .SDN or the like?   

Thanks for data.table!

Ben

* After reading this discussion 
http://r.789695.n4.nabble.com/What-is-the-fastest-way-to-determine-that-data-table-is-empty-td4638348.html#a4638451
 I tried out a couple of methods for getting the last element of a grouping 
using nrow(), tail() and dim().

# using tail
> microbenchmark( last1 <- flights[, tail(.SD, 1), by=month] )
Unit: milliseconds
                                         expr      min       lq     mean   median       uq      max neval
 last1 <- flights[, tail(.SD, 1), by = month] 16.65898 16.89704 18.26415 17.37007 19.20147 40.12966   100

# using dim
>   microbenchmark( last2 <- flights[,.SD[dim(.SD)[1]], by=month] )
Unit: milliseconds
                                             expr      min       lq     mean   median       uq      max neval
 last2 <- flights[, .SD[dim(.SD)[1]], by = month] 15.51243 15.87788 17.40978 16.19426 17.83308 59.22429   100

# using nrow
>   microbenchmark( last3 <- flights[,.SD[nrow(.SD)], by=month] )
Unit: milliseconds
                                           expr      min       lq     mean   median       uq      max neval
 last3 <- flights[, .SD[nrow(.SD)], by = month] 15.63919 15.92073 17.28836 16.52588 18.33867 24.92624   100

>   identical(last1, last2)
[1] TRUE
>   identical(last1, last3)
[1] TRUE

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
