Hi, and welcome.

Just to check: have you found .SD, [,lapply(.SD,sum),by=...], and .SDcols? .SD consists of all columns other than the grouping columns, which seems similar to what this line is doing:

> mean(summaryDT[,(ncols-length(sList) + 2):ncols, with = FALSE])
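
As a rough illustration of the .SD / .SDcols idea, here is a minimal sketch of what the uniqueSummary3 function asked about in the question might look like. The function body is an assumption, not something posted in this thread: it takes the column names as a character vector, passes them to .SDcols, and applies length(unique()) to each column of .SD within each group. It uses by = key directly rather than setting a key on the table.

```r
library(data.table)

# Hypothetical helper (not from the original thread): for each group defined
# by the columns named in `key`, count the distinct values of every column
# named in `vars`, then average those counts across groups.
uniqueSummary3 <- function(df, key, vars) {
  DT <- as.data.table(df)
  # .SDcols restricts .SD to the columns named in `vars`;
  # lapply() then applies the same summary to each of them per group.
  summaryDT <- DT[, lapply(.SD, function(x) length(unique(x))),
                  by = key, .SDcols = vars]
  # Mean of each per-group count column, returned as a named numeric vector.
  sapply(summaryDT[, vars, with = FALSE], mean)
}

# Small made-up data set, in the same shape as the testData in the question.
set.seed(100)
N <- 5000
testData <- data.frame(id     = sample(1:100, N, replace = TRUE),
                       clinic = sample(1:10, N, replace = TRUE),
                       dx     = sample(1:200, N, replace = TRUE),
                       rx     = sample(1:1000, N, replace = TRUE))

uniqueSummary3(df = testData, key = c("id"), vars = c("dx", "rx", "clinic"))
```

The "dx" element of the result should agree with the tapply() calculation from the question when run on the same data. Because the summary function is applied through lapply(.SD, ...), there is no need to build a j expression by hand from the character vector.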

Matthew

"Erik Iverson" <[email protected]> wrote in message
news:cakzgw12zwppt3psqjcdh_smdoqolajruv7cv64uywb8pxk1...@mail.gmail.com...

Hello,

Thank you for providing the data.table package, I think it will be very useful to me going forward. I have a question about passing around expressions, and have come up with an example to show what I'm after.

library(data.table)

## test data
N <- 500000
set.seed(100)
testData <- data.frame(id = c(sample(1:10000, N, replace = TRUE)),
                       clinic = c(sample(1:10, N, replace = TRUE)),
                       dx = c(sample(1:200, N, replace = TRUE)),
                       rx = c(sample(1:1000, N, replace = TRUE)))

## want to know mean number of dx per ID
mean(tapply(testData$dx, testData$id, function(x) length(unique(x))))
## 44.2212

## in my real use case, I want to run this with different 'by'
## variables, so let's write a function and try to use data.table,
## call the function uniqueSummary1
uniqueSummary1 <- function(df, key) {
  DT <- data.table(df)
  key(DT) <- key
  summaryDT <- DT[, list(length(unique(dx)),
                         length(unique(rx))), by = key]
  mean(summaryDT[,list(V1, V2)])
}

## agrees with tapply
uniqueSummary1(df = testData, key = c("id"))

## The above works great, but isn't general, since in my real use
## case, I won't know dx and rx are the variables of interest. I want
## to be able to pass them in as arguments. This is exactly what FAQ
## 1.6 is, so let's use that solution to define uniqueSummary2
uniqueSummary2 <- function(df, key, vars) {
  DT <- data.table(df)
  key(DT) <- key
  sList <- substitute(vars)
  summaryDT <- DT[, eval(sList), by = key]
  ncols <- ncol(summaryDT)
  mean(summaryDT[,(ncols-length(sList) + 2):ncols, with = FALSE])
}

uniqueSummary2(df = testData, key = c("id"),
               vars = list(length(unique(dx)),
                           length(unique(rx)),
                           length(unique(clinic))))

## uniqueSummary2 is better, but relies on me repeating the
## "length(unique())" bit several times. Ideally, I'd just like to
## pass in a list of QUOTED vars to summarize, like the following
## hypothetical call to my yet-unwritten uniqueSummary3 function:
uniqueSummary3(df = testData, key = c("id"),
               vars = c("dx", "rx", "clinic"))

I assume I can somehow construct the expression for the j index inside my function, based on the 'vars' character vector, but am stuck on how. Any ideas?

Thanks so much,
Erik

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
