Great! Thanks, Matthew. This should be a big help. (It was also nice to see that others are using lists as well. Use of lists here--well, in data frames--was an "aha" moment for me about a year ago.)
-----Original Message-----
From: Matthew Dowle [mailto:[email protected]] On Behalf Of Matthew Dowle
Sent: Thursday, January 13, 2011 5:21 PM
To: Joseph Voelkel
Cc: [email protected]
Subject: Re: [datatable-help] Summing over many variables. A new approach; a new problem

Which is now implemented and committed. Either
install.packages(...,type="source") from R-Forge on unix/mac, or wait a day
or two for the R-Forge binary if you're on Windows. Thanks for the nudge on
this one.

> dt = data.table(a=c(1,1,2,3,3),key="a")
> dt$b=list(1:2,1:3,1:4,1:5,1:6)
> dt
     a                b
[1,] 1             1, 2
[2,] 1          1, 2, 3
[3,] 2       1, 2, 3, 4
[4,] 3    1, 2, 3, 4, 5
[5,] 3 1, 2, 3, 4, 5, 6
> dt[,mean(unlist(b)),by=a]
     a       V1
[1,] 1 1.800000
[2,] 2 2.500000
[3,] 3 3.272727
> dt[,sapply(b,mean),by=a]
     a  V1
[1,] 1 1.5
[2,] 1 2.0
[3,] 2 2.5
[4,] 3 3.0
[5,] 3 3.5
>

On Thu, 2011-01-13 at 21:07 +0000, Matthew Dowle wrote:
> Hi Joseph,
> You've found feature request #1092 'Make 'by' work for list() columns' :
> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1092&group_id=240&atid=978
>
> Notes on the FR have this though :
>    Currently type 19 isn't supported in dogroups (both input and
>    output). This might be straightforward (with luck) to implement.
>    See
>    http://r.789695.n4.nabble.com/Suggest-a-cool-feature-Use-data-table-like-a-sorted-indexed-data-list-tp2544213p2544213.html
>    Note this is related but different to FR#202 since a list() column
>    *is* a vector [is.vector()=TRUE].
>
> Matthew
>
> On Thu, 2011-01-13 at 15:17 -0500, Joseph Voelkel wrote:
> > > #create matrix that includes list elements A
> > > (mat<-cbind(index=1:3,var=101:103,A=c(list(11:15),list(21:25),list(31:41))))
> >      index var A
> > [1,] 1     101 Integer,5
> > [2,] 2     102 Integer,5
> > [3,] 3     103 Integer,11
> > > class(mat)
> > [1] "matrix"
> > > # convert to data frame and "fix" the first two entries
> > > (df<-as.data.frame(mat))
> >   index var                                          A
> > 1     1 101                         11, 12, 13, 14, 15
> > 2     2 102                         21, 22, 23, 24, 25
> > 3     3 103 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41
> > > class(df$index) # because mat is atomic
> > [1] "list"
> > > df$index<-as.integer(df$index) # convert to integer
> > > df$var<-as.integer(df$var) # likewise
> > > # convert to data table
> > > dt<-data.table(df)
> > > setkey(dt,index)
> > >
> > > # try some operations
> > > dt[,A] # works
> > [[1]]
> > [1] 11 12 13 14 15
> >
> > [[2]]
> > [1] 21 22 23 24 25
> >
> > [[3]]
> > [1] 31 32 33 34 35 36 37 38 39 40 41
> >
> > > dt[,mean(A)] # Does not work. each row of A is a list
> > [1] NA
> > Warning message:
> > In mean.default(A) : argument is not numeric or logical: returning NA
> > > dt[,mean(unlist(A))] # But here is an easy fix to make this work
> > [1] 27.42857
> > >
> > > dt[,mean(var),by=index] # works (of course)
> >      index  V1
> > [1,]     1 101
> > [2,]     2 102
> > [3,]     3 103
> > >
> > > dt[,mean(unlist(A)),by=index] # does not work!
> > Error in `[.data.table`(dt, , mean(unlist(A)), by = index) :
> >   only integer,double,logical and character vectors are allowed so
> >   far. Type 19 would need to be added.
> >
> > #### Pure code ####
> > #create matrix that includes list elements A
> > (mat<-cbind(index=1:3,var=101:103,A=c(list(11:15),list(21:25),list(31:41))))
> > class(mat)
> > # convert to data frame and "fix" the first two entries
> > (df<-as.data.frame(mat))
> > class(df$index) # because mat is atomic
> > df$index<-as.integer(df$index) # convert to integer
> > df$var<-as.integer(df$var) # likewise
> > # convert to data table
> > dt<-data.table(df)
> > setkey(dt,index)
> >
> > # try some operations
> > dt[,A] # works
> > dt[,mean(A)] # Does not work. each row of A is a list
> > dt[,mean(unlist(A))] # But here is an easy fix to make this work
> >
> > dt[,mean(var),by=index] # works (of course)
> >
> > dt[,mean(unlist(A)),by=index] # does not work!
> >
>
> _______________________________________________
> datatable-help mailing list
> [email protected]
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
