Thanks, Tom. I've added this to my list of data.table ideas.

Joe
-----Original Message-----
From: Short, Tom [mailto:[email protected]]
Sent: Thursday, January 13, 2011 6:37 PM
To: [email protected]; Joseph Voelkel
Cc: [email protected]
Subject: RE: [datatable-help] Summing over many variables. A new approach; a new problem

Matthew fixed your case, but the following workaround may be helpful if you have other types of data stuffed in a data table. Basically, you use one table to index another. Of course, you need to remember to keep them in sync. It doesn't work for a table to index itself in this manner.

> dt1 <- dt[,1:2,with=FALSE]
> dt2 <- dt[,3, with=FALSE]
> # try some operations
> dt1[,mean(dt2[index, unlist(A)]),by=index]
     index V1
[1,]     1 13
[2,]     2 23
[3,]     3 36

- Tom

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]
> On Behalf Of Matthew Dowle
> Sent: Thursday, January 13, 2011 17:21
> To: Joseph Voelkel
> Cc: [email protected]
> Subject: Re: [datatable-help] Summing over many variables. A
> new approach; a new problem
>
> Which is now implemented and committed. Either
> install.packages(...,type="source") from R-Forge on unix/mac,
> or wait a day or two for the R-Forge binary if you're on Windows.
> Thanks for the nudge on this one.
>
> > dt = data.table(a=c(1,1,2,3,3),key="a")
> > dt$b=list(1:2,1:3,1:4,1:5,1:6)
> > dt
>      a                b
> [1,] 1             1, 2
> [2,] 1          1, 2, 3
> [3,] 2       1, 2, 3, 4
> [4,] 3    1, 2, 3, 4, 5
> [5,] 3 1, 2, 3, 4, 5, 6
> > dt[,mean(unlist(b)),by=a]
>      a       V1
> [1,] 1 1.800000
> [2,] 2 2.500000
> [3,] 3 3.272727
> > dt[,sapply(b,mean),by=a]
>      a  V1
> [1,] 1 1.5
> [2,] 1 2.0
> [3,] 2 2.5
> [4,] 3 3.0
> [5,] 3 3.5
>
> On Thu, 2011-01-13 at 21:07 +0000, Matthew Dowle wrote:
> > Hi Joseph,
> > You've found feature request #1092 'Make 'by' work for list() columns':
> >
> > https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1092&group_id=240&atid=978
> >
> > Notes on the FR have this though:
> > Currently type 19 isn't supported in dogroups (both input and
> > output).
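For anyone rerunning Matthew's example on a current data.table, here is a minimal sketch. It assumes a recent CRAN release, not the 2011 R-Forge build discussed here. It adds the list column with `:=` instead of `dt$b=`, which is a substitution not taken from the thread. The names `pooled` and `per_row` are only illustrative.

```r
# Minimal sketch, assuming a current CRAN release of data.table
# (not the 2011 R-Forge build discussed in this thread).
library(data.table)

# Keyed table with one atomic column (a).
dt <- data.table(a = c(1, 1, 2, 3, 3), key = "a")

# Add b as a list column. The outer list() holds the one new column;
# the inner list() is that column's value, one integer vector per row.
dt[, b := list(list(1:2, 1:3, 1:4, 1:5, 1:6))]

# One row per group: pool all vectors in the group, then average.
pooled <- dt[, mean(unlist(b)), by = a]

# One row per input row: average each vector separately within its group.
per_row <- dt[, sapply(b, mean), by = a]

print(pooled)
print(per_row)
```

The two calls differ only in what `j` returns for each group. A length-1 result gives one row per group. A result of length n gives n rows. This matches the two outputs Matthew posted.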
> > This might be straightforward (with luck) to implement.
> > See
> > http://r.789695.n4.nabble.com/Suggest-a-cool-feature-Use-data-table-like-a-sorted-indexed-data-list-tp2544213p2544213.html
> > Note this is related to, but different from, FR#202, since a list() column
> > *is* a vector [is.vector()=TRUE].
> >
> > Matthew
> >
> > On Thu, 2011-01-13 at 15:17 -0500, Joseph Voelkel wrote:
> > > > #create matrix that includes list elements A
> > > > (mat<-cbind(index=1:3,var=101:103,A=c(list(11:15),list(21:25),list(31:41))))
> > >      index var A
> > > [1,] 1     101 Integer,5
> > > [2,] 2     102 Integer,5
> > > [3,] 3     103 Integer,11
> > > > class(mat)
> > > [1] "matrix"
> > > > # convert to data frame and "fix" the first two entries
> > > > (df<-as.data.frame(mat))
> > >   index var                                          A
> > > 1     1 101                         11, 12, 13, 14, 15
> > > 2     2 102                         21, 22, 23, 24, 25
> > > 3     3 103 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41
> > > > class(df$index) # because mat is atomic
> > > [1] "list"
> > > > df$index<-as.integer(df$index) # convert to integer
> > > > df$var<-as.integer(df$var) # likewise
> > > > # convert to data table
> > > > dt<-data.table(df)
> > > > setkey(dt,index)
> > > >
> > > > # try some operations
> > > > dt[,A] # works
> > > [[1]]
> > > [1] 11 12 13 14 15
> > >
> > > [[2]]
> > > [1] 21 22 23 24 25
> > >
> > > [[3]]
> > > [1] 31 32 33 34 35 36 37 38 39 40 41
> > >
> > > > dt[,mean(A)] # Does not work. Each row of A is a list
> > > [1] NA
> > > Warning message:
> > > In mean.default(A) : argument is not numeric or logical: returning NA
> > > > dt[,mean(unlist(A))] # But here is an easy fix to make this work
> > > [1] 27.42857
> > > >
> > > > dt[,mean(var),by=index] # works (of course)
> > >      index  V1
> > > [1,]     1 101
> > > [2,]     2 102
> > > [3,]     3 103
> > > >
> > > > dt[,mean(unlist(A)),by=index] # does not work!
> > > Error in `[.data.table`(dt, , mean(unlist(A)), by = index) :
> > >   only integer,double,logical and character vectors are allowed so
> > >   far. Type 19 would need to be added.
> > >
> > > #### Pure code ####
> > > #create matrix that includes list elements A
> > > (mat<-cbind(index=1:3,var=101:103,A=c(list(11:15),list(21:25),list(31:41))))
> > > class(mat)
> > > # convert to data frame and "fix" the first two entries
> > > (df<-as.data.frame(mat))
> > > class(df$index) # because mat is atomic
> > > df$index<-as.integer(df$index) # convert to integer
> > > df$var<-as.integer(df$var) # likewise
> > > # convert to data table
> > > dt<-data.table(df)
> > > setkey(dt,index)
> > >
> > > # try some operations
> > > dt[,A] # works
> > > dt[,mean(A)] # Does not work. Each row of A is a list
> > > dt[,mean(unlist(A))] # But here is an easy fix to make this work
> > >
> > > dt[,mean(var),by=index] # works (of course)
> > >
> > > dt[,mean(unlist(A)),by=index] # does not work!
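Joseph's transcript goes through cbind(). Because one argument to cbind() is a list, the result is a matrix of mode "list". That is why every column of the data frame comes back as a list and needs the as.integer() fix-up. The "type 19" in the error is R's internal code for a list (VECSXP). The sketch below builds the same table directly instead. It assumes a current CRAN release of data.table and uses `:=`, which is not part of the thread. The names `overall` and `by_index` are only illustrative. The expected results are the ones already posted above: 27.42857 overall, and 13, 23, 36 per index.

```r
# Minimal sketch, assuming a current CRAN release of data.table.
library(data.table)

# Build the atomic columns directly, so no as.integer() fix-up is needed.
dt <- data.table(index = 1:3, var = 101:103)

# Add A as a list column: one integer vector per row.
dt[, A := list(list(11:15, 21:25, 31:41))]
setkey(dt, index)

# mean() cannot take a list; unlist() flattens all rows into one vector.
overall <- dt[, mean(unlist(A))]

# The grouped call that raised the "Type 19" error in the 2011 build.
by_index <- dt[, mean(unlist(A)), by = index]

print(overall)
print(by_index)
```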
> >
> > _______________________________________________
> > datatable-help mailing list
> > [email protected]
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
