Hi Steve,
Now that you've brought this back up, what do you think you would
prefer? For example, using my (admittedly contrived) original example:
result <- some.big.data.table[, by=list(colA, colB), {
    ## Sometimes I want to know what the current values of
    ## colA and colB are in here to get some more info. Maybe
    ## we can have .BY:
    xref <- more.data[J(.BY[1], .BY[2]), mult='all'] ## or something
    ## ...
}]
Should it be `J(.BY[1], .BY[2])`, or is something like `J(colA, colB)`
more natural, do you think?
'J(colA, colB)' is perfect if you know the column names in advance. That
is not true in my case. I created a minimal example of a possible
application of a '.BY' construct:
> dt <- data.table(x=c(0,1,0,1), y=c(1,0,1,0))
> dt
     x y
[1,] 0 1
[2,] 1 0
[3,] 0 1
[4,] 1 0
From this table, I want the row sum for each group, i.e. "select x + y
from dt group by x, y" in SQL. This would be:
> setkey(dt, x, y)
> dt[,sum(x[1], y[1]), by=list(x,y)]
     x y V1
[1,] 0 1  1
[2,] 1 0  1
But what if dt can have an arbitrary number of (grouping) columns with
arbitrary names? If the grouping columns are given as
groupCols <- c("x", "y")
then the following is possible:
> expr <- parse(text = sprintf("sum(%s)", paste(groupCols, "[1]",
+                sep="", collapse=", ")))
> dt[,eval(expr), by=groupCols]
     x y V1
[1,] 0 1  1
[2,] 1 0  1
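(For reference, expr here is just the parsed call built from groupCols:
> expr
expression(sum(x[1], y[1]))
so eval(expr) in j reproduces the hand-written sum(x[1], y[1]) for
whatever names groupCols happens to contain.)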
Now, this is certainly uglier than
> dt[, sum(.BY), by = groupCols]
My actual application is that I apply decision tree models (rpart) to a
large number of binary patterns. In order to save computation time, I
classify each distinct pattern only once. So what I basically do is
group by all attributes and apply the model once to each group, roughly
as in the sketch below.
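A simplified, self-contained toy version of that pattern (the model, the
column names a/b/cls/attrCols and the join back onto the full table are
made up here purely for illustration, not taken from my real code):

library(data.table)
library(rpart)

## Toy training data and a small classification tree.
set.seed(1)
train <- data.frame(a   = rbinom(200, 1, 0.5),
                    b   = rbinom(200, 1, 0.5),
                    cls = factor(sample(c("yes", "no"), 200, replace = TRUE)))
fit <- rpart(cls ~ a + b, data = train, method = "class")

## Large table of binary patterns to classify.
new.data <- data.table(a = rbinom(1e5, 1, 0.5),
                       b = rbinom(1e5, 1, 0.5))
attrCols <- c("a", "b")
setkeyv(new.data, attrCols)

## Group by all attributes and apply the model once per distinct pattern.
scores <- new.data[, list(pred = as.character(
                              predict(fit,
                                      newdata = data.frame(a = a[1], b = b[1]),
                                      type = "class"))),
                   by = attrCols]

## Join the per-pattern predictions back onto the full table.
setkeyv(scores, attrCols)
result <- scores[new.data]

This is exactly the place where a '.BY' (or similar) construct would let
me build the newdata row without spelling out the attribute columns
inside j.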
Andreas