Thank you for clarifying this! I was getting stuck on it. Yes, the syntax and speedup are very nice indeed. I am dealing with 10-million-row tables, so it is very useful for me.
Incidentally, I also asked this question on StackOverflow, and updated it based on your reply:
http://stackoverflow.com/questions/4764434/r-when-using-data-table-how-do-i-get-columns-of-y-when-i-do-xy

On Sat, Jan 22, 2011 at 4:48 PM, Matthew Dowle <[email protected]> wrote:
> Welcome to the list.
> You're right, the FAQ is wrong.
> FR#1095 is "Turn back on 'join inherited scope'".
> This was a known problem in NEWS at v1.4 and still is.
> When the grouping code was moved from R into C in v1.4 that feature
> wasn't something that made it into the port.
>
> Glad you appreciate the neater syntax and yes it should be faster (the
> more columns in x and y the faster the speed up could be, over a merge
> followed by a query).
>
> I'll try and take a look soon.
>
> Matthew
>
>
> On Sat, 2011-01-22 at 10:41 -0500, Prasad Chalasani wrote:
> > The Data-table FAQ 1.11 states:
> >
> > "When you write x[y,foo*boo], data.table automatically inspects the j
> > expression to see which columns it uses.
> > It will only subset, or group, those columns only. Memory is only
> > created for the columns the j uses.
> >
> > Let’s say foo is in x, and boo is in y (along with 20 other columns in
> > y).
> >
> > Isn’t x[y,foo*boo] quicker to program and quicker to run than a merge
> > step followed by another subset step ?"
> >
> > Contrary to what it says above, I get an error when I try to access a
> > y-column in the "j" argument of x[y,j].
> >
> > See the sequence of code below.
> >
> > > x <- data.table( foo = c(1,1,1,2,2,3), a = 1:6, key = 'foo')
> > > y <- data.table( foo = c(1,2), boo = 10:11, key = 'foo')
> >
> > # the below works as expected
> > > x[y]
> >      foo a
> > [1,]   1 1
> > [2,]   2 4
> >
> > > with( merge(x,y), foo*boo)
> > [1] 10 10 10 22 22
> >
> > # I want to achieve the same result as the above using the
> > # syntactically more compact (and faster?) code below:
> > > x[y, foo * boo ]
> > Error in eval(expr, envir, enclos) : object 'boo' not found
> >
> > So is the FAQ just wrong, or am I misunderstanding something?
> >
> > _______________________________________________
> > datatable-help mailing list
> > [email protected]
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
