Hi Michael,
It happens automatically. See NEWS for v1.9.2 :
o New optimization: GForce. Rather than grouping the data, the group
locations are passed into grouped versions of sum and mean (gsum and
gmean) which then compute the result for all groups in a single
sequential pass through the column for cache efficiency. Further, since
the g* function is called just once, we don't need to find ways to speed
up calling sum or mean repetitively for each group. Plan is to add gmin,
gmax, gsd, gprod, gwhich.min and gwhich.max.
Examples where GForce applies now :
DT[,sum(x,na.rm=),by=...] # yes
DT[,list(sum(x,na.rm=),mean(y,na.rm=)),by=...] # yes
DT[,lapply(.SD,sum,na.rm=),by=...] # yes
DT[,list(sum(x),min(y)),by=...] # no. gmin not yet available, only sum and mean so far.
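To make the "single sequential pass" idea concrete, here is a rough sketch in plain R. It is only an illustration: the function name gsum_sketch and the layout of grp are assumptions made for this example, and the real gsum is internal to data.table and not written like this.

```r
# Illustrative sketch only, not data.table's actual implementation.
# grp holds each row's group number (1..ngrp), i.e. the "group locations".
gsum_sketch <- function(x, grp, ngrp) {
  ans <- numeric(ngrp)            # one accumulator per group
  for (i in seq_along(x)) {       # single sequential pass through the column
    ans[grp[i]] <- ans[grp[i]] + x[i]
  }
  ans
}

x   <- c(1, 2, 3, 4, 5, 6)
grp <- c(1L, 2L, 1L, 3L, 2L, 1L)
res <- gsum_sketch(x, grp, 3L)

# Same totals as calling sum once per group
stopifnot(all.equal(res, as.vector(tapply(x, grp, sum))))
```

The point is that the function is entered once for the whole column, rather than sum being called once per group.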
GForce is a level 2 optimization. To turn it off:
options(datatable.optimize=1)
Reminder: to see the optimizations and other info, set verbose=TRUE
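As a small check you can run yourself, the sketch below uses a made-up table (DT, g and x are just example names) to run the same grouped sum with the default optimization level and again with level 1, and confirms the answers agree. It assumes data.table v1.9.2 or later is installed. What verbose=TRUE prints varies by version, so no output is shown here.

```r
library(data.table)

DT <- data.table(g = c("a", "b", "a", "b"), x = c(1, 2, 3, 4))

# Default optimization level: sum(x) by group is eligible for GForce.
# verbose=TRUE prints what data.table did internally for this query.
r2 <- DT[, sum(x), by = g, verbose = TRUE]

# Drop to level 1 to switch GForce off, rerun the query, then restore.
old <- options(datatable.optimize = 1)
r1 <- DT[, sum(x), by = g]
options(old)

# The optimization affects speed, not results
stopifnot(all.equal(r1, r2))
```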
Matt
On 12/06/14 03:44, Michael Smith wrote:
Hi Matt,
You mention GForce in your slides. Is this something that happens behind
the scenes, or is it something the user should take care of? (I couldn't
find it in the current docs.)
Thanks,
M
On 06/12/2014 04:22 AM, Matt Dowle wrote:
Draft slides are now online for the 3 hour data.table tutorial at useR!
on Monday 30 June.
user2014.stat.ucla.edu/#tutorials
Is there something fundamental that you wished had been explained in a
tutorial like this? If so, please let me know.
I'm doing another of these long tutorials, jointly with Arun, in London
on Monday 15th September :
http://www.earl-conference.com/Speakers/Workshop1_DataTable.html
Matt
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help