Here's a nice benchmark that's just been posted on S.O. showing the set() speedup when looped:
http://stackoverflow.com/a/16797392/403310

On 28.05.2013 19:11, Matthew Dowle wrote:
> Hi,
>
> Yes, this is expected because `[.data.table` is a function call with associated overhead. You don't want to loop calls to it. Consider all the arguments to `[.data.table` and all the checks that must be done for existence and type of arguments on each call. The idea is to give `[.data.table` meaty calls which it can chew on. It doesn't like tiny tasks one at a time.
>
> `[[` on the other hand is an R primitive. It's part of the language. You can do only very limited things with `[[`, but in this case (looking up a single column by name or position in a loop) it's the best tool for the job. I use `[[` on data.table quite a lot.
>
> This is also the very reason for set()'s existence: ?set says it's a 'loopable :=' because of the `[.data.table` overhead.
>
> There's a feature request to detect when `[.data.table` is being looped, though:
>
> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2028&group_id=240&atid=978
>
> which would make data.table more helpful: at least it would tell you, rather than leaving you to stumble across it.
>
> Hope that helps,
>
> Matthew
>
> On 28.05.2013 18:37, Alexandre Sieira wrote:
>
>> I was working on some code today and encountered this scenario where the performance behavior of data.table surprised me a little. Is this expected?
>>
>> >>> dt = data.table(a=rnorm(1000000))
>> >>> system.time( for(i in 1:100000) j = dt[i, a] )
>> >>  usuário  sistema decorrido
>> >>   78.064    0.426    78.034
>> >>> system.time( for(i in 1:100000) j = dt[i, "a", with=F] )
>> >>  usuário  sistema decorrido
>> >>   27.814    0.154    27.810
>> >>> system.time( for(i in 1:100000) j = dt[["a"]][i] )
>> >>  usuário  sistema decorrido
>> >>    1.227    0.006     1.225
>>
>> (sorry about the output in Portuguese: usuário / sistema / decorrido are user / system / elapsed)
>>
>> Not knowing anything about how data.table is implemented internally, I would have assumed the three syntaxes for accessing the data.table should have similar performance, or at most a small difference.
>>
>> --
>> Alexandre Sieira
>> CISA, CISSP, ISO 27001 Lead Auditor
>>
>> "The truth is rarely pure and never simple."
>> Oscar Wilde, The Importance of Being Earnest, 1895, Act I
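
For anyone finding this thread later, here is a minimal sketch of the advice above, assuming data.table is installed. The extra columns b and b2 and the small n are made up for illustration, and no timings are shown because they depend on the machine and the data.table version. It reads inside a loop via `[[`, writes inside a loop via set(), and then does the same update as one "meaty" call.

```r
library(data.table)

n  <- 1000L                          # hypothetical size, small so it runs quickly
dt <- data.table(a = rnorm(n), b = 0)

# Reading in a loop: extract the column once with `[[` (an R primitive),
# then index the plain vector instead of calling `[.data.table` each time.
a     <- dt[["a"]]
total <- 0
for (i in seq_len(n)) total <- total + a[i]

# Writing in a loop: set() is the 'loopable :=' described in ?set, so it
# skips the per-call overhead of `[.data.table`.
for (i in seq_len(n)) set(dt, i = i, j = "b", value = a[i] * 2)

# The same update as a single vectorised call, which is what
# `[.data.table` is designed for.
dt[, b2 := a * 2]
```

Where the work can be expressed as one call, the last form is usually the one to reach for; the loop with set() is for cases that genuinely need row-by-row or column-by-column updates.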
_______________________________________________
datatable-help mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
