I was working on some code today and ran into a scenario where the performance of data.table surprised me a little. Is this expected?

> dt = data.table(a=rnorm(1000000))
> system.time( for(i in 1:100000) j = dt[i, a] )
   user  system elapsed
 78.064   0.426  78.034
> system.time( for(i in 1:100000) j = dt[i, "a", with=F] )
   user  system elapsed
 27.814   0.154  27.810
> system.time( for(i in 1:100000) j = dt[["a"]][i] )
   user  system elapsed
  1.227   0.006   1.225

Not knowing anything about how data.table is implemented internally, I would have assumed that these three ways of accessing the data.table would perform similarly, or differ only slightly.

-- 
Alexandre Sieira
CISA, CISSP, ISO 27001 Lead Auditor

"The truth is rarely pure and never simple."
Oscar Wilde, The Importance of Being Earnest, 1895, Act I
_______________________________________________
datatable-help mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
