I was working on some code today and ran into a case where the performance of data.table surprised me. Is this expected?


> library(data.table)
> dt = data.table(a=rnorm(1000000))

> system.time( for(i in 1:100000) j = dt[i, a] )
   user  system elapsed 
 78.064   0.426  78.034 

> system.time( for(i in 1:100000) j = dt[i, "a", with=FALSE] )
   user  system elapsed 
 27.814   0.154  27.810 

> system.time( for(i in 1:100000) j = dt[["a"]][i] )
   user  system elapsed 
  1.227   0.006   1.225 

(My R session printed the timing headers in Portuguese: usuário, sistema, decorrido. They are shown in English here.)

Knowing nothing about how data.table is implemented internally, I would have assumed that these three ways of reading a single value would perform similarly, or differ only slightly.
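
For anyone who wants to reproduce the comparison quickly, here is a scaled-down sketch of the same three access patterns. The sizes (n_rows, n_iter) and the variable names are my own choices for illustration and are smaller than in the run above, so absolute timings will differ and will depend on the machine and the data.table version.

```r
library(data.table)

set.seed(1)
n_rows <- 1e5  # assumption: smaller than the 1e6 rows used in the original run
n_iter <- 1e3  # assumption: smaller than the 1e5 iterations used in the original run
dt <- data.table(a = rnorm(n_rows))

# Same three access patterns as in the original timings
t_j    <- system.time(for (i in seq_len(n_iter)) j <- dt[i, a])
t_with <- system.time(for (i in seq_len(n_iter)) j <- dt[i, "a", with = FALSE])
t_vec  <- system.time(for (i in seq_len(n_iter)) j <- dt[["a"]][i])

# Collect the elapsed (wall-clock) seconds for a side-by-side view
elapsed <- c(j_expr       = t_j[["elapsed"]],
             with_false   = t_with[["elapsed"]],
             list_extract = t_vec[["elapsed"]])
print(elapsed)

# The first and third forms return the same bare number;
# the second returns a one-row data.table instead.
i <- 42L
stopifnot(identical(dt[i, a], dt[["a"]][i]))
stopifnot(is.data.table(dt[i, "a", with = FALSE]))
```

Note that the three forms are not strictly equivalent: with=FALSE returns a one-row data.table rather than a number, which is worth keeping in mind when comparing the timings.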

-- 
Alexandre Sieira
CISA, CISSP, ISO 27001 Lead Auditor

"The truth is rarely pure and never simple."
Oscar Wilde, The Importance of Being Earnest, 1895, Act I
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
