And this nice answer by Michael might be of interest too:
http://stackoverflow.com/a/13694673/403310

On 22.03.2013 11:05, Matthew Dowle wrote:

> Whilst what Rick and Michael said is very true, I suspect that you've found that setting a key on a *numeric* type column is much slower than setkey on an *integer* column. There was an awful (but correct) benchmark on S.O. recently and that's what I replied, but I can't find it now. All I can think is that the OP deleted the question, which would be a shame. If that OP is watching, and that is what happened, please can they undelete it.
>
> Also you have a setkey(DT) there, with no columns specified. In that case, it will key all the columns; think key only table. But if you have numeric value columns in there as well, or any non-key columns at all, then that will be wasteful.
>
> Anyway, in the code you posted, try changing
>
>     as.numeric(aa)
>
> to
>
>     as.integer(aa)
>
> and you should see setkey run dramatically faster. Then what Rick and Michael said applies from there.
>
> Matthew
>
> On 22.03.2013 04:31, Ricardo Saporta wrote:
>
>> When you set the key, it sorts the table -- this is part of what allows for the speed.
>> This initial sorting is what is slowing down your benchmarks.
>>
>> While it makes sense to compare the initial sort time if you are trying to get a 'full' comparison, in most practice applications, you will only be setting the key once.
>>
>> Therefore, if you want to see what sort of speed increases you are actually getting, create your DT's first, then benchmark the specific operations of interest.
>>
>> Also, searching stackoverflow for [r] data.table and benchmarks will produce several useful results
>>
>> Cheers
>> Rick
>>
>> On Thursday, March 21, 2013, ekbrown wrote:
>>
>>> Hello. I'm new to data.table(). I am apparently not setting the keys
>>> correctly to get the increase in speed talked about in the vignettes, as I
>>> get a (much) quicker time *without* keys set. Take a look at the following
>>> benchmarking tests. Any ideas? Thanks. Earl Brown
>>>
>>> > library("data.table")
>>> > library("rbenchmark")
>>> >
>>> > # generates random data
>>> > num.files > num.words > logical.vector > file.names >
>>> > # defines functions
>>> > benDTNoKey + dt + dt[,sum(V1), by = bb][,V1]
>>> + }
>>> >
>>> > benDTWithKey + dt + setkey(dt)
>>> + dt[,sum(V1), by = bb][,V1]
>>> + }
>>> >
>>> > benTapply >
>>> > # runs benchmarking
>>> > benchmark(benTapply(logical.vector, file.names),
>>> > benDTWithKey(logical.vector, file.names), benDTNoKey(logical.vector,
>>> > file.names), replications = 10, columns = c("test", "replications",
>>> > "elapsed"))
>>>                                       test replications elapsed
>>> 3   benDTNoKey(logical.vector, file.names)           10 *0.753*
>>> 2 benDTWithKey(logical.vector, file.names)           10 *4.776*
>>> 1    benTapply(logical.vector, file.names)           10   6.218
>>> >
>>> > # tests for sameness among results
>>> > one > two > three > identical(as.integer(one), as.integer(two))
>>> [1] TRUE
>>> > identical(as.integer(two), as.integer(three))
>>> [1] TRUE
>>>
>>> --
>>> View this message in context: http://r.789695.n4.nabble.com/Quicker-w-o-keys-set-tp4662157.html [1]
>>> Sent from the datatable-help mailing list archive at Nabble.com.
>>> _______________________________________________
>>> datatable-help mailing list
>>> [email protected]
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help [2]
>>
>> --
>>
>> Ricardo Saporta
>> Graduate Student, Data Analytics
>> Rutgers University, New Jersey
>> e: [email protected] [3]

Links:
------
[1] http://r.789695.n4.nabble.com/Quicker-w-o-keys-set-tp4662157.html
[2] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[3] mailto:[email protected]
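
The suggestions in this thread can be tried out with a minimal sketch along the following lines. It is not the original poster's code: the data sizes, the `file001`-style names and the objects `dt_keyed`, `dt_plain` and `dt_all` are all made up for illustration. It stores the values as integer rather than numeric, keys only the grouping column, and times the one-off `setkey()` separately from the grouped sum, so the sort cost is not charged to every run. Timings will vary by machine and data.table version, so none are shown.

```r
library(data.table)

set.seed(1)   # hypothetical data, only loosely shaped like the post's
n  <- 1e5
aa <- sample(c(TRUE, FALSE), n, replace = TRUE)
bb <- sample(sprintf("file%03d", 1:100), n, replace = TRUE)

# Integer values, and a key on the grouping column only.
dt_keyed <- data.table(V1 = as.integer(aa), bb = bb)
t_setkey <- system.time(setkey(dt_keyed, bb))   # one-off sorting cost

# Same data, no key.
dt_plain <- data.table(V1 = as.integer(aa), bb = bb)

# Time only the operation of interest, with the key already in place.
t_keyed <- system.time(res_keyed <- dt_keyed[, sum(V1), by = bb])
t_plain <- system.time(res_plain <- dt_plain[, sum(V1), by = bb])

# Reference result from base R.
res_tapply <- tapply(aa, bb, sum)

# For comparison: setkey() with no columns keys every column.
dt_all <- data.table(V1 = as.integer(aa), bb = bb)
setkey(dt_all)
key(dt_all)

# Inspect the three timings side by side.
rbind(setkey = t_setkey, keyed = t_keyed, plain = t_plain)[, "elapsed"]
```

The keyed result comes back sorted by `bb`, while the unkeyed one is in order of first appearance, so the unkeyed result needs reordering (for example `res_plain[order(bb)]`) before it is compared with the other two.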
