Some quick testing seems to show that the cost of joining table A to table B
(on preexisting keys) is proportional to size(A)*log2(size(B)), that is,
each element of table A is looked up (using binary search) in table B.  This
is very efficient when size(A) << size(B), but when they're of similar size,
a linear merge can be done in time size(A)+size(B), which can be several
times faster.  A binary merge transitions smoothly from one regime to the
other, but even a simple cross-over (switching to a linear merge once the
tables are of comparable size) would already be pretty good.
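
To make the claim concrete, the timing I have in mind is roughly the sketch
below (minimal and illustrative only: the table sizes, the single integer key
"id", and the B[A] form of the keyed join are just one way to set it up):

    ## Grow size(A) toward size(B) and time the keyed join B[A],
    ## i.e. one binary search into B per row of A.
    library(data.table)

    nB <- 1e7
    B  <- data.table(id = 1:nB, y = rnorm(nB))
    setkey(B, id)

    for (nA in c(1e3, 1e5, 1e7)) {   # from size(A) << size(B) up to equal sizes
        A <- data.table(id = sample.int(nB, nA), x = rnorm(nA))
        setkey(A, id)
        cat("nrow(A) =", nA, "\n")
        print(system.time(B[A]))     # keyed (binary-search) join
    }

If the size(A)*log2(size(B)) cost is real, the timings should scale roughly
with nrow(A), and the last case (equal sizes) is where a linear merge in
size(A)+size(B) ought to win.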

I would put together a more careful script to show this explicitly, but that
will take some time, so I wanted to check first whether this is a known issue
or whether I need to demonstrate it in more detail.  (Or whether I've made
some stupid blunder.)

Thanks!

             -s