Maybe, but the slowdown could be fairly significant. Although the levels vector is contiguous in memory, the global character hash (the memory that the character pointers point to) isn't. The cost isn't the string comparison as such, it's the page fetches. The check might also run over and over again on the same levels vector, which is very wasteful. Remember that [.data.table is recursive in places, although only once, I think.
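
For concreteness, here is a rough sketch of the kind of opt-in check being discussed. It uses only base R. The helper name check_factor_levels is made up for illustration, and datatable.check.factor.levels is only the option name proposed in this thread, not something data.table is known to provide. Note that nothing is cached here, so the scan would be repeated on every call, which is exactly the cost concern above.

```r
# Hypothetical helper, for illustration only; not part of data.table.
# Warns when any factor column among `cols` has levels that are out of order.
check_factor_levels <- function(x, cols = names(x)) {
  # Opt-in: do nothing unless the (proposed, assumed) option is TRUE.
  if (!isTRUE(getOption("datatable.check.factor.levels", FALSE))) {
    return(invisible(character(0)))
  }
  bad <- character(0)
  for (col in cols) {
    v <- x[[col]]
    # base::is.unsorted() is implemented in C and stops at the first
    # out-of-order element.
    if (is.factor(v) && is.unsorted(levels(v))) {
      bad <- c(bad, col)
    }
  }
  if (length(bad) > 0) {
    warning("factor levels are not sorted in: ", paste(bad, collapse = ", "))
  }
  invisible(bad)
}
```

With the option switched on, check_factor_levels(DT, key_columns) would warn and invisibly return the names of the offending columns; with it off (the default assumed here) it returns immediately, so the cost is a single getOption() lookup.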

Did you find out what created the out-of-order levels? This check won't help you find out where that occurred, or will it?

On Mon, 2011-03-07 at 21:39 -0500, Steve Lianoglou wrote:
> On Mon, Mar 7, 2011 at 8:50 PM, Matthew Dowle <[email protected]> wrote:
> > Btw :
> >
> >> a small utility at the C level to scan through the levels() of
> >> factor-keys to test for them being in order and
> >> breaking/short-circuiting as soon as it finds one level that's out of
> >> order?
> >
> > That's base::is.unsorted(), which is done in C.
>
> Aww -- was looking forward to writing some C code ...
>
> It looks like you were right, though -- the problematic data.table has
> a (factor) key where `is.unsorted(levels(the_key_column))` is TRUE.
>
> So I guess we're talking about having something like
> options(datatable.check.factor.levels=TRUE) check at the top of the
> [.data.table function that fires a warning() when the levels are
> unsorted, yeah?
>
> -steve
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
