Thanks for pointing this out, Eduard. You are absolutely right. I just looked at the SVN repository HEAD and saw that a new parameter called ‘fill’ has been added to .rbind.data.table. It also covers the other thing my function does: filling in missing columns with NAs. Very nice! Looking forward to the new release. :)
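For reference, here is a minimal sketch of how the two arguments might be used on the inputs from this thread. It assumes a data.table release in which rbindlist() accepts `use.names` and `fill`, as current versions document. The variable names are only illustrative.

```r
library(data.table)

# Inputs from the original report: same names, different column order
dt1 <- data.table(a = 1, b = 2)
dt2 <- data.table(b = 4, a = 3)

# use.names = TRUE matches columns by name instead of by position,
# so by_name$a is c(1, 3) and by_name$b is c(2, 4)
by_name <- rbindlist(list(dt1, dt2), use.names = TRUE)

# fill = TRUE adds any column an item lacks, filled with NA;
# use.names = TRUE is passed explicitly as well
filled <- rbindlist(
  list(data.table(a = 1, b = 2), list(c = 3), data.table(d = "foo")),
  use.names = TRUE, fill = TRUE
)
print(filled)
```

Rows come out in the order of the input items. Only the values are relied on here, not the exact column order of `filled`.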
-- 
Alexandre Sieira
CISA, CISSP, ISO 27001 Lead Auditor

"The truth is rarely pure and never simple."
Oscar Wilde, The Importance of Being Earnest, 1895, Act I

On 3 December 2013 at 15:22:48, Eduard Antonyan wrote:

I took a cursory look at your code. The new rbind does everything you want (check the use.names and fill arguments), and you may want to take a look at its code.

On Tue, Dec 3, 2013 at 11:05 AM, Alexandre Sieira wrote:

To whom it may concern, I wrote a (rather bulky) wrapper around rbindlist that:
- checks that the classes of columns with the same name match;
- fills in any missing columns with NAs of the appropriate type;
- reorders columns for consistency;
- calls rbindlist on the results of this preprocessing.

The code is here: https://gist.github.com/asieira/7772953

The results are as follows:

> smartrbindlist(list(data.table(a=1, b=2), data.table(b=4, a=3)))
   a b
1: 1 2
2: 3 4
> smartrbindlist(list(data.table(a=1, b=2), list(c=3), data.table(d="foo")))
    a  b  c   d
1:  1  2 NA  NA
2: NA NA  3  NA
3: NA NA NA foo
> smartrbindlist(list(data.table(a=1L, b=2), list(a=10)))
Error in smartrbindlist(list(data.table(a = 1L, b = 2), list(a = 10))) :
  smartrbindlist: column a has different classes in entry 2 [numeric] and its predecessors [integer]

Hope this helps anyone else out there.

On 3 December 2013 at 14:46:08, G See wrote:

I agree. Here is a related thread: http://thread.gmane.org/gmane.comp.lang.r.datatable/2231

Garrett

On Tue, Dec 3, 2013 at 8:26 AM, Alexandre Sieira wrote:

> I have come across some behavior in rbindlist that looks unexpected to me:
>
>> rbindlist(list(data.table(a=1, b=2), data.table(b=4, a=3)))
>    a b
> 1: 1 2
> 2: 4 3
>
> So it appears to assume (without checking) that all objects have not only
> the same column names but also the same column order. As a result, the value
> assigned to column ‘a’ in the second object was used for column ‘b’ in the
> end result (and vice versa).
>
> I know the documentation says rbindlist uses the column types from the first
> entry of the list, but I didn’t see any mention of column order or names
> anywhere.
>
> I suggest that column names be matched, even if they are not in the same
> order. Perhaps a ‘use.names’ parameter could be used to request this
> behavior without breaking backwards compatibility.
>
> Or, at the very least, I suggest that the documentation of rbindlist be
> updated to state explicitly that columns are matched by position only, and
> that callers need to ensure the column order of all objects matches exactly.
> I also suggest that rbindlist issue a warning when the column names don’t
> match.
>
> _______________________________________________
> datatable-help mailing list
> [email protected]
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
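The four preprocessing steps in the wrapper announcement can be illustrated with a small base-R sketch. `align_and_bind` is a hypothetical name and this is not the code from the gist. It also swaps the final rbindlist call for base `rbind`, so it runs without data.table. The aligned pieces could equally be handed to data.table::rbindlist.

```r
# Hypothetical helper (NOT the gist's code): mirrors the four steps listed
# in the email using base R only.
align_and_bind <- function(l) {
  l <- lapply(l, as.list)  # treat data.frames and plain lists alike
  protos <- list()         # zero-length prototype of each column, by name

  # Step 1: columns sharing a name must have the same class in every entry
  for (i in seq_along(l)) {
    for (nm in names(l[[i]])) {
      col <- l[[i]][[nm]]
      if (is.null(protos[[nm]])) {
        protos[[nm]] <- col[0]
      } else if (!identical(class(col), class(protos[[nm]]))) {
        stop(sprintf(
          "column %s has different classes in entry %d [%s] and its predecessors [%s]",
          nm, i, class(col)[1], class(protos[[nm]])[1]))
      }
    }
  }
  all_names <- names(protos)  # union of names, in order of first appearance

  # Steps 2 and 3: fill missing columns with typed NAs, then reorder
  aligned <- lapply(l, function(x) {
    n <- if (length(x) > 0) length(x[[1]]) else 0L
    for (nm in setdiff(all_names, names(x))) {
      # Indexing a zero-length vector with NA yields NA of the same type
      x[[nm]] <- protos[[nm]][rep(NA_integer_, n)]
    }
    as.data.frame(x[all_names], stringsAsFactors = FALSE)
  })

  # Step 4: bind (base rbind here; the original wrapper calls rbindlist)
  out <- do.call(rbind, aligned)
  rownames(out) <- NULL
  out
}

# Usage with the column-order example from the thread
res <- align_and_bind(list(data.frame(a = 1, b = 2), data.frame(b = 4, a = 3)))
print(res)
```

Storing a zero-length slice of each column as a prototype keeps the NA fill type-correct (numeric, character, factor and so on) without a lookup table of types.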
