I think I found an unexpected behavior with rbindlist when columns are factors:

> dt1 = data.table(a=as.factor(c("a", "a", "a")))
> dt1
   a
1: a
2: a
3: a
> str(dt1)
Classes ‘data.table’ and 'data.frame':	3 obs. of  1 variable:
 $ a: Factor w/ 1 level "a": 1 1 1
 - attr(*, ".internal.selfref")=<externalptr>
> dt2 = data.table(a=as.factor(c("b", "b", "b")))
> dt2
   a
1: b
2: b
3: b
> str(dt2)
Classes ‘data.table’ and 'data.frame':	3 obs. of  1 variable:
 $ a: Factor w/ 1 level "b": 1 1 1
 - attr(*, ".internal.selfref")=<externalptr>

If I rbind them, I get the expected result: a table with 6 rows, 3 with value "a" and 3 with value "b":

> rbind(dt1, dt2)
   a
1: a
2: a
3: a
4: b
5: b
6: b

So if I do rbindlist(list(dt1, dt2)), I would expect to get exactly the same result, only faster. Unfortunately, that is not the case. All six rows come back as "a", and the resulting factor has only the level "a":

> rbindlist(list(dt1, dt2))
   a
1: a
2: a
3: a
4: a
5: a
6: a
> str(rbindlist(list(dt1, dt2)))
Classes ‘data.table’ and 'data.frame':	6 obs. of  1 variable:
 $ a: Factor w/ 1 level "a": 1 1 1 1 1 1
 - attr(*, ".internal.selfref")=<externalptr>

This was executed with R 3.0.1 and data.table 1.8.8 on Mac OS X 10.8.3.

Is this expected behavior? Am I missing something?

--
Alexandre Sieira
CISA, CISSP, ISO 27001 Lead Auditor

"The truth is rarely pure and never simple."
Oscar Wilde, The Importance of Being Earnest, 1895, Act I
_______________________________________________
datatable-help mailing list
datatable-help@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
