On Wed, Nov 6, 2013 at 2:50 PM, Arunkumar Srinivasan <[email protected]> wrote:
> Eddi,
>
> 1) We can still allow duplicate names in "fread" and during creation of
> data.table with the data.table() command.
> 2) There's really no loss of data as we can allow "setnames" to set
> duplicate names/unduplicate them (and they anyways have the data as they
> load that into R using fread). There's therefore no *real* loss of data.
> 3) The point is to decide upon where duplicate names are allowed and where
> it should give an error…
>
> As I said before, I think it's essential to allow duplicate names while
> loading a file (and therefore for consistency during creation of data.table
> as well). However, all grouping/aggregating/subsetting etc.. where ambiguity
> can arise should end in error. At least this is my stance so far. Are we
> agreeing on this?

Add "evaluation in `j`" to the things you want to throw an error, and I
guess I'm OK with Arun's stance, too, since we should stay as close to
data.frame as possible (even though I still think it's "wrong" in
principle to have duplicate column names).

I guess setnames needs smarter handling too, as it fails if the target
data.table has any duplicate names. (I'm assuming this has come up
already, but I'm only half-tuned-in to this discussion.)

I also think that the output of the aggregation example Eddi used
earlier should be changed, i.e.:

R> x <- data.table(V1=sample(letters[1:3], 10, rep=TRUE), B=rnorm(10))
R> x[, sum(B), by=V1]
   V1         V1
1:  b -0.8581098
2:  a  0.8762710
3:  c  1.3274762

It just feels wrong for the summed column to also be named V1, but maybe
this is an FR for another day.

-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
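
For the aggregation case above, one way to sidestep the clash today is to name the result explicitly in `j`. The following is only a minimal sketch: the column name `total` and the seed are arbitrary choices for illustration, not anything proposed in this thread, and it does not address how data.table should name unnamed `j` results by default.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the example is reproducible
x <- data.table(V1 = sample(letters[1:3], 10, replace = TRUE), B = rnorm(10))

# Unnamed j: the aggregate gets a default name, which is what collides
# with the grouping column V1 in the example above.
res_default <- x[, sum(B), by = V1]
names(res_default)

# Named j: wrap the expression in list() and give the column an explicit
# name ("total" is a hypothetical name chosen for this sketch).
res_named <- x[, list(total = sum(B)), by = V1]
names(res_named)
```

Naming the column in `list()` keeps the grouping column and the aggregate distinguishable, so anything downstream that refers to columns by name stays unambiguous.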
