Hi, I wanted to point out that I'm in Arun's camp on this one:
On Fri, Nov 8, 2013 at 7:09 AM, Arunkumar Srinivasan <[email protected]> wrote:
> In my opinion, the dup-names should be allowed *only* during creation of
> data.table, and setting names (using `setnames`, `setattr` or the bad form
> `names(dt) <- `). Other than that, *ALL* operations should fail (end up in
> error), and that includes subsetting operation. The `setnames` gives the
> option for the user to set the names back before writing to a file, should
> he choose to keep it at the end.
>
> I think it's much better this way (strict, but avoids confusion). For
> example, in data.frames, doing DF$x (when x occurs twice) implicitly prints
> only the first (no warning/error). Also, split(DF$x, DF$x) uses the first
> column and so does split(DF, DF$x).

As an opinionated footnote: I can accept that, since data.frames allow
duplicated column names, data.table should (I *guess*) *allow* them too.
However, as is clear (to me) from this long chain of "possibilities" for what
one can do, I strongly feel that computing over a data.table w/ duplicated
column names is a fundamentally broken idea, because it is ambiguous what the
right behavior should be ... never mind the (surely fun) book-keeping code
required to make it happen.

You want to import a table with duplicate names? Fine (we should warn on
import, whether it came in via `fread` or `as.data.table`).

You want to set some names to duplicates? Fine -- warn there too.

Want to do any computation inside the data.table via `j`, or use such a column
in `by`? Throw an error and punt the problem to the user to figure out how
they would like to disambiguate the first column named "a" from the 10th
one -- I don't think we need another FAQ explaining what "the right" way to do
this is, and why we picked it.

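To make the ambiguity concrete, here is a minimal base R sketch of the
data.frame behavior Arun describes, followed by the kind of "fail loudly"
check argued for above. Note that `assert_unique_names` is a hypothetical
helper written only for this illustration; it is not an existing data.table
function, and nothing here reflects how data.table actually handles this.

```r
# Base R only. check.names = FALSE keeps the duplicated name "x"
# instead of letting data.frame() rename the second column to "x.1".
DF <- data.frame(x = 1:3, x = 4:6, check.names = FALSE)

# `$` silently returns the first column named "x"; no warning, no error.
first <- DF$x

# Hypothetical helper (an assumption for illustration, NOT part of data.table):
# refuse to go on when column names are duplicated, and say which ones.
assert_unique_names <- function(x) {
  dups <- unique(names(x)[duplicated(names(x))])
  if (length(dups) > 0L) {
    stop("duplicated column names: ", paste(dups, collapse = ", "))
  }
  invisible(x)
}

# Capture the error message rather than aborting the script.
res <- tryCatch(assert_unique_names(DF), error = function(e) conditionMessage(e))
```

With a check like this in front of `j` and `by`, the user would have to rename
(e.g. via `setnames`) before computing, rather than relying on an implicit
"first match wins" rule.
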
Or, if you really want to compute over a data.table with duplicate names, you
might be better served by having the table in "long" format -- perhaps that's
why there are duplicate column names to begin with (I'm guessing -- I still
don't think I would ever want to have duped names on purpose).

My two cents,
-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
