You mean what would be the problem? Well, if the user reads that data with fread, then modifies e.g. the non-duplicate columns, and then tries to write it back with write.csv, how would the user recover the original names to write the data back correctly if we had renamed the columns?
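To make the concern concrete, here is a minimal sketch in base R. It rests on assumptions: read.csv(check.names = FALSE) stands in for a reader that keeps duplicated names, and make.unique() stands in for whatever renaming rule might be applied on load. Neither is meant to describe what fread actually does, and the sample data is made up.

```r
# Made-up input with a duplicated column name in the header.
csv_text <- "id,value,value\n1,10,20\n2,30,40"

# Stand-in for a reader that keeps the header exactly as it is in the file.
kept <- read.csv(text = csv_text, check.names = FALSE)

# Stand-in for a hypothetical "rename duplicates on load" policy.
renamed <- kept
names(renamed) <- make.unique(names(kept))

# The user modifies a non-duplicated column in both versions.
kept$id <- kept$id + 100L
renamed$id <- renamed$id + 100L

# Write both back out and capture the text instead of creating files.
out_kept <- capture.output(write.csv(kept, row.names = FALSE))
out_renamed <- capture.output(write.csv(renamed, row.names = FALSE))

# Compare the header lines that would end up in the written files.
print(out_kept[1])
print(out_renamed[1])
```

In the renamed version the written header no longer matches the original file, and nothing in the object records what the original names were.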

On Wed, Nov 6, 2013 at 10:10 AM, <[email protected]> wrote:

> Eddi,
> Nice! But what exactly will happen to that data, if we were to
> automatically set unique names while loading it (using “fread”) (and issue
> a warning)??
>
> Arun
>
> On Wednesday 6 November 2013 at 17:05, Eduard Antonyan wrote:
>
> Last comment here has an example of using duplicated names -
> http://stackoverflow.com/a/19809942/817778 - it's very similar to the one
> I mentioned earlier.
>
>
> On Mon, Nov 4, 2013 at 3:54 AM, Chinmay Patil <[email protected]> wrote:
>
> FWIW, data.frame does allow duplicate names as well. Given that
> data.table inherits from data.frame, I would expect it to follow the same
> convention as data.frame.
>
>
> On Sun, Nov 3, 2013 at 9:43 AM, Eduard Antonyan <[email protected]>
> wrote:
>
> @Arun: Ok. Thinking about it a bit - I don't like the continuing
> enumeration solution because it makes the results too unpredictable, but
> could live with adding a ".1" etc. Which I assume is the idea anyway for
> resolving duplicates elsewhere.
>
> @Steve: Not sure why you think it doesn't hold much water - I think I can
> draw a parallel argument that replicates all of the duplicated names
> concerns with a column that is called e.g. `dt$V1` (imagine forgetting the
> backticks there and the world of hurt that potentially awaits once you do
> that). I am also curious what Matthew would think about this. This is
> something I've encountered and dealt with a lot, so I'm certainly not an
> unbiased party here.
>
>
> On Sat, Nov 2, 2013 at 8:15 PM, Steve Lianoglou
> <[email protected]> wrote:
>
> On Sat, Nov 2, 2013 at 5:43 PM, Eduard Antonyan
> <[email protected]> wrote:
> > Tbh I don't see why data presentation and preservation (i.e. if you're
> > reading in data with duplicated columns) is not enough of a use case -
> > that's the only reason we allow arbitrary symbols in column names.
> >
> > So, instead of giving you another use case, how about you tell me instead
> > what do you propose should happen here (instead of what happens now):
> >
> >> dt = data.table(1, 2)
> >> dt
> >    V1 V2
> > 1:  1  2
> >> dt[, sum(V2), by = V1]
> >    V1 V1
> > 1:  1  2
>
> Only Matthew could say for sure, but if I were a gambling man I'd bet
> that this was likely something that slipped through the cracks and
> sleeping dogs were left to lie. I'd be curious to see what his
> opinions on this are.
>
> IMHO the "data presentation" argument doesn't really hold much water.
>
> As for "data preservation," I rather see it as imposing structure on
> it to enable efficient -- and sane/unambiguous -- computation over it.
> Further, I don't think this is a preservation issue at all -- no data is
> lost. The original data is still there in the file that was loaded
> into R. The name of a column is changed when imported (with adequate
> warning) into a data.table so that the user can slice and dice it. I'd
> also guess the user being warned by the duplicate names would most
> likely be happy to receive the warning, but the fact that you disagree
> suggests that this isn't an obvious conclusion ;-)
>
> I'm curious if you would argue for an SQL table to allow duplicate
> column names for the same reasons? I do know you can torture SQL to
> get two colnames to be the same by aliasing, but this also seems to
> have slipped through as an accident:
>
> http://www.dcs.warwick.ac.uk/~hugh/TTM/Importance-of-Column-Names.pdf
>
> (which I found from here):
>
> http://stackoverflow.com/questions/8797593/is-there-any-use-to-duplicate-column-names-in-a-table
>
> Perhaps we should email this guy Hugh to see what he thinks about this one
> :-)
>
> -steve
>
> --
> Steve Lianoglou
> Computational Biologist
> Bioinformatics and Computational Biology
> Genentech
>
> _______________________________________________
> datatable-help mailing list
> [email protected]
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
