Re: [datatable-help] Unexpected behavior in setnames()

aragorn168b Wed, 06 Nov 2013 08:10:54 -0800

Eddi,  
Nice! But what exactly would happen to that data if we were to automatically 
set unique names while loading it (using “fread”) and issue a warning?
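For illustration only, here is a minimal sketch of what ".1"-style renaming could look like. It assumes base R's make.unique() as the renaming rule. That is an assumption, since no rule for fread is settled in this thread. The same sketch also shows that data.frame() applies a similar rule by default but can keep duplicates:

```r
# Duplicated column names, as they might appear in a file header
nms <- c("V1", "V1", "V2")

# Base R's make.unique() keeps the first occurrence and appends
# ".1", ".2", ... to later repeats of the same name
make.unique(nms)
# returns c("V1", "V1.1", "V2")

# data.frame() applies a similar rule by default (check.names = TRUE)
names(data.frame(V1 = 1, V1 = 2))
# returns c("V1", "V1.1")

# ...but keeps the duplicated names when check.names = FALSE
names(data.frame(V1 = 1, V1 = 2, check.names = FALSE))
# returns c("V1", "V1")
```

With this rule the data itself is untouched. Only the later duplicates get a suffix, which keeps the resulting names predictable.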


Arun


On Wednesday 6 November 2013 at 17:05, Eduard Antonyan wrote:

> Last comment here has an example of using duplicated names - 
> http://stackoverflow.com/a/19809942/817778 - it's very similar to the one I 
> mentioned earlier.
>  
>  
> On Mon, Nov 4, 2013 at 3:54 AM, Chinmay Patil <[email protected] 
> (mailto:[email protected])> wrote:
> > FWIW, data.frame does allow duplicate names as well. Given that 
> > data.table inherits from data.frame, I would expect it to follow the 
> > same convention as data.frame.  
> >  
> >  
> > On Sun, Nov 3, 2013 at 9:43 AM, Eduard Antonyan <[email protected] 
> > (mailto:[email protected])> wrote:
> > > @Arun: OK. Having thought about it a bit, I don't like the continuing 
> > > enumeration solution because it makes the results too unpredictable, but 
> > > I could live with appending ".1" etc., which I assume is the approach 
> > > used for resolving duplicates elsewhere anyway.
> > >  
> > > @Steve: I'm not sure why you think it doesn't hold much water. I think I 
> > > can make a parallel argument that replicates all of the duplicated-names 
> > > concerns with a column called, e.g., `dt$V1` (imagine forgetting the 
> > > backticks there and the world of hurt that potentially awaits once you 
> > > do that). I am also curious what Matthew would think about this. This is 
> > > something I've encountered and dealt with a lot, so I'm certainly not an 
> > > unbiased party here.
> > >  
> > >  
> > > On Sat, Nov 2, 2013 at 8:15 PM, Steve Lianoglou <[email protected] 
> > > (mailto:[email protected])> wrote:
> > > > On Sat, Nov 2, 2013 at 5:43 PM, Eduard Antonyan
> > > > <[email protected] (mailto:[email protected])> wrote:
> > > > > Tbh I don't see why data presentation and preservation (i.e. if you're
> > > > > reading in data with duplicated columns) is not enough of a use case -
> > > > > that's the only reason we allow arbitrary symbols in column names.
> > > > >
> > > > > So, instead of giving you another use case, how about you tell me
> > > > > what you propose should happen here (instead of what happens now):
> > > > >
> > > > >> dt = data.table(1, 2)
> > > > >> dt
> > > > >    V1 V2
> > > > > 1:  1  2
> > > > >> dt[, sum(V2), by = V1]
> > > > >    V1 V1
> > > > > 1:  1  2
> > > >  
> > > > Only Matthew could say for sure, but if I were a gambling man I'd bet
> > > > that this is something that slipped through the cracks, and sleeping
> > > > dogs were left to lie. I'd be curious to see what his opinions on this
> > > > are.
> > > >  
> > > > IMHO the "data presentation" argument doesn't really hold much water.
> > > >  
> > > > As for "data preservation," I see it rather as imposing structure on
> > > > the data to enable efficient -- and sane/unambiguous -- computation over
> > > > it. Further, I don't think it is a preservation issue at all -- no data
> > > > is lost. The original data is still there in the file that was loaded
> > > > into R. The name of a column is changed when imported (with adequate
> > > > warning) into a data.table so that the user can slice and dice it. I'd
> > > > also guess that a user warned about duplicate names would most likely
> > > > be happy to receive the warning, but the fact that you disagree
> > > > suggests that this isn't an obvious conclusion ;-)
> > > >  
> > > > I'm curious: would you argue that an SQL table should allow duplicate
> > > > column names for the same reasons? I do know you can torture SQL into
> > > > giving two columns the same name by aliasing, but this also seems to
> > > > have slipped through as an accident:
> > > >  
> > > > http://www.dcs.warwick.ac.uk/~hugh/TTM/Importance-of-Column-Names.pdf
> > > >  
> > > > (which I found from here):
> > > > http://stackoverflow.com/questions/8797593/is-there-any-use-to-duplicate-column-names-in-a-table
> > > >  
> > > > Perhaps we should email this guy Hugh to see what he thinks about this 
> > > > one :-)
> > > >  
> > > > -steve
> > > >  
> > > > --
> > > > Steve Lianoglou
> > > > Computational Biologist
> > > > Bioinformatics and Computational Biology
> > > > Genentech
> > >  
> > >  
> > > _______________________________________________
> > > datatable-help mailing list
> > > [email protected] 
> > > (mailto:[email protected])
> > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >  
>  
