Oh, good point.
How about putting 'by' first in those situations:
> DT = data.table(A=rep(1:3,2),B=1:2)
> unique(by="A",DT)
A B
1: 1 1
2: 2 2
3: 3 1
> unique(by="B",DT)
A B
1: 1 1
2: 2 2
>
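A minimal sketch of why that call order works, assuming nothing beyond base R argument matching: named arguments are matched first, so DT still binds to the first parameter, x, even when by= is written before it.

  library(data.table)
  DT = data.table(A=rep(1:3,2), B=1:2)
  # by="A" is matched by name and passed through ... to unique.data.table;
  # DT fills x positionally, so the two call orders are equivalent:
  identical(unique(by="A", DT), unique(DT, by="A"))   # TRUE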
On 27/09/13 20:09, Ricardo Saporta wrote:
Steve, not to beat a dead horse on the "what to name the new
parameter" discussion, but I'm wondering what your/others' thoughts
are on using something other than "by". Maybe even "uby"?
Or perhaps we could have a synonym in the function definition:
.. function(........ , by=uby, uby)
The reason I bring this up is that as I begin to use this and read
over my own code, I realize that it takes a lot of visual parsing to
distinguish when the "by" in a complex call belongs to
"[.data.table" and when it belongs to "unique.data.table".
Cheers,
Rick
On Tue, Aug 27, 2013 at 1:23 PM, Steve Lianoglou
<[email protected]> wrote:
Last update here :-)
After more hemming and hawing, I've changed the name of the new
parameter added to duplicated.data.table and unique.data.table from
`by.columns` to just `by`, as it is (more or less) the same idea as
the `by` in dt[i, j, by, ...].
Sorry for any inconvenience caused if you've been working off of the
development version.
-steve
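As a rough sketch of that analogy (the grouping call below is illustrative, not from the original mail): the new by on unique() picks out the same rows as taking the first row per group with by in `[`.

  library(data.table)
  DT = data.table(A=rep(1:3,2), B=1:2)
  unique(DT, by="A")     # new argument on unique.data.table
  DT[, .SD[1], by=A]     # the analogous `by` in DT[i, j, by]: same rows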
On Thu, Aug 15, 2013 at 9:35 PM, Ricardo Saporta
<[email protected]> wrote:
> Steve, great stuff!!
> Thanks for making that happen.
>
> Rick
>
>
> On Wed, Aug 14, 2013 at 8:30 PM, Steve Lianoglou
> <[email protected]> wrote:
>>
>> Hi all,
>>
>> As I needed this sooner than I had expected, I just committed this
>> change. It's in svn revision 889.
>>
>> I chose 'by.columns' as the parameter name -- it seemed to make more
>> sense to me, and using the shorthand interactively saves a letter,
>> e.g.: unique(dt, by=c('some', 'columns')) ;-)
>>
>> Here's the note from the NEWS file:
>>
>> o "Uniqueness" tests can now specify arbirtray combinations of
>> columns to use to test for duplicates. `by.columns` parameter
added to
>> unique.data.table and duplicated.data.table. This allows the
user to
>> test for uniqueness using any combination of columns in the
>> data.table, where previously the user only had the option to
use the
>> keyed columns (if keyed) or all columns (if not). The default
behavior
>> sets `by.columns=key(dt)` to maintain backward compatability. See
>> man/duplicated.Rd and tests 986:991 for more information. Thanks to
>> Arunkumar Srinivasan, Ricardo Saporta, and Frank Erickson for
useful
>> discussions.
>>
>> Should work as advertised assuming my unit tests weren't too
simplistic.
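A minimal usage sketch of that NEWS item, written against the svn r889 `by.columns` name (it was later renamed to `by`, so substitute accordingly on released versions):

  library(data.table)
  dt = data.table(A=c(1,1,2,2), B=c(1,1,2,3), key="A")
  unique(dt)                          # default by.columns=key(dt): dedup on the key, A
  unique(dt, by.columns=c("A","B"))   # any combination of columns
  duplicated(dt, by.columns="B")      # same parameter on duplicated.data.table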
>>
>> Cheers,
>>
>> -steve
>>
>>
>>
>>
>> On Tue, Aug 13, 2013 at 1:24 PM, Steve Lianoglou
>> <[email protected]> wrote:
>> > Thanks for the suggestions, folks.
>> >
>> > Matthew: do you have a preference?
>> >
>> > -steve
>> >
>> > On Mon, Aug 12, 2013 at 11:12 AM, Ricardo Saporta
>> > <[email protected]> wrote:
>> >> Steve,
>> >>
>> >> I like your suggestion a lot. I can see putting column
>> >> specification to good use.
>> >>
>> >> As for the argument name, perhaps
>> >> 'use.columns'
>> >>
>> >> And where a value of NULL or FALSE would yield the same results as
>> >> `unique.data.frame`:
>> >>
>> >> use.columns=key(x) # default behavior
>> >> use.columns=c("col1name", "col7name") #etc
>> >> use.columns=NULL
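A short sketch of the NULL case Ricardo describes; 'use.columns' was not the name finally adopted, so this is written with the released `by` argument, where selecting all columns reproduces unique.data.frame:

  library(data.table)
  DT = data.table(A=c(1,1,2,2), B=c(1,1,2,3))
  unique(DT, by="A")           # deduplicate on A only: 2 rows
  unique(DT, by=names(DT))     # all columns: 3 rows, like...
  unique(as.data.frame(DT))    # ...the base unique.data.frame behaviour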
>> >>
>> >>
>> >> Thanks as always,
>> >> Rick
>> >>
>> >>
>> >>
>> >> On Mon, Aug 12, 2013 at 1:51 PM, Steve Lianoglou
>> >> <[email protected]> wrote:
>> >>>
>> >>> Hi folks,
>> >>>
>> >>> I actually want to revisit the fix I made here.
>> >>>
>> >>> Instead of having `use.key` in the signature of
>> >>> unique.data.table (and duplicated.data.table), which is currently:
>> >>>
>> >>> function(x,
>> >>> incomparables=FALSE,
>> >>> tolerance=.Machine$double.eps ^ 0.5,
>> >>> use.key=TRUE, ...)
>> >>>
>> >>> how about we switch it out for a parameter that specifies the
>> >>> column names to use in the uniqueness check, defaulting to key(x)
>> >>> to keep backwards compatibility?
>> >>>
>> >>> For argument's sake (like that?), let's call this parameter
>> >>> `columns` (by.columns? with.columns? whatever), so:
>> >>>
>> >>> function(x,
>> >>> incomparables=FALSE,
>> >>> tolerance=.Machine$double.eps ^ 0.5,
>> >>> columns=key(x), ...)
>> >>>
>> >>> Then:
>> >>>
>> >>> (1) leaving it alone is the backward-compatible behavior;
>> >>> (2) perhaps setting it to NULL uses all columns, making it
>> >>> equivalent to unique.data.frame (also the same when x has no
>> >>> key); and
>> >>> (3) setting it to any other combination of columns uses those
>> >>> columns as the uniqueness key and filters rows (only) out of x
>> >>> accordingly.
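A quick sketch of those three behaviors, using the argument name eventually adopted (`by`) rather than the `columns` placeholder:

  library(data.table)
  x = data.table(A=c(1,1,2,2), B=c(1,1,2,3), key="A")
  unique(x, by=key(x))     # (1) the proposed default: dedup on the key, column A
  unique(x, by=names(x))   # (2) all columns, equivalent to unique.data.frame
  unique(x, by="B")        # (3) any other combination used as the uniqueness key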
>> >>>
>> >>> What do you folks think? Personally I think this is better on all
>> >>> counts than just specifying whether or not to use the key, and the
>> >>> only question in my mind is the name of the argument -- happy to
>> >>> hear other world views, however, so don't be shy.
>> >>>
>> >>> Thanks,
>> >>> -steve
>> >>>
>> >>> --
>> >>> Steve Lianoglou
>> >>> Computational Biologist
>> >>> Bioinformatics and Computational Biology
>> >>> Genentech
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Steve Lianoglou
>> > Computational Biologist
>> > Bioinformatics and Computational Biology
>> > Genentech
>>
>>
>>
>> --
>> Steve Lianoglou
>> Computational Biologist
>> Bioinformatics and Computational Biology
>> Genentech
>
>
--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help