Oh, good point.
How about putting 'by' first in those situations:
> DT = data.table(A=rep(1:3,2),B=1:2)
> unique(by="A",DT)
A B
1: 1 1
2: 2 2
3: 3 1
> unique(by="B",DT)
A B
1: 1 1
2: 2 2
>
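A minimal sketch of why that call order works, assuming nothing beyond base R argument matching: named arguments are matched first, so DT still binds to the first parameter, x, even when by= is written before it.

  library(data.table)
  DT = data.table(A=rep(1:3,2), B=1:2)
  # by="A" is matched by name and passed through ... to unique.data.table;
  # DT fills x positionally, so the two call orders are equivalent:
  identical(unique(by="A", DT), unique(DT, by="A"))   # TRUE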
On 27/09/13 20:09, Ricardo Saporta wrote:
Steve, not to beat a dead horse on the "what to name the new
parameter" discussion, but I'm wondering what your/others' thoughts
are on using something other than "by". Maybe even "uby"?
Or perhaps we could have a synonym in the function definition:
.. function(........ , by=uby, uby)
The reason I bring this up is that as I begin to use this and read
over my own code, I realize that it takes a lot of visual parsing to
distinguish when the "by" in a complex call belongs to
"[.data.table" and when it belongs to "unique.data.table".
Cheers,
Rick
On Tue, Aug 27, 2013 at 1:23 PM, Steve Lianoglou
<[email protected]> wrote:
Last update here :-)
After more hemming and hawing, I've changed the name of the new
parameter added to duplicated.data.table and unique.data.table from
`by.columns` to just `by`, as it is (more or less) the same idea as
the `by` in dt[i, j, by, ...].
Sorry for any inconvenience caused if you've been working off of the
development version.
-steve
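As a rough sketch of that analogy (the grouping call below is illustrative, not from the original mail): the new by on unique() picks out the same rows as taking the first row per group with by in `[`.

  library(data.table)
  DT = data.table(A=rep(1:3,2), B=1:2)
  unique(DT, by="A")     # new argument on unique.data.table
  DT[, .SD[1], by=A]     # the analogous `by` in DT[i, j, by]: same rows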
On Thu, Aug 15, 2013 at 9:35 PM, Ricardo Saporta
<[email protected]> wrote:
> Steve, great stuff!!
> Thanks for making that happen.
>
> Rick
>
>
> On Wed, Aug 14, 2013 at 8:30 PM, Steve Lianoglou
> <[email protected]> wrote:
>>
>> Hi all,
>>
>> As I needed this sooner than I had expected, I just committed this
>> change. It's in svn revision 889.
>>
>> I chose 'by.columns' as the parameter name -- it seemed to make more
>> sense to me, and using the shorthand interactively saves a letter,
>> e.g.: unique(dt, by=c('some', 'columns')) ;-)
>>
>> Here's the note from the NEWS file:
>>
>> o "Uniqueness" tests can now specify arbirtray combinations of
>> columns to use to test for duplicates. `by.columns` parameter
added to
>> unique.data.table and duplicated.data.table. This allows the
user to
>> test for uniqueness using any combination of columns in the
>> data.table, where previously the user only had the option to
use the
>> keyed columns (if keyed) or all columns (if not). The default
behavior
>> sets `by.columns=key(dt)` to maintain backward compatability. See
>> man/duplicated.Rd and tests 986:991 for more information. Thanks to
>> Arunkumar Srinivasan, Ricardo Saporta, and Frank Erickson for
useful
>> discussions.
>>
>> Should work as advertised assuming my unit tests weren't too
simplistic.
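A minimal usage sketch of that NEWS item, written against the svn r889 `by.columns` name (it was later renamed to `by`, so substitute accordingly on released versions):

  library(data.table)
  dt = data.table(A=c(1,1,2,2), B=c(1,1,2,3), key="A")
  unique(dt)                          # default by.columns=key(dt): dedup on the key, A
  unique(dt, by.columns=c("A","B"))   # any combination of columns
  duplicated(dt, by.columns="B")      # same parameter on duplicated.data.table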
>>
>> Cheers,
>>
>> -steve
>>
>>
>>
>>
>> On Tue, Aug 13, 2013 at 1:24 PM, Steve Lianoglou
>> <[email protected]> wrote:
>> > Thanks for the suggestions, folks.
>> >
>> > Matthew: do you have a preference?
>> >
>> > -steve
>> >
>> > On Mon, Aug 12, 2013 at 11:12 AM, Ricardo Saporta
>> > <[email protected]> wrote:
>> >> Steve,
>> >>
>> >> I like your suggestion a lot. I can see putting column
>> >> specification to good use.
>> >>
>> >> As for the argument name, perhaps
>> >> 'use.columns'
>> >>
>> >> And where a value of NULL or FALSE would yield the same results as
>> >> `unique.data.frame`:
>> >>
>> >> use.columns=key(x) # default behavior
>> >> use.columns=c("col1name", "col7name") #etc
>> >> use.columns=NULL
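A short sketch of the NULL case Ricardo describes; 'use.columns' was not the name finally adopted, so this is written with the released `by` argument, where selecting all columns reproduces unique.data.frame:

  library(data.table)
  DT = data.table(A=c(1,1,2,2), B=c(1,1,2,3))
  unique(DT, by="A")           # deduplicate on A only: 2 rows
  unique(DT, by=names(DT))     # all columns: 3 rows, like...
  unique(as.data.frame(DT))    # ...the base unique.data.frame behaviour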
>> >>
>> >>
>> >> Thanks as always,
>> >> Rick
>> >>
>> >>
>> >>
>> >> On Mon, Aug 12, 2013 at 1:51 PM, Steve Lianoglou
>> >> <[email protected]> wrote:
>> >>>
>> >>> Hi folks,
>> >>>
>> >>> I actually want to revisit the fix I made here.
>> >>>
>> >>> Instead of having `use.key` in the signature of
>> >>> unique.data.table (and duplicated.data.table), which is currently:
>> >>>
>> >>> function(x,
>> >>> incomparables=FALSE,
>> >>> tolerance=.Machine$double.eps ^ 0.5,
>> >>> use.key=TRUE, ...)
>> >>>
>> >>> how about we switch it out for a parameter that specifies the
>> >>> column names to use in the uniqueness check, defaulting to key(x)
>> >>> to keep backwards compatibility?
>> >>>
>> >>> For argument's sake (like that?), let's call this parameter
>> >>> `columns` (by.columns? with.columns? whatever), so:
>> >>>
>> >>> function(x,
>> >>> incomparables=FALSE,
>> >>> tolerance=.Machine$double.eps ^ 0.5,
>> >>> columns=key(x), ...)
>> >>>
>> >>> Then:
>> >>>
>> >>> (1) leaving it alone is the backward-compatible behavior;
>> >>> (2) perhaps setting it to NULL uses all columns, making it
>> >>> equivalent to unique.data.frame (also the same when x has no
>> >>> key); and
>> >>> (3) setting it to any other combination of columns uses those
>> >>> columns as the uniqueness key and filters rows (only) out of x
>> >>> accordingly.
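A quick sketch of those three behaviors, using the argument name eventually adopted (`by`) rather than the `columns` placeholder:

  library(data.table)
  x = data.table(A=c(1,1,2,2), B=c(1,1,2,3), key="A")
  unique(x, by=key(x))     # (1) the proposed default: dedup on the key, column A
  unique(x, by=names(x))   # (2) all columns, equivalent to unique.data.frame
  unique(x, by="B")        # (3) any other combination used as the uniqueness key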
>> >>>
>> >>> What do you folks think? Personally I think this is better on all
>> >>> counts than just specifying whether or not to use the key, and the
>> >>> only question in my mind is the name of the argument -- happy to
>> >>> hear other world views, however, so don't be shy.
>> >>>
>> >>> Thanks,
>> >>> -steve
>> >>>
>> >>> --
>> >>> Steve Lianoglou
>> >>> Computational Biologist
>> >>> Bioinformatics and Computational Biology
>> >>> Genentech
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Steve Lianoglou
>> > Computational Biologist
>> > Bioinformatics and Computational Biology
>> > Genentech
>>
>>
>>
>> --
>> Steve Lianoglou
>> Computational Biologist
>> Bioinformatics and Computational Biology
>> Genentech
>
>
--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help