Steve, great stuff!! thanks for making that happen

Rick


On Wed, Aug 14, 2013 at 8:30 PM, Steve Lianoglou <[email protected]> wrote:

> Hi all,
>
> As I needed this sooner than I had expected, I just committed this
> change. It's in svn revision 889.
>
> I chose 'by.columns' as the parameter name -- seemed to make more
> sense to me, and using the shorthand interactively saves a letter,
> e.g.: unique(dt, by=c('some', 'columns')) ;-)
>
> Here's the note from the NEWS file:
>
>   o "Uniqueness" tests can now specify arbitrary combinations of
>     columns to use to test for duplicates. `by.columns` parameter added to
>     unique.data.table and duplicated.data.table. This allows the user to
>     test for uniqueness using any combination of columns in the
>     data.table, where previously the user only had the option to use the
>     keyed columns (if keyed) or all columns (if not). The default behavior
>     sets `by.columns=key(dt)` to maintain backward compatibility. See
>     man/duplicated.Rd and tests 986:991 for more information. Thanks to
>     Arunkumar Srinivasan, Ricardo Saporta, and Frank Erickson for useful
>     discussions.
>
> Should work as advertised assuming my unit tests weren't too simplistic.
>
> Cheers,
>
> -steve
>
>
> On Tue, Aug 13, 2013 at 1:24 PM, Steve Lianoglou
> <[email protected]> wrote:
> > Thanks for the suggestions, folks.
> >
> > Matthew: do you have a preference?
> >
> > -steve
> >
> > On Mon, Aug 12, 2013 at 11:12 AM, Ricardo Saporta
> > <[email protected]> wrote:
> >> Steve,
> >>
> >> I like your suggestion a lot. I can see putting column specification to
> >> good use.
> >>
> >> As for the argument name, perhaps
> >> 'use.columns'
> >>
> >> And where a value of NULL or FALSE will yield the same results as
> >> `unique.data.frame`
> >>
> >> use.columns=key(x) # default behavior
> >> use.columns=c("col1name", "col7name") # etc
> >> use.columns=NULL
> >>
> >>
> >> Thanks as always,
> >> Rick
> >>
> >>
> >>
> >> On Mon, Aug 12, 2013 at 1:51 PM, Steve Lianoglou
> >> <[email protected]> wrote:
> >>>
> >>> Hi folks,
> >>>
> >>> I actually want to revisit the fix I made here.
> >>>
> >>> Instead of having `use.key` in the signature of unique.data.table (and
> >>> duplicated.data.table), as in:
> >>>
> >>> function(x,
> >>>          incomparables=FALSE,
> >>>          tolerance=.Machine$double.eps ^ 0.5,
> >>>          use.key=TRUE, ...)
> >>>
> >>> How about we switch out use.key for a parameter that specifies the
> >>> column names to use in the uniqueness check, which defaults to key(x)
> >>> to keep backwards compatibility.
> >>>
> >>> For argument's sake (like that?), let's call this parameter `columns`
> >>> (by.columns? with.columns? whatever) so:
> >>>
> >>> function(x,
> >>>          incomparables=FALSE,
> >>>          tolerance=.Machine$double.eps ^ 0.5,
> >>>          columns=key(x), ...)
> >>>
> >>> Then:
> >>>
> >>> (1) leaving it alone is the backward compatible behavior;
> >>> (2) Perhaps setting it to NULL will use all columns, and make it
> >>> equivalent to unique.data.frame (also the same when x has no key); and
> >>> (3) setting it to any other combo of columns uses those columns as the
> >>> uniqueness key and filters the rows (only) out of x accordingly.
> >>>
> >>> What do you folks think? Personally I think this is better on all
> >>> accounts than just specifying to use the key or not and the only
> >>> question in my mind is the name of the argument -- happy to hear other
> >>> world views, however, so don't be shy.
> >>>
> >>> Thanks,
> >>> -steve
> >>>
> >>> --
> >>> Steve Lianoglou
> >>> Computational Biologist
> >>> Bioinformatics and Computational Biology
> >>> Genentech
> >>
> >>
> >
> >
> > --
> > Steve Lianoglou
> > Computational Biologist
> > Bioinformatics and Computational Biology
> > Genentech
>
>
> --
> Steve Lianoglou
> Computational Biologist
> Bioinformatics and Computational Biology
> Genentech
>
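
For anyone reading the thread later, here is a minimal sketch of the column-based uniqueness check discussed above. It assumes the data.table package is installed and uses the `by=` spelling from Steve's own interactive example (at revision 889 the full parameter name was `by.columns`). The table `dt` and its column names are made up for illustration and are not from the thread.

```r
library(data.table)

# Hypothetical example table; not taken from the thread.
dt <- data.table(
  id    = c(1L, 1L, 2L, 2L, 3L),
  group = c("a", "a", "b", "c", "c"),
  value = c(10, 10, 20, 30, 30)
)
setkey(dt, id)

# Uniqueness on the key columns, passed explicitly rather than
# relying on whatever the default happens to be.
by_key <- unique(dt, by = key(dt))

# Uniqueness on an arbitrary combination of columns.
by_cols <- unique(dt, by = c("group", "value"))

# Flag the rows that repeat an earlier (group, value) combination.
dup_flags <- duplicated(dt, by = c("group", "value"))

print(by_key)
print(by_cols)
print(dup_flags)
```

Passing the columns explicitly keeps the call independent of the default, so the same line reads the same whether or not `dt` is keyed.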
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
