Last update here :-) After more hemming and hawing, I've changed the name of the new parameter added to duplicated.data.table and unique.data.table from `by.columns` to just `by`, as it is (more or less) the same idea as the `by` in dt[i, j, by, ...].
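
A minimal sketch of what a call looks like after the rename, assuming a data.table build that includes the `by` argument. The table `dt` and its column names are hypothetical and exist only for illustration:

```r
library(data.table)

# Hypothetical example table; the column names are made up for illustration.
dt <- data.table(
  id    = c(1L, 1L, 2L, 2L, 3L),
  group = c("a", "a", "b", "c", "c"),
  value = c(10, 20, 30, 40, 50)
)

# Uniqueness judged on `id` alone: the first row for each id is kept.
u_id <- unique(dt, by = "id")

# Uniqueness judged on the (id, group) combination.
u_pair <- unique(dt, by = c("id", "group"))

# duplicated() accepts the same argument and returns one logical per row.
dup_id <- duplicated(dt, by = "id")
```

Passing `by` explicitly means the result does not depend on whether the table happens to be keyed.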
Sorry for any inconvenience caused if you've been working off of the development version.

-steve

On Thu, Aug 15, 2013 at 9:35 PM, Ricardo Saporta <[email protected]> wrote:
> Steve, great stuff!!
> thanks for making that happen
>
> Rick
>
>
> On Wed, Aug 14, 2013 at 8:30 PM, Steve Lianoglou
> <[email protected]> wrote:
>>
>> Hi all,
>>
>> As I needed this sooner than I had expected, I just committed this
>> change. It's in svn revision 889.
>>
>> I chose 'by.columns' as the parameter name -- seemed to make more
>> sense to me, and using the shorthand interactively saves a letter,
>> eg: unique(dt, by=c('some', 'columns')) ;-)
>>
>> Here's the note from the NEWS file:
>>
>> o "Uniqueness" tests can now specify arbitrary combinations of
>>   columns to use to test for duplicates. `by.columns` parameter added to
>>   unique.data.table and duplicated.data.table. This allows the user to
>>   test for uniqueness using any combination of columns in the
>>   data.table, where previously the user only had the option to use the
>>   keyed columns (if keyed) or all columns (if not). The default behavior
>>   sets `by.columns=key(dt)` to maintain backward compatibility. See
>>   man/duplicated.Rd and tests 986:991 for more information. Thanks to
>>   Arunkumar Srinivasan, Ricardo Saporta, and Frank Erickson for useful
>>   discussions.
>>
>> Should work as advertised assuming my unit tests weren't too simplistic.
>>
>> Cheers,
>>
>> -steve
>>
>>
>> On Tue, Aug 13, 2013 at 1:24 PM, Steve Lianoglou
>> <[email protected]> wrote:
>> > Thanks for the suggestions, folks.
>> >
>> > Matthew: do you have a preference?
>> >
>> > -steve
>> >
>> > On Mon, Aug 12, 2013 at 11:12 AM, Ricardo Saporta
>> > <[email protected]> wrote:
>> >> Steve,
>> >>
>> >> I like your suggestion a lot. I can see putting column specification
>> >> to good use.
>> >>
>> >> As for the argument name, perhaps
>> >> 'use.columns'
>> >>
>> >> And where a value of NULL or FALSE will yield the same results as
>> >> `unique.data.frame`
>> >>
>> >> use.columns=key(x) # default behavior
>> >> use.columns=c("col1name", "col7name") #etc
>> >> use.columns=NULL
>> >>
>> >>
>> >> Thanks as always,
>> >> Rick
>> >>
>> >>
>> >> On Mon, Aug 12, 2013 at 1:51 PM, Steve Lianoglou
>> >> <[email protected]> wrote:
>> >>>
>> >>> Hi folks,
>> >>>
>> >>> I actually want to revisit the fix I made here.
>> >>>
>> >>> Instead of having `use.key` in the signature to unique.data.table (and
>> >>> duplicated.data.table) to be:
>> >>>
>> >>> function(x,
>> >>>          incomparables=FALSE,
>> >>>          tolerance=.Machine$double.eps ^ 0.5,
>> >>>          use.key=TRUE, ...)
>> >>>
>> >>> How about we switch out use.key for a parameter that specifies the
>> >>> column names to use in the uniqueness check, which defaults to key(x)
>> >>> to keep backwards compatibility.
>> >>>
>> >>> For argument's sake (like that?), let's call this parameter `columns`
>> >>> (by.columns? with.columns? whatever) so:
>> >>>
>> >>> function(x,
>> >>>          incomparables=FALSE,
>> >>>          tolerance=.Machine$double.eps ^ 0.5,
>> >>>          columns=key(x), ...)
>> >>>
>> >>> Then:
>> >>>
>> >>> (1) leaving it alone is the backward compatible behavior;
>> >>> (2) Perhaps setting it to NULL will use all columns, and make it
>> >>> equivalent to unique.data.frame (also the same when x has no key); and
>> >>> (3) setting it to any other combo of columns uses those columns as the
>> >>> uniqueness key and filters the rows (only) out of x accordingly.
>> >>>
>> >>> What do you folks think? Personally I think this is better on all
>> >>> accounts than just specifying to use the key or not and the only
>> >>> question in my mind is the name of the argument -- happy to hear other
>> >>> world views, however, so don't be shy.
>> >>>
>> >>> Thanks,
>> >>> -steve
>> >>>
>> >>> --
>> >>> Steve Lianoglou
>> >>> Computational Biologist
>> >>> Bioinformatics and Computational Biology
>> >>> Genentech

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
