Steve, not to beat a dead horse on the "what to name the new parameter" discussion, but I'm wondering what your/others' thoughts are on using something other than "by". Maybe even "uby".

Or perhaps we can have a synonym in the function definition:

    .. function(........ , by=uby, uby)

The reason I bring this up is that as I begin to use this and read over my own code, I realize that it takes a lot of visual parsing to distinguish when the "by" in a complex call belongs to "[.data.table" and when it belongs to "unique.data.table".

Cheers,
Rick

On Tue, Aug 27, 2013 at 1:23 PM, Steve Lianoglou <[email protected]> wrote:
> Last update here :-)
>
> After more hemming and hawing, I've changed the name of the new
> parameter added to duplicated.data.table and unique.data.table from
> `by.columns` to just `by`, as it (more or less) is the same idea as
> the `by` in dt[x, i,j,by,...]
>
> Sorry for any inconveniences caused if you've been working off of the
> development version.
>
> -steve
>
>
> On Thu, Aug 15, 2013 at 9:35 PM, Ricardo Saporta
> <[email protected]> wrote:
> > Steve, great stuff!!
> > thanks for making that happen
> >
> > Rick
> >
> >
> > On Wed, Aug 14, 2013 at 8:30 PM, Steve Lianoglou
> > <[email protected]> wrote:
> >>
> >> Hi all,
> >>
> >> As I needed this sooner than I had expected, I just committed this
> >> change. It's in svn revision 889.
> >>
> >> I chose 'by.columns' as the parameter name -- seemed to make more
> >> sense to me, and using the short hand interactively saves a letter,
> >> e.g.: unique(dt, by=c('some', 'columns')) ;-)
> >>
> >> Here's the note from the NEWS file:
> >>
> >> o "Uniqueness" tests can now specify arbitrary combinations of
> >> columns to use to test for duplicates. `by.columns` parameter added to
> >> unique.data.table and duplicated.data.table. This allows the user to
> >> test for uniqueness using any combination of columns in the
> >> data.table, where previously the user only had the option to use the
> >> keyed columns (if keyed) or all columns (if not). The default behavior
> >> sets `by.columns=key(dt)` to maintain backward compatibility. See
> >> man/duplicated.Rd and tests 986:991 for more information. Thanks to
> >> Arunkumar Srinivasan, Ricardo Saporta, and Frank Erickson for useful
> >> discussions.
> >>
> >> Should work as advertised assuming my unit tests weren't too simplistic.
> >>
> >> Cheers,
> >>
> >> -steve
> >>
> >>
> >> On Tue, Aug 13, 2013 at 1:24 PM, Steve Lianoglou
> >> <[email protected]> wrote:
> >> > Thanks for the suggestions, folks.
> >> >
> >> > Matthew: do you have a preference?
> >> >
> >> > -steve
> >> >
> >> > On Mon, Aug 12, 2013 at 11:12 AM, Ricardo Saporta
> >> > <[email protected]> wrote:
> >> >> Steve,
> >> >>
> >> >> I like your suggestion a lot. I can see putting column specification
> >> >> to good use.
> >> >>
> >> >> As for the argument name, perhaps
> >> >> 'use.columns'
> >> >>
> >> >> And where a value of NULL or FALSE will yield the same results as
> >> >> `unique.data.frame`
> >> >>
> >> >> use.columns=key(x) # default behavior
> >> >> use.columns=c("col1name", "col7name") #etc
> >> >> use.columns=NULL
> >> >>
> >> >>
> >> >> Thanks as always,
> >> >> Rick
> >> >>
> >> >>
> >> >> On Mon, Aug 12, 2013 at 1:51 PM, Steve Lianoglou
> >> >> <[email protected]> wrote:
> >> >>>
> >> >>> Hi folks,
> >> >>>
> >> >>> I actually want to revisit the fix I made here.
> >> >>>
> >> >>> Instead of having `use.key` in the signature to unique.data.table (and
> >> >>> duplicated.data.table) to be:
> >> >>>
> >> >>> function(x,
> >> >>>          incomparables=FALSE,
> >> >>>          tolerance=.Machine$double.eps ^ 0.5,
> >> >>>          use.key=TRUE, ...)
> >> >>>
> >> >>> How about we switch out use.key for a parameter that specifies the
> >> >>> column names to use in the uniqueness check, which defaults to key(x)
> >> >>> to keep backwards compatibility.
> >> >>>
> >> >>> For argument's sake (like that?), let's call this parameter `columns`
> >> >>> (by.columns? with.columns? whatever) so:
> >> >>>
> >> >>> function(x,
> >> >>>          incomparables=FALSE,
> >> >>>          tolerance=.Machine$double.eps ^ 0.5,
> >> >>>          columns=key(x), ...)
> >> >>>
> >> >>> Then:
> >> >>>
> >> >>> (1) leaving it alone is the backward compatible behavior;
> >> >>> (2) Perhaps setting it to NULL will use all columns, and make it
> >> >>> equivalent to unique.data.frame (also the same when x has no key); and
> >> >>> (3) setting it to any other combo of columns uses those columns as the
> >> >>> uniqueness key and filters the rows (only) out of x accordingly.
> >> >>>
> >> >>> What do you folks think? Personally I think this is better on all
> >> >>> accounts than just specifying to use the key or not and the only
> >> >>> question in my mind is the name of the argument -- happy to hear other
> >> >>> world views, however, so don't be shy.
> >> >>>
> >> >>> Thanks,
> >> >>> -steve
> >> >>>
> >> >>> --
> >> >>> Steve Lianoglou
> >> >>> Computational Biologist
> >> >>> Bioinformatics and Computational Biology
> >> >>> Genentech
> >> >>
> >> >
> >> > --
> >> > Steve Lianoglou
> >> > Computational Biologist
> >> > Bioinformatics and Computational Biology
> >> > Genentech
> >>
> >> --
> >> Steve Lianoglou
> >> Computational Biologist
> >> Bioinformatics and Computational Biology
> >> Genentech
> >
> --
> Steve Lianoglou
> Computational Biologist
> Bioinformatics and Computational Biology
> Genentech
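
For anyone weighing the synonym idea raised at the top of this message, here is a minimal base-R sketch of how a `by=uby, uby` style signature could behave. The helper name `unique_by` and the `uby = NULL` default are assumptions made purely for illustration; this is not data.table's implementation, and it works on a plain data.frame so that it runs without any packages. It relies on R evaluating default arguments lazily, so `by` falls back to whatever `uby` holds.

```r
# Hypothetical helper for illustration only; this is NOT data.table's code.
# `by` defaults to `uby`, so the caller may use either spelling.
unique_by <- function(x, by = uby, uby = NULL) {
  # NULL (neither argument supplied) means "use every column",
  # which mirrors what unique.data.frame does.
  cols <- if (is.null(by)) names(x) else by
  # Keep the first row of each distinct combination of the chosen columns.
  x[!duplicated(x[, cols, drop = FALSE]), , drop = FALSE]
}

df <- data.frame(a = c(1, 1, 2, 2), b = c("x", "x", "y", "z"))

res_uby <- unique_by(df, uby = "a")  # uniqueness judged on column a only
res_by  <- unique_by(df, by = "a")   # same request via the original spelling
res_all <- unique_by(df)             # all columns considered
```

In this sketch, if a caller supplies both `by` and `uby`, `by` silently wins; a real implementation would probably want to warn or stop in that case.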
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
