Hmm, I see what you mean. The `i` in `all.i = TRUE/FALSE` (in addition to taking TRUE/FALSE instead of 0/NA) distinguishes the behaviour of X[Y] from `merge` clearly enough that users won't fall into the "unexpected output" scenario.
I vote for this change, if there's one :).

Arun

On Saturday, May 4, 2013 at 1:40 PM, Gabor Grothendieck wrote:
> I am not sure but I think that could be handled as a separate issue if
> it becomes important. By using all.i= it makes it sufficiently
> different from all.y= that users won't expect the same default and
> further they will not necessarily expect that there be an all argument
> for the left participant in the merge.
>
> On Sat, May 4, 2013 at 7:35 AM, Arunkumar Srinivasan wrote:
> > Gabor,
> > Both points I agree with. It brings enough clarity and consistency to the
> > syntax.
> > Does this mean that you don't mind X[Y] not having all functionalities of
> > `merge`? Because this takes care of the confusion of `nomatch` but still
> > does not do all merges, iiuc.
> >
> > Arun
> >
> > On Saturday, May 4, 2013 at 1:26 PM, Gabor Grothendieck wrote:
> >
> > The proposal at this point would be:
> >
> > 1. nomatch= would be replaced by all.i= such that
> > X[Y,,nomatch=NA] is the same as X[Y,,all.i=TRUE]
> > X[Y,,nomatch=0] is the same as X[Y,,all.i=FALSE]
> > nomatch= would be deprecated and ultimately removed.
> >
> > Note that #1 is simple to implement as it only involves changing names
> > and values of arguments and does not really change any behavior;
> > however, it's easier to think about because X[Y,,all.i=Z] now has the
> > same behavior as merge(X, Y, all.y=Z) and so can be quickly understood
> > by anyone who knows merge in R. In contrast nomatch= did not even
> > have the same meaning as in match() since match matches the first
> > occurrence whereas with mult="all", the default, matching in
> > data.table matches all occurrences. Note that the default of merge's
> > all.y= is all.y=FALSE but the default of all.i= is all.i=TRUE in order
> > that the default behave as indices do.
> > Also note that this solves the problem that nomatch= can only be 0 or
> > NA, since a logical can only have two non-NA values anyway.
> >
> > 2. If Y were a numeric index vector then all.i= will have the same
> > effect as if Y were a data.table with Y as its column and is merged
> > with the row numbers of X. e.g. X[1:4,,all.i=FALSE] would be the
> > same as X[1:3] if X only had 3 rows since 4 does not match a row
> > number of X and is dropped because all.i=FALSE. If Y were a numeric
> > vector with negative values it would be converted to one with positive
> > values in such a way as to have the established meaning and then the
> > same strategy is applied. If Y were logical then it's recycled giving
> > YY and the same strategy is applied to which(YY). This description is
> > intended to be conceptual and the actual internal mechanism could be
> > different.
> >
> > Thus #2 allows one to think of **all** i indexing as merging rather
> > than as multiple separate concepts (which I believe is consistent with
> > the original intention of data.table).
> >
> > On Fri, May 3, 2013 at 8:02 PM, Eduard Antonyan wrote:
> >
> > I think I like this proposal - maybe you should write up a few examples of
> > what current behavior is, vs the proposed behavior.
> >
> > On Fri, May 3, 2013 at 6:54 PM, Gabor Grothendieck wrote:
> >
> > data.table is supposed to generalize indexing and although not
> > explicitly stated the generalization seems to be that indexing is
> > merging with the row numbers so there is indeed merging going on and
> > that merging should respect nomatch= for consistency.
> >
> > On Fri, May 3, 2013 at 6:54 PM, Eduard Antonyan wrote:
> >
> > There is no joining happening here, thus nomatch=0 has no effect.
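Gabor's point that nomatch= in data.table does not mean what it means in match() can be illustrated with base R alone. This is a sketch using base R's `match()` and `merge()` on plain data.frames, not data.table; the tables `X` and `Y` are made up for the illustration, with `X` standing in for the table being looked up and `Y` for `i`. The duplicated key "a" shows the difference between matching only the first occurrence and matching all occurrences (the analogue of data.table's `mult="all"`).

```r
# X stands in for the table being looked up; its key k has a duplicate "a".
# Y stands in for i. Both are hypothetical example data.
X <- data.frame(k = c("a", "a", "b"), v = 1:3)
Y <- data.frame(k = c("a", "z"))

# match() returns only the FIRST occurrence of each key; nomatch= only
# sets the fill value for keys that are not found.
match(Y$k, X$k)               # returns c(1L, NA)
match(Y$k, X$k, nomatch = 0)  # returns c(1L, 0L)

# merge() returns EVERY matching row of X. all.y = TRUE keeps the unmatched
# key "z" (with v = NA); all.y = FALSE drops it.
merge(X, Y, by = "k", all.y = TRUE)   # two "a" rows (v = 1, 2) plus "z" with v = NA
merge(X, Y, by = "k", all.y = FALSE)  # only the two "a" rows
```

Under proposal #1, `X[Y,,all.i=TRUE]` and `X[Y,,all.i=FALSE]` would correspond to the two `merge()` calls rather than to the two `match()` calls.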
> >
> > On Fri, May 3, 2013 at 5:52 PM, Gabor Grothendieck wrote:
> >
> > The definition of DT was left out by mistake. It should be:
> >
> > DT <- data.table(a=letters[1:3])
> >
> > On Fri, May 3, 2013 at 6:50 PM, Gabor Grothendieck wrote:
> >
> > Consider this example:
> >
> > DT[1:4,,nomatch=0]
> >
> >     a
> > 1:  a
> > 2:  b
> > 3:  c
> > 4: NA
> >
> > Should it not return only the first 3 rows? It seems to be ignoring
> > the nomatch=0.
> >
> > --
> > Statistics & Software Consulting
> > GKX Group, GKX Associates Inc.
> > tel: 1-877-GKX-GROUP
> > email: ggrothendieck at gmail.com
> > _______________________________________________
> > datatable-help mailing list
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
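The conceptual model in proposal #2 (i-indexing as a merge of `i` against the row numbers of X, with `all.i=` playing the role of `merge()`'s `all.y=`) can be sketched in base R. `index_as_merge()` is a hypothetical helper written only for illustration; it is not a data.table function, and data.table's actual internals may work quite differently. `DT` is a plain data.frame mirroring the `DT` in Gabor's example.

```r
# Hypothetical helper, NOT part of data.table: models i-indexing as a merge
# of i against the row numbers of X, following proposal #2.
# all.i plays the role of merge()'s all.y argument.
index_as_merge <- function(X, i, all.i = TRUE) {
  n <- nrow(X)
  if (is.logical(i)) {
    # Logical i: recycle to nrow(X), then use the positions that are TRUE.
    i <- which(rep_len(i, n))
  } else if (any(i < 0, na.rm = TRUE)) {
    # Negative i: convert to the positive row numbers it keeps.
    i <- setdiff(seq_len(n), -i)
  } else {
    # Positive i: zeros select nothing, as in base R indexing.
    i <- i[is.na(i) | i != 0]
  }
  Xr <- data.frame(.row = seq_len(n), X)
  Yr <- data.frame(.row = as.integer(i), .ord = seq_along(i))
  m <- merge(Xr, Yr, by = ".row", all.y = all.i, sort = FALSE)
  # merge() does not preserve the order of i, so restore it explicitly
  # and drop the helper columns.
  m <- m[order(m$.ord), setdiff(names(m), c(".row", ".ord")), drop = FALSE]
  rownames(m) <- NULL
  m
}

DT <- data.frame(a = letters[1:3])
index_as_merge(DT, 1:4)                 # 4 rows; a is "a", "b", "c", NA
index_as_merge(DT, 1:4, all.i = FALSE)  # 3 rows; row 4 has no match and is dropped
index_as_merge(DT, -2)                  # rows 1 and 3
index_as_merge(DT, c(TRUE, FALSE))      # recycled to TRUE, FALSE, TRUE: rows 1 and 3
```

With `all.i = FALSE` the out-of-range row 4 is dropped, which is the result Gabor expected from `DT[1:4,,nomatch=0]`.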
