yeah, I disagree with this view. I don't think [] should pursue compatibility with merge.
On Fri, May 3, 2013 at 11:54 AM, Gabor Grothendieck <[email protected] > wrote: > I think that from the viewpoint of compatibility and convenience it > would be best to implement all.x and all.y and not rely on swapping X > and Y. SQLite did something like this (they implemented left join but > not right join based on the idea that all you have to do is swap join > arguments) but the problem with it is that it adds a layer of mental > specification effort if the actual problem is better stated in the > unsupported orientation. > > On Fri, May 3, 2013 at 12:49 PM, Eduard Antonyan > <[email protected]> wrote: > > Arun, it only needs the addition of smth like X[Y, keep.all = TRUE], all > of > > the other merge options already exist as either X[Y] or Y[X] with or > without > > nomatch = 0/NA. > > > > > > On Fri, May 3, 2013 at 11:45 AM, Arunkumar Srinivasan > > <[email protected]> wrote: > >> > >> Gabor, > >> > >> Very true. I suppose your request is that the x[i] where `i` is a > >> data.table should have the same set of options like R's base `merge` > >> function, like, by.y=TRUE, by.x=TRUE or all=TRUE. I like the idea by > itself. > >> However, I am not able to think of a way to do this. I mean, I find the > >> syntax X[Y, by.x=TRUE] weird / not making sense. That is, to me even > though > >> > >> X[Y] is equal to Y[X, by.y=TRUE] (or) X[Y, by.x=TRUE] (ignoring the > >> reordered columns) the latter 2 don't seem to make sense/is redundant > (maybe > >> it's because I am used to this syntax). > >> > >> Arun > >> > >> On Friday, May 3, 2013 at 5:57 PM, Gabor Grothendieck wrote: > >> > >> In my last post it should have read: > >> > >> That X[Y] is not the same as Y[X] is analogous to the fact that > >> merge(X, Y, all.y=TRUE) is not the same as merge(Y, X, all.y=TRUE) > >> > >> On Fri, May 3, 2013 at 11:55 AM, Gabor Grothendieck > >> <[email protected]> wrote: > >> > >> Assuming same-named keys, then these are all the same except possibly > >> for row and column order: > >> > >> X[Y,,nomatch=0] > >> Y[X,,nomatch=0] > >> merge(X, Y) > >> merge(Y, X) > >> > >> That X[Y] is not the same as Y[X] is analogous to the fact that > >> merge(X, Y, all.x=TRUE) is not the same as merge(Y, X, all.x=TRUE) > >> > >> On Fri, May 3, 2013 at 11:46 AM, Arunkumar Srinivasan > >> <[email protected]> wrote: > >> > >> Gabor, > >> > >> X[Y] and Y[X] are not necessarily the same operations (meaning, they > don't > >> *have* to give the same output). However, merge(X,Y) and merge(Y,X) > *have* > >> to provide the same output (except for the column order and names). In > >> that > >> sense, a join is a bit different from a merge, no? > >> > >> Arun > >> > >> On Friday, May 3, 2013 at 5:36 PM, Gabor Grothendieck wrote: > >> > >> Yes, except that is not really what happens since match() only matches > >> one row whereas with mult="all", the default, all rows are matched > >> which is not really matching in the sense of match(). The current > >> naming confuses matching with joining and its really the latter that > >> is being done. > >> > >> Regarding the existence of merge the advantage of [ is that it will > >> automatically only take the columns needed so merge is not really > >> equivalent to [ in all respects. Furthermore having to use different > >> constructs for different types of merge seems awkward. > >> > >> > >> On Fri, May 3, 2013 at 11:27 AM, Eduard Antonyan > >> <[email protected]> wrote: > >> > >> Btw the way I think about the "nomatch" name is as follows - normally > X[Y] > >> tries to match rows of Y with rows of X, and then "nomatch" tells it > what > >> to > >> do when there is *no match*. > >> > >> > >> On Fri, May 3, 2013 at 10:23 AM, Eduard Antonyan > >> <[email protected]> > >> wrote: > >> > >> > >> To clarify - that behavior is already implemented in merge (more > >> specifically merge.data.table). I don't really have a view on having it > in > >> X[Y] as well - I don't like all.x and all.y as the names, since there > are > >> no > >> params named 'x' and 'y' in [.data.table (as opposed to merge), but some > >> param that would do a full outer join could certainly be added. > >> > >> > >> On Fri, May 3, 2013 at 10:09 AM, Gabor Grothendieck > >> <[email protected]> wrote: > >> > >> > >> Yes, sorry. Its nomatch= which presumably derives from the parameter > >> of the same name in the match() function. If the idea of the nomatch= > >> name was to leverage off existing argument names in R then I would > >> prefer all.y= to be consistent with merge() in place of nomatch= since > >> we are really merging/joining rather than just matching. That would > >> also allow extension to all types of join by adding all.an x= argument > >> too. > >> > >> On Fri, May 3, 2013 at 10:59 AM, Eduard Antonyan > >> <[email protected]> wrote: > >> > >> I would prefer nomatch=0 as a default though, simply because that's > >> what I > >> do most of the time :) > >> > >> > >> On Fri, May 3, 2013 at 9:57 AM, Eduard Antonyan > >> <[email protected]> > >> wrote: > >> > >> > >> A correction - the param is called "nomatch", not "match". > >> > >> This use case seems like smth a user shouldn't really do - in an ideal > >> world you should have them both keyed by the same-name column. > >> > >> As is, my view on it is that data.table is correcting the user mistake > >> of > >> naming the column in Y - y, instead of x, and so the output makes > >> sense and > >> I don't see the need of complicating the behavior by adding more cases > >> one > >> has to go through to figure out what the output columns would be. > >> Similar to > >> asking for X[J(c("b", "c", "d"))] - you wouldn't want an anonymous > >> column > >> there, would you? > >> > >> > >> > >> On Fri, May 3, 2013 at 6:18 AM, Gabor Grothendieck > >> <[email protected]> wrote: > >> > >> > >> I am moving this discussion which started with mdowle to the list. > >> > >> Consider this example slightly modified from the data.table FAQ: > >> > >> X = data.table(x=c("a","a","b","b","b","c","c"), foo=1:7, key="x") > >> Y = data.table(y=c("b","c","d"), bar=c(4,2,3)) > >> out <- X[Y]; out > >> > >> x foo bar > >> 1: b 3 4 > >> 2: b 4 4 > >> 3: b 5 4 > >> 4: c 6 2 > >> 5: c 7 2 > >> 6: d NA 3 > >> > >> Note that the first column of the output is labelled x even though > >> the > >> data to produce it comes from y, e.g. "d" in out$x is not in X$x but > >> does appear in Y$y so clearly the data is coming from y as opposed to > >> x . In terms of SQL the above would be written: > >> > >> select Y.y as x, ... > >> > >> and the need to renamne the first column of out suggests that there > >> may be a deeper problem here. > >> > >> Here are some ideas to address this (they would require changes to > >> data.table): > >> > >> - the default of X[Y,, match=NA] would be changed to a default of > >> X[Y,,match=0] so that it corresponds to the defaults in R's merge and > >> in SQL joins. > >> > >> - the column name of the first column in the example above would be > >> changed to y if match=0 but be left at x if match=NA. In the case > >> that match=0 (the proposed new default) x and y are equal so the > >> first > >> column can be validly labelled as x but in the case that match=NA > >> they > >> are not so y would be used as the column name. > >> > >> - the name match= does seem a bit misleading since R's match only > >> matches one item in the target whereas in data.table match matches > >> many if mult="all" and that is the default. Perhaps some thought > >> should be given to a name change here? > >> > >> The above would seem to correspond more closely to R's merge and SQL > >> join defaults. Any use cases or other comments? > >> > >> -- > >> Statistics & Software Consulting > >> GKX Group, GKX Associates Inc. > >> tel: 1-877-GKX-GROUP > >> email: ggrothendieck at gmail.com > >> _______________________________________________ > >> datatable-help mailing list > >> [email protected] > >> > >> > >> > >> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help > >> > >> > >> > >> > >> -- > >> Statistics & Software Consulting > >> GKX Group, GKX Associates Inc. > >> tel: 1-877-GKX-GROUP > >> email: ggrothendieck at gmail.com > >> > >> > >> > >> > >> -- > >> Statistics & Software Consulting > >> GKX Group, GKX Associates Inc. > >> tel: 1-877-GKX-GROUP > >> email: ggrothendieck at gmail.com > >> _______________________________________________ > >> datatable-help mailing list > >> [email protected] > >> > >> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help > >> > >> > >> > >> > >> -- > >> Statistics & Software Consulting > >> GKX Group, GKX Associates Inc. > >> tel: 1-877-GKX-GROUP > >> email: ggrothendieck at gmail.com > >> > >> > >> > >> > >> -- > >> Statistics & Software Consulting > >> GKX Group, GKX Associates Inc. > >> tel: 1-877-GKX-GROUP > >> email: ggrothendieck at gmail.com > >> > >> > > > > > > -- > Statistics & Software Consulting > GKX Group, GKX Associates Inc. > tel: 1-877-GKX-GROUP > email: ggrothendieck at gmail.com >
_______________________________________________ datatable-help mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
