I would prefer nomatch=0 as a default though, simply because that's what I do most of the time :)
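
To make the difference concrete, here is a minimal sketch using the X and Y from Gabor's example quoted below. It assumes a data.table version where X[Y] joins the first column of an unkeyed Y against X's key, as in that example; the names `outer_like` and `inner_like` are just illustrative.

```r
library(data.table)

# Same tables as in the quoted example: X is keyed by x, Y is unkeyed.
X = data.table(x=c("a","a","b","b","b","c","c"), foo=1:7, key="x")
Y = data.table(y=c("b","c","d"), bar=c(4,2,3))

# Current default (nomatch=NA): every row of Y is kept, so the
# unmatched "d" appears with foo set to NA.
outer_like = X[Y]
nrow(outer_like)   # 6 rows

# nomatch=0: rows of Y with no match in X are dropped, which is
# closer to the default of merge() and of an SQL inner join.
inner_like = X[Y, nomatch=0]
nrow(inner_like)   # 5 rows
```

With nomatch=0 every value in the first column exists in both X$x and Y$y, so labelling it x is not misleading in that case.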

On Fri, May 3, 2013 at 9:57 AM, Eduard Antonyan <[email protected]> wrote:

> A correction - the param is called "nomatch", not "match".
>
> This use case seems like smth a user shouldn't really do - in an ideal
> world you should have them both keyed by the same-name column.
>
> As is, my view on it is that data.table is correcting the user mistake of
> naming the column in Y - y, instead of x, and so the output makes sense and
> I don't see the need of complicating the behavior by adding more cases one
> has to go through to figure out what the output columns would be. Similar
> to asking for X[J(c("b", "c", "d"))] - you wouldn't want an anonymous
> column there, would you?
>
>
>
> On Fri, May 3, 2013 at 6:18 AM, Gabor Grothendieck <
> [email protected]> wrote:
>
>> I am moving this discussion which started with mdowle to the list.
>>
>> Consider this example slightly modified from the data.table FAQ:
>>
>> > X = data.table(x=c("a","a","b","b","b","c","c"), foo=1:7, key="x")
>> > Y = data.table(y=c("b","c","d"), bar=c(4,2,3))
>> > out <- X[Y]; out
>>    x foo bar
>> 1: b   3   4
>> 2: b   4   4
>> 3: b   5   4
>> 4: c   6   2
>> 5: c   7   2
>> 6: d  NA   3
>>
>> Note that the first column of the output is labelled x even though the
>> data to produce it comes from y, e.g. "d" in out$x is not in X$x but
>> does appear in Y$y so clearly the data is coming from y as opposed to
>> x . In terms of SQL the above would be written:
>>
>>    select Y.y as x, ...
>>
>> and the need to rename the first column of out suggests that there
>> may be a deeper problem here.
>>
>> Here are some ideas to address this (they would require changes to
>> data.table):
>>
>> - the default of X[Y,, match=NA] would be changed to a default of
>> X[Y,,match=0] so that it corresponds to the defaults in R's merge and
>> in SQL joins.
>>
>> - the column name of the first column in the example above would be
>> changed to y if match=0 but be left at x if match=NA. In the case
>> that match=0 (the proposed new default) x and y are equal so the first
>> column can be validly labelled as x but in the case that match=NA they
>> are not so y would be used as the column name.
>>
>> - the name match= does seem a bit misleading since R's match only
>> matches one item in the target whereas in data.table match matches
>> many if mult="all" and that is the default. Perhaps some thought
>> should be given to a name change here?
>>
>> The above would seem to correspond more closely to R's merge and SQL
>> join defaults. Any use cases or other comments?
>>
>> --
>> Statistics & Software Consulting
>> GKX Group, GKX Associates Inc.
>> tel: 1-877-GKX-GROUP
>> email: ggrothendieck at gmail.com
>> _______________________________________________
>> datatable-help mailing list
>> [email protected]
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
