To clarify - that behavior is already implemented in merge (more specifically, merge.data.table). I don't have a strong view on having it in X[Y] as well. I don't like all.x and all.y as the names, since [.data.table has no params named 'x' and 'y' (unlike merge), but some param that does a full outer join could certainly be added.
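For reference, here is a rough sketch of what that looks like with merge today, reusing the X and Y from the example quoted below. It assumes the data.table package is installed; Y2 is just an illustrative copy of Y with its column renamed to x, so that both tables share the key name (the "same-name column" setup suggested earlier in the thread). The inner join line uses the existing nomatch=0 argument of X[Y].

```r
library(data.table)

# Tables from the example in the thread
X <- data.table(x = c("a","a","b","b","b","c","c"), foo = 1:7, key = "x")
Y <- data.table(y = c("b","c","d"), bar = c(4, 2, 3))

# Illustrative copy of Y whose join column is named x, keyed like X
Y2 <- copy(Y)
setnames(Y2, "y", "x")
setkey(Y2, x)

# Full outer join: keeps "a" (only in X) and "d" (only in Y)
full <- merge(X, Y2, all = TRUE)

# Inner join through [.data.table: unmatched rows of Y2 are dropped
inner <- X[Y2, nomatch = 0]
```

With all = TRUE, the unmatched rows are filled with NA on the side that has no data, so "a" gets bar = NA and "d" gets foo = NA.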
On Fri, May 3, 2013 at 10:09 AM, Gabor Grothendieck <[email protected]> wrote:
> Yes, sorry. It's nomatch= which presumably derives from the parameter
> of the same name in the match() function. If the idea of the nomatch=
> name was to leverage off existing argument names in R then I would
> prefer all.y= to be consistent with merge() in place of nomatch= since
> we are really merging/joining rather than just matching. That would
> also allow extension to all types of join by adding an all.x= argument
> too.
>
> On Fri, May 3, 2013 at 10:59 AM, Eduard Antonyan
> <[email protected]> wrote:
> > I would prefer nomatch=0 as a default though, simply because that's what I
> > do most of the time :)
> >
> >
> > On Fri, May 3, 2013 at 9:57 AM, Eduard Antonyan <[email protected]>
> > wrote:
> >>
> >> A correction - the param is called "nomatch", not "match".
> >>
> >> This use case seems like something a user shouldn't really do - in an ideal
> >> world you should have them both keyed by the same-name column.
> >>
> >> As is, my view on it is that data.table is correcting the user mistake of
> >> naming the column in Y - y, instead of x, and so the output makes sense and
> >> I don't see the need of complicating the behavior by adding more cases one
> >> has to go through to figure out what the output columns would be. Similar to
> >> asking for X[J(c("b", "c", "d"))] - you wouldn't want an anonymous column
> >> there, would you?
> >>
> >>
> >>
> >> On Fri, May 3, 2013 at 6:18 AM, Gabor Grothendieck
> >> <[email protected]> wrote:
> >>>
> >>> I am moving this discussion which started with mdowle to the list.
> >>>
> >>> Consider this example slightly modified from the data.table FAQ:
> >>>
> >>> > X = data.table(x=c("a","a","b","b","b","c","c"), foo=1:7, key="x")
> >>> > Y = data.table(y=c("b","c","d"), bar=c(4,2,3))
> >>> > out <- X[Y]; out
> >>>    x foo bar
> >>> 1: b   3   4
> >>> 2: b   4   4
> >>> 3: b   5   4
> >>> 4: c   6   2
> >>> 5: c   7   2
> >>> 6: d  NA   3
> >>>
> >>> Note that the first column of the output is labelled x even though the
> >>> data to produce it comes from y, e.g. "d" in out$x is not in X$x but
> >>> does appear in Y$y so clearly the data is coming from y as opposed to
> >>> x. In terms of SQL the above would be written:
> >>>
> >>> select Y.y as x, ...
> >>>
> >>> and the need to rename the first column of out suggests that there
> >>> may be a deeper problem here.
> >>>
> >>> Here are some ideas to address this (they would require changes to
> >>> data.table):
> >>>
> >>> - the default of X[Y,, match=NA] would be changed to a default of
> >>> X[Y,,match=0] so that it corresponds to the defaults in R's merge and
> >>> in SQL joins.
> >>>
> >>> - the column name of the first column in the example above would be
> >>> changed to y if match=0 but be left at x if match=NA. In the case
> >>> that match=0 (the proposed new default) x and y are equal so the first
> >>> column can be validly labelled as x but in the case that match=NA they
> >>> are not so y would be used as the column name.
> >>>
> >>> - the name match= does seem a bit misleading since R's match only
> >>> matches one item in the target whereas in data.table match matches
> >>> many if mult="all" and that is the default. Perhaps some thought
> >>> should be given to a name change here?
> >>>
> >>> The above would seem to correspond more closely to R's merge and SQL
> >>> join defaults. Any use cases or other comments?
> >>>
> >>> --
> >>> Statistics & Software Consulting
> >>> GKX Group, GKX Associates Inc.
> >>> tel: 1-877-GKX-GROUP
> >>> email: ggrothendieck at gmail.com
> >>> _______________________________________________
> >>> datatable-help mailing list
> >>> [email protected]
> >>>
> >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >>
> >>
> >
> >
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
