match

Arunkumar Srinivasan Fri, 03 May 2013 08:45:35 -0700

(The third time; I'm growing tired of this 40KB message taking over half an hour 
to reach me! :) )


Gabor,

About the behaviour of X[Y]:

The current definition of X[Y] is "it's a join looking up X's rows using Y as 
an index". By this definition, the output of X[Y] is very much justified, I 
think. Y is just used as an index. To me it feels similar to, say, X[8] (which 
gives NA, NA with the same column names as X). 
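
A minimal sketch of that analogy, assuming a recent data.table release. The 7-row table is illustrative data with the same shape as the example further down. Indexing past the last row returns a single all-NA row that still carries X's column names.

```r
library(data.table)

# Illustrative 7-row keyed table (assumed data for this sketch)
X <- data.table(x = c("a","a","b","b","b","c","c"), foo = 1:7, key = "x")

# Row 8 does not exist, so the result is one row of NAs ...
res <- X[8]
print(res)

# ... but the column names are still those of X
print(names(res))
```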

Another thought that occurs to me is, say, in this example:

    X <- data.table(x=c("a","a","b","b","b","c","c"), foo=1:7, key="x")
    Y <- data.table(y=c("b"), bar=c(4))
    X[Y]

Here again, you query for Y's y values in X's key column and join X's and Y's 
columns. There is no Y value here for which X gives NA. The data then comes from 
both "X" and "Y" (as opposed to the case "d" you showed, where the data comes just 
from "Y"). In this case, should the column be named "x" or "y"? Always "x" makes 
sense to me, and Y[X] would give "y" instead. However, I am not that good with SQL 
joins, so I may very well have missed your point here. 
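
A small sketch of the naming point, assuming a recent data.table release. Y is keyed here, which the original example does not do, so that Y[X] has a key to join on. The names `xy` and `yx` are only for illustration.

```r
library(data.table)

X <- data.table(x = c("a","a","b","b","b","c","c"), foo = 1:7, key = "x")
# Assumption: key Y on "y" so that Y[X] can be run as well
Y <- data.table(y = "b", bar = 4, key = "y")

xy <- X[Y]  # look up Y's values in X's key column
yx <- Y[X]  # look up X's values in Y's key column

print(names(xy))  # join column carries X's key name
print(names(yx))  # join column carries Y's key name
print(c(nrow(xy), nrow(yx)))  # the two joins are not the same operation
```

X[Y] returns only the rows matching "b", while Y[X] returns one row per row of X, filling `bar` with NA where Y has no match.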


Regarding `merge`:

    x <- as.data.frame(X)
    y <- as.data.frame(Y)

    merge(x, y, by.x="x", by.y="y", all=TRUE) # --- (1)
    merge(y, x, by.x="y", by.y="x", all=TRUE) # --- (2)

(1) always gives the column name "x" and (2) always "y". So does X[Y] 
as opposed to Y[X], except that the operations X[Y] and Y[X] are 
not identical (unlike with merge). So I don't see a dissimilarity here. 
Again, I may have misunderstood your point and would love to be 
corrected if so.

About `nomatch`, I agree with you that the name could be changed to 
avoid confusion with R's `match`. Maybe "missing = NA" and "missing = 0" would 
make more sense? 
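
For reference, a sketch of what the existing `nomatch` argument does, assuming a recent data.table release. "missing" is only the name floated above and is not an actual argument. The second row of Y (value "d", bar = 5) is made-up data with no match in X.

```r
library(data.table)

X <- data.table(x = c("a","a","b","b","b","c","c"), foo = 1:7, key = "x")
# Assumed data: "d" does not occur in X's key column
Y <- data.table(y = c("b", "d"), bar = c(4, 5))

kept    <- X[Y]                # default nomatch = NA: the "d" row stays, foo is NA
dropped <- X[Y, nomatch = 0L]  # nomatch = 0: unmatched rows of Y are left out

print(kept)
print(dropped)
```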

Best regards, 
Arun

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

Re: [datatable-help] merge/join/match
