(The third time; I'm growing tired of this 40KB message taking over half an hour
to reach me! :) )
Gabor,
About the behaviour of X[Y]:
The current definition of X[Y] is "it's a join looking up X's rows using Y as
an index". By this definition, the output of X[Y] is very much justified, I
think. Y is just used as an index. To me it feels similar to, say, X[8] (which
gives NA, NA with the same column names as X).
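
As a small sketch of the X[8] comparison (assuming the data.table package is installed, and borrowing the 7-row X from the example further down), an out-of-range row number returns a single all-NA row that keeps X's column names:

```r
library(data.table)

# Same 7-row table as in the example below; row 8 does not exist.
X <- data.table(x = c("a","a","b","b","b","c","c"), foo = 1:7, key = "x")

res <- X[8]   # out-of-range row number
print(res)    # one row, columns x and foo, both NA
```
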
Another thought that occurs to me is, say, in this example:
X <- data.table(x=c("a","a","b","b","b","c","c"), foo=1:7, key="x")
Y <- data.table(y=c("b"), bar=c(4))
X[Y]
Here again, you look up Y's y values in X's key column and join the columns of
X and Y. No Y value is missing from X here, so nothing comes back as NA. The
data then comes from both "X" and "Y" (as opposed to the case "d" you showed,
where the data comes just from "Y"). In this case, should the column be named
"x" or "y"? Always "x" makes sense to me, and Y[X] would give a "y" instead.
However, I am not that good with SQL joins, so I may very well have missed your
point here.
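
To make the naming question concrete, here is a minimal sketch, assuming the data.table package is installed. Keying Y with setkey() is my own addition so that Y[X] can run, and Yd is only a hypothetical stand-in for the "d" case:

```r
library(data.table)

X <- data.table(x = c("a","a","b","b","b","c","c"), foo = 1:7, key = "x")
Y <- data.table(y = "b", bar = 4)
setkey(Y, y)                         # assumption: key Y so that Y[X] is possible

xy <- X[Y]                           # look up Y's y values in X's key
yx <- Y[X]                           # look up X's x values in Y's key

Yd <- data.table(y = "d", bar = 4)   # hypothetical "d" case: no match in X
xd <- X[Yd]

print(names(xy))                     # join column carries X's name
print(names(yx))                     # join column carries Y's name
print(xd)                            # the "d" value comes from Yd; foo is NA
```

In both X[Y] and X[Yd] the join column is called "x", whether its values can be found in X or only in the lookup table.
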
Regarding `merge`:
x <- as.data.frame(X)
y <- as.data.frame(Y)
merge(x, y, by.x="x", by.y="y", all=TRUE) # --- (1)
merge(y, x, by.x="y", by.y="x", all=TRUE) # --- (2)
(1) always gives the column name "x" and (2) always "y". So does X[Y] as
opposed to Y[X], except that the operations X[Y] and Y[X] are not identical
(unlike the two merge calls). So I don't see a dissimilarity here. Again, I may
have misunderstood your point and would love to be corrected if so.
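
A self-contained version of the two merge calls, using base R only; here x and y are built directly as data frames rather than converted from X and Y, which is an assumption made just to keep the sketch standalone:

```r
# Plain data frames mirroring X and Y.
x <- data.frame(x = c("a","a","b","b","b","c","c"), foo = 1:7,
                stringsAsFactors = FALSE)
y <- data.frame(y = "b", bar = 4, stringsAsFactors = FALSE)

m1 <- merge(x, y, by.x = "x", by.y = "y", all = TRUE)   # (1)
m2 <- merge(y, x, by.x = "y", by.y = "x", all = TRUE)   # (2)

print(names(m1))   # key column takes its name from by.x of the first argument
print(names(m2))
```
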
About the case `"nomatch"`, I agree with you that the name could be changed to
avoid confusion with R's `match`. Maybe "missing = NA" and "missing = 0" would
make more sense?
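
For reference, a short sketch of what the two current `nomatch` settings do, assuming the data.table package is installed; Yd is a hypothetical lookup table whose value "d" does not occur in X:

```r
library(data.table)

X  <- data.table(x = c("a","a","b","b","b","c","c"), foo = 1:7, key = "x")
Yd <- data.table(y = "d", bar = 4)   # hypothetical: "d" is not in X

kept    <- X[Yd, nomatch = NA]   # unmatched row is kept, foo is NA
dropped <- X[Yd, nomatch = 0L]   # unmatched row is dropped

print(nrow(kept))
print(nrow(dropped))
```
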
Best regards,
Arun