Hi everybody,  
Regarding FR #5072 here: 
https://r-forge.r-project.org/tracker/index.php?func=detail&aid=5072&group_id=240&atid=975

Take two data.tables X and Y, each with a single column "V1", where Y is keyed 
on "V1". data.table currently handles Y[X] differently when Y's column is a 
factor, depending on whether 1) X's column is also a factor or 2) X's column 
is not a factor. Let me illustrate:

case 1:
# X and Y are factors
require(data.table)
X <- data.table(V1=factor(c("A", "B", "C")))
Y <- data.table(V1=factor(c("B", "D", "E")), key="V1")

> Y[X] # X is a factor
  V1
1:  A
2:  B
3:  C

> Y[X]$V1
[1] A B C
Levels: A B C


** Note that when both X and Y are factors, only the levels of X appear in the 
joined result (no D/E).

case 2:
# X is **not** a factor
require(data.table)
X <- data.table(V1=c("A", "B", "C"))
Y <- data.table(V1=factor(c("B", "D", "E")), key="V1")

> Y[X] # X is not a factor
   V1
1: NA
2:  B
3: NA


> Y[X]$V1
[1] <NA> B    <NA>
Levels: B D E


** Note that the result contains NA because the join keeps the levels of "Y", 
and "A" and "C" are not among them.
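
For what it is worth, the case 2 output looks like what base R gives when 
character values are coerced onto a fixed set of factor levels. The sketch 
below is only an analogy in base R. It is not data.table's internal code, and 
the variable names (y_levels, x_values, coerced) are mine.

```r
# Base R analogy only; this does not show how data.table joins internally.
y_levels <- c("B", "D", "E")   # the levels of Y$V1 in case 2
x_values <- c("A", "B", "C")   # the character values of X$V1 in case 2

# Values that are not among y_levels have no level to map to, so they become NA.
coerced <- factor(x_values, levels = y_levels)
print(coerced)
# [1] <NA> B    <NA>
# Levels: B D E
```
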

The first question is: Why this difference? Should the result differ depending 
on whether X is a factor? What do you guys think the intended result should 
be?

The side effect shows up in "merge", which uses this join internally (hence 
FR #5072). For example, with X and Y from case 2:

> merge(X, Y, by="V1", all=TRUE)
   V1
1: NA
2: NA
3:  B
4:  D
5:  E


> merge(X, Y, by="V1", all=TRUE)$V1
[1] <NA> <NA> B    D    E
Levels: B D E


The second question is: Is this the intended result?
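
In case it helps the discussion, here is one possible way to avoid the NAs, 
assuming the goal is to keep every value from both tables: put both columns on 
a common set of levels before joining. This is a base R sketch of the idea 
only, with variable names of my own choosing. Whether data.table should do 
something like this automatically is exactly the open question.

```r
# Sketch in base R; an assumption about a possible workaround, not data.table behaviour.
x_values <- c("A", "B", "C")            # X$V1, character
y_factor <- factor(c("B", "D", "E"))    # Y$V1, factor

# Build the union of all values, then re-encode both sides on that union.
all_levels <- sort(union(x_values, levels(y_factor)))
x_common <- factor(x_values, levels = all_levels)
y_common <- factor(as.character(y_factor), levels = all_levels)

print(levels(x_common))
# [1] "A" "B" "C" "D" "E"
print(any(is.na(x_common)))   # FALSE: no value of X was turned into NA
```
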

Arun

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
