I have encountered a bug in the Cartesian join of two data.tables: the 
resulting data.table keeps a key but is not actually sorted by that full key. 
This is in data.table v1.8.8. Please let me know whether this issue has already 
been reported, or if anyone has insight into it.

Thank you,
Shir Levkowitz



-------------------------------------------------

library(data.table)

###### set up our example data tables
test1 <- data.table(a=sample(1:3, 100, replace=TRUE),
                    b=sample(1:3, 100, replace=TRUE),
                    c=sample(1:10, 100, replace=TRUE))
setkey(test1, a,b,c)

test2 <- data.table(p=sample(1:3, 100, replace=TRUE),
                    q=sample(1:3, 100, replace=TRUE),
                    r=sample(1:100),
                    w=sample(1:100))
setkey(test2, p,q)


###### a cartesian join - this is where the issue arises
test.join <- test1[test2, nomatch=0, allow.cartesian=TRUE]

### have a look at the key
k <- key(test.join)
k

### if we do a group by, we don't get the right aggregation
test.gb <- test.join[,.N,by='a,b,c']
test.gb[a == 1 & b == 1 & c == 1,]
### when really what we want is:
test.agg <- aggregate(r ~a+b+c, test.join, length)
subset(test.agg, a == 1 & b == 1 & c == 1)

### if we set the same key, we get a warning
setkeyv(test.join, k)

>> Warning message: 
In setkeyv(test.join, k) : Already keyed by this key but had invalid row order, 
key rebuilt. If you didn't go under the hood please let datatable-help know so 
the root cause can be fixed.
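
The row order can also be checked directly, without calling setkeyv and 
triggering the rebuild. Below is a minimal sketch in base R. The helper name 
is_sorted_by is hypothetical (it is not part of data.table), and the small 
data.frame is made-up stand-in data with the same shape of problem: sorted by 
(a, b) but not by (a, b, c).

```r
# Hypothetical helper (not part of data.table): TRUE if the rows of `x`
# are already in ascending order of the columns named in `cols`.
is_sorted_by <- function(x, cols) {
  # unname() so column names cannot be mistaken for arguments of order()
  ord <- do.call(order, unname(as.list(x)[cols]))
  # order() leaves ties in their original position, so a table that is
  # already sorted yields exactly 1, 2, ..., nrow(x)
  identical(ord, seq_len(nrow(x)))
}

# Made-up stand-in data: sorted by (a, b) but not by (a, b, c)
demo <- data.frame(a = c(1L, 1L, 1L, 1L),
                   b = c(1L, 1L, 1L, 1L),
                   c = c(1L, 2L, 1L, 2L))

is_sorted_by(demo, c("a", "b"))        # rows are in (a, b) order
is_sorted_by(demo, c("a", "b", "c"))   # rows are not in (a, b, c) order
```

Assuming the helper also accepts a data.table (as.list should return its 
columns in the same way), is_sorted_by(test.join, key(test.join)) run right 
after the join would be expected to return FALSE if the key really is invalid.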


_______________________________________________
datatable-help mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help