> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On
> Behalf Of Karl Ove Hufthammer
>
> I have found one way of achieving this, creating two
> identical data.tables with different keys:
>
> options(stringsAsFactors=FALSE)
> dat=data.frame(x=c("1","1","2","3"), y=c("a","b","a","c"))
> dat
> A <- B <- data.table(dat)
> key(A)="x"
> key(B)="y"
>
> A[B["a"][,x]][,y]
>
> The problem is performance (my real-life data.table is *much*
> larger), since B["a"][,x] outputs a character vector. When
> this is used in A[...], the character vector is converted to a factor
> with appropriate levels, and it turns out (shown using
> 'Rprof') that the majority of the time running the function
> is taken up by 'levels<-', i.e., creating this factor /
> attaching the levels.
>
> I believe one potential solution would be to have both 'x'
> and 'y' being factors, so that there is no conversion to/from
> characters. This would eliminate both the conversion '"a" to
> factor' and 'B["a"][,x] to factor'.
> However, 'data.table' doesn't accept 'i' being a factor (and
> if I convert it to the internal numeric codes, it thinks I
> mean row numbers).
>
> Any suggestions on how to solve this?

To answer part of your question: you can use factors in i by wrapping
i in J(), so that data.table joins on the key, as follows:
options(stringsAsFactors=TRUE)
dat=data.frame(x=c("1","1","2","3"), y=c("a","b","a","c"))
A <- B <- data.table(dat)
key(A)="x"
key(B)="y"
A[J(B["a"][,x])][,y]
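
For comparison, here is a plain base-R sketch of what the chained lookup
returns on the toy data. It uses no data.table and no keys, so it is a full
vector scan rather than a keyed binary search. It shows the result only,
not the performance. The names xs and result are just illustrative. In
general its row order, and its handling of repeated x values, may differ
from the keyed join.

```r
# Toy data from the question; plain character columns, no data.table needed.
dat <- data.frame(x = c("1", "1", "2", "3"),
                  y = c("a", "b", "a", "c"),
                  stringsAsFactors = FALSE)

# Step 1: the x values paired with y == "a" (the role of B["a"][,x]).
xs <- dat$x[dat$y == "a"]

# Step 2: every y value whose x is among those (the role of A[...][,y]).
result <- dat$y[dat$x %in% xs]
print(result)
```

On this data both steps pick out x values "1" and "2", and the final result
is "a", "b", "a", the same as the data.table version above.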
- Tom
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help