[datatable-help] Question about by statements and subsetting

John Kerpel Fri, 02 Aug 2013 10:27:40 -0700

I'm a noob to data.table and I've got a couple of questions:

1) Why do I get different answers from the two calls below? They differ only in the points passed to predict(): c(4,5,6) in the first, a[1:3] in the second.


> DT = data.table(a=c(4:13),y=c(1,1,2,2,2,3,3,3,4,4),x=1:10,z=c(1,1,1,1,2,2,2,2,3,3),zz=c(1,1,1,1,1,2,2,2,2,2))
> setkeyv(DT,cols=c("a","x","y","z","zz"))
> DT[,if(.N>=4) {list(predict(smooth.spline(x,y),c(4,5,6))$y)} ,by=z]
   z        V1
1: 1 2.1000000
2: 1 2.5000000
3: 1 2.9000000
4: 2 0.9998959
5: 2 2.0453352
6: 2 2.9093247

Versus:

> DT[,if(.N>=4) {list(predict(smooth.spline(x,y),a[1:3])$y)} ,by=z]
   z       V1
1: 1 2.100000
2: 1 2.500000
3: 1 2.900000
4: 2 2.999995
5: 2 2.954664
6: 2 2.909333

Is some sort of recycling going on here?


2) How can I do some sort of nested "by" statement? Say I want to
set by=zz, but run the spline statement within each z subset. Do I
use .SD somehow?

This is a great package; it's just taking me some time to get the
syntax right. I've found it to be faster than clusterMap on 2
cores.
I hope I've used the correct terminology!

Best,

John

_______________________________________________
datatable-help mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
