I'm a noob to data.table and I've got a couple of questions:
1). Why do I get different answers in the following example:
> DT = data.table(a=c(4:13),y=c(1,1,2,2,2,3,3,3,4,4),x=1:10,z=c(1,1,1,1,2,2,2,2,3,3),zz=c(1,1,1,1,1,2,2,2,2,2))
> setkeyv(DT,cols=c("a","x","y","z","zz"))
> DT[,if(.N>=4) {list(predict(smooth.spline(x,y),c(4,5,6))$y)} ,by=z]
   z        V1
1: 1 2.1000000
2: 1 2.5000000
3: 1 2.9000000
4: 2 0.9998959
5: 2 2.0453352
6: 2 2.9093247
Versus:
> DT[,if(.N>=4) {list(predict(smooth.spline(x,y),a[1:3])$y)} ,by=z]
   z       V1
1: 1 2.100000
2: 1 2.500000
3: 1 2.900000
4: 2 2.999995
5: 2 2.954664
6: 2 2.909333
Is some sort of recycling going on here?
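
One way to check this, as a minimal sketch: print what `a[1:3]` evaluates to inside each `by` group. Since `j` is evaluated once per group, `a` there refers only to that group's rows, not to the whole column. The column name `newx` is made up for illustration.

```r
library(data.table)

DT <- data.table(a = 4:13,
                 y = c(1,1,2,2,2,3,3,3,4,4),
                 x = 1:10,
                 z = c(1,1,1,1,2,2,2,2,3,3))

# j is evaluated separately for each value of z, so a[1:3] is taken
# from that group's rows only (same .N >= 4 filter as in the example).
chk <- DT[, if (.N >= 4) list(newx = a[1:3]), by = z]
print(chk)
```

For the z = 2 group this gives 8, 9, 10 rather than 4, 5, 6, so the second call predicts at different x values for that group, which would account for the different answers without any recycling.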
2). How can I do some sort of nested "by" statement?
Let's say I want to set by=zz, but run the spline statement within
each z subset. Do I use .SD somehow?
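
If "nested" here means fitting the spline separately for each z within each zz, one possible approach (a sketch under that assumption, not necessarily the only way) is to group by both columns at once rather than reaching for .SD:

```r
library(data.table)

DT <- data.table(a = 4:13,
                 y = c(1,1,2,2,2,3,3,3,4,4),
                 x = 1:10,
                 z = c(1,1,1,1,2,2,2,2,3,3),
                 zz = c(1,1,1,1,1,2,2,2,2,2))

# Grouping by both columns: j runs once per (zz, z) combination.
# Groups with fewer than 4 rows return NULL and are dropped.
res <- DT[, if (.N >= 4) list(V1 = predict(smooth.spline(x, y), c(4, 5, 6))$y),
          by = list(zz, z)]
print(res)
```

With this sample data only the (zz = 1, z = 1) combination has at least 4 rows, so the result has three rows.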
This is a great package - it's just taking me some time to get the
syntax right. I've found this to be faster than clusterMap on 2
cores...
I hope I've used the correct terminology!
Best,
John