Wow, thanks! I didn't think it would be straightforward, but to your point I will try to set up my data differently to see if I can simplify the process.
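For reference, here is a minimal sketch of how the two-table layout suggested below might look with the toy data from this thread. The table names `train` and `predict.on` follow that suggestion. Reusing the zz values as the `z` column of `predict.on`, and the helper list `fits`, are my own assumptions for illustration. I have not tried this on the real data.

```r
library(data.table)

# Training table: (x, y) pairs, grouped by z (toy values from this thread).
train <- data.table(x = 1:10,
                    y = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4),
                    z = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3))

# Prediction table: each `a` value carries the z-index of the spline
# that should predict it (assumption: this is the old zz column).
predict.on <- data.table(a = 4:13,
                         z = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2))

# Fit one smoothing spline per z group; groups with fewer than 4 rows get NULL.
groups <- unique(train$z)
fits <- lapply(groups, function(g) {
  d <- train[z == g]
  if (nrow(d) >= 4) smooth.spline(d$x, d$y) else NULL
})
names(fits) <- as.character(groups)

# Drop the groups that could not be fitted.
fits <- Filter(Negate(is.null), fits)

# Predict each block of `a` values with the spline of the matching z group,
# skipping any rows whose group has no fitted spline.
preds <- predict.on[as.character(z) %in% names(fits),
                    list(preds = predict(fits[[as.character(.BY$z)]], a)$y),
                    by = z]
print(preds)
```

Keeping the fitted splines in a plain named list avoids storing model objects in a data.table column, at the cost of one lookup by name per group.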
On Tue, Aug 6, 2013 at 5:45 PM, Steve Lianoglou <[email protected]> wrote:

> Hi John,
>
> (resending because I was bounced from list due to sending from wrong
> email address)
>
> Please use "reply-all" when replying to emails on this list so that
> discussion stays "on list" and others can help with and benefit from
> the discussion.
>
> Comments below:
>
> On Aug 6, 2013, at 2:40 PM, John Kerpel <[email protected]> wrote:
>
> > Steve:
> >
> > To follow up on my question from a couple of days ago, assuming the
> > following:
> >
> > DT = data.table(a=c(4:13),y=c(1,1,2,2,2,3,3,3,4,4),x=1:10,z=c(1,1,1,1,2,2,2,2,3,3),zz=c(1,1,1,1,1,2,2,2,2,2))
> > setkeyv(DT,cols=c("a","x","y","z","zz"))
> > #DT[,if(.N>=4) {list(predict(smooth.spline(x,y),a)$y)} ,by=c('z', 'zz')]
> >
> > a=c(4:13)
> > y=c(1,1,2,2,2,3,3,3,4,4)
> > x=1:10
> > predict(smooth.spline(x[1:4],y[1:4]),a[1:5])$y
> > [1] 2.1 2.5 2.9 3.3 3.7
> > predict(smooth.spline(x[5:8],y[5:8]),a[6:10])$y
> > [1] 2.954664 2.909333 2.864003 2.818672 2.773341
> >
> > So in this example the predictor a is indexed by zz and (x,y) is indexed by
> > z. Is there a way to do this in the "by" statement? I've got a workaround
> > that uses clusterMap, but I'd like to use data.table instead via some
> > statement like what is commented out above.
> >
> > Thanks for your help.
>
> This seems like the data is setup in a rather strange way -- you'd
> like to have objects (smooth splines) predict on elements (the `a`s)
> that are trained on different sets that you want to predict .. there's
> no "natural" way to use the same data for training and prediction by
> iterating over subsets at the same time.
>
> Perhaps you provided a toy example which isn't how your real data is
> set up, but if not, I'd recommend perhaps having two different tables
> (one with your zz's and your z's split), eg:
>
> train <- data.table(x=whatever, y=whatever, z=z-index)
> predict.on <- data.table(a=a.values, z=z-index)
>
> Anyway, I'll just leave the code that uses data.table with your
> current data below with no further comment -- it'll do what you want.
>
> library(data.table)
>
> a <- c(4:13)
> y <- c(1,1,2,2,2,3,3,3,4,4)
> x <- 1:10
> z <- c(1,1,1,1,2,2,2,2,3,3)
> zz <- c(1,1,1,1,1,2,2,2,2,2)
> DT <- data.table(a=a, y=y, x=x, z=z, zz=zz)
> setkeyv(DT, 'z')
> Zs <- unique(DT)$z
>
> splines <- lapply(Zs, function(zval) {
>   dt <- DT[J(zval)]
>   if (nrow(dt) >= 4) {
>     ss <- smooth.spline(dt$x, dt$y)
>   } else {
>     ss <- NULL
>   }
>   data.table(zz=zval, ss=list(ss), is.spline=!is.null(ss))
> })
> splines <- rbindlist(splines)[is.spline == TRUE]
> setkeyv(splines, 'zz')
> setkeyv(DT, 'zz')
>
> splines[DT, list(preds=predict(ss[[1]], a)$y)]
>     zz    preds
>  1:  1 2.100000
>  2:  1 2.500000
>  3:  1 2.900000
>  4:  1 3.300000
>  5:  1 3.700000
>  6:  2 2.954664
>  7:  2 2.909333
>  8:  2 2.864003
>  9:  2 2.818672
> 10:  2 2.773341
>
> HTH,
> -steve
>
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
