Hi John,

(resending because I was bounced from the list for sending from the wrong email address)

Please use "reply-all" when replying to emails on this list so that discussion stays "on list" and others can help with and benefit from the discussion.

Comments below:

On Aug 6, 2013, at 2:40 PM, John Kerpel <[email protected]> wrote:

> Steve:
>
> To follow up on my question from a couple of days ago, assuming the
> following:
> DT =
> data.table(a=c(4:13),y=c(1,1,2,2,2,3,3,3,4,4),x=1:10,z=c(1,1,1,1,2,2,2,2,3,3),zz=c(1,1,1,1,1,2,2,2,2,2))
> setkeyv(DT,cols=c("a","x","y","z","zz"))
> #DT[,if(.N>=4) {list(predict(smooth.spline(x,y),a)$y)} ,by=c('z', 'zz')]
>
> a=c(4:13)
> y=c(1,1,2,2,2,3,3,3,4,4)
> x=1:10
> predict(smooth.spline(x[1:4],y[1:4]),a[1:5])$y
> [1] 2.1 2.5 2.9 3.3 3.7
> predict(smooth.spline(x[5:8],y[5:8]),a[6:10])$y
> [1] 2.954664 2.909333 2.864003 2.818672 2.773341
> So in this example the predictor a is indexed by zz and (x,y) is indexed by
> z. Is there a way to do this in the "by" statement? I've got a workaround
> that uses clusterMap, but I'd like to use data.table instead via some
> statement like what is commented out above.
> Thanks for your help.

It seems like the data is set up in a rather strange way -- you'd like to have objects (smooth splines) that are trained on one subset of rows (grouped by `z`) predict on elements (the `a`s) that belong to a different subset (grouped by `zz`). Because the two groupings don't line up, there's no "natural" way to use the same table for training and prediction by iterating over subsets at the same time.

Perhaps you provided a toy example which isn't how your real data is set up, but if not, I'd recommend having two different tables (one for your z's and one for your zz's), eg:

  train <- data.table(x=whatever, y=whatever, z=z-index)
  predict.on <- data.table(a=a.values, z=z-index)

Anyway, I'll just leave the code that uses data.table with your current data below with no further comment -- it'll do what you want.

  library(data.table)

  a <- c(4:13)
  y <- c(1,1,2,2,2,3,3,3,4,4)
  x <- 1:10
  z <- c(1,1,1,1,2,2,2,2,3,3)
  zz <- c(1,1,1,1,1,2,2,2,2,2)

  DT <- data.table(a=a, y=y, x=x, z=z, zz=zz)
  setkeyv(DT, 'z')

  ## unique(DT$z) rather than unique(DT)$z: since data.table 1.9.8,
  ## unique(DT) de-duplicates on all columns, not just the key
  Zs <- unique(DT$z)

  ## fit one spline per z group, skipping groups with fewer than 4 rows
  splines <- lapply(Zs, function(zval) {
    dt <- DT[J(zval)]
    if (nrow(dt) >= 4) {
      ss <- smooth.spline(dt$x, dt$y)
    } else {
      ss <- NULL
    }
    data.table(zz=zval, ss=list(ss), is.spline=!is.null(ss))
  })
  splines <- rbindlist(splines)[is.spline == TRUE]

  setkeyv(splines, 'zz')
  setkeyv(DT, 'zz')

  ## by=.EACHI is needed on data.table >= 1.9.4; older versions grouped
  ## by each row of i implicitly
  splines[DT, list(preds=predict(ss[[1]], a)$y), by=.EACHI]

      zz    preds
   1:  1 2.100000
   2:  1 2.500000
   3:  1 2.900000
   4:  1 3.300000
   5:  1 3.700000
   6:  2 2.954664
   7:  2 2.909333
   8:  2 2.864003
   9:  2 2.818672
  10:  2 2.773341

HTH,
-steve

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
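
For what it's worth, the two-table layout suggested above might look something like the sketch below. It is only an illustration under assumptions: the way the toy data is split into `train` and `predict.on`, the use of `z` as the shared group column in both tables, and the helper list `fits` are all made up for this example, and `split(..., by=)` needs data.table 1.9.8 or later.

```r
library(data.table)

# Hypothetical split of the toy data into two tables that share a
# group column `z` (the original `z` for training, the original `zz`
# renamed to `z` for prediction).
train <- data.table(x = 1:10,
                    y = c(1,1,2,2,2,3,3,3,4,4),
                    z = c(1,1,1,1,2,2,2,2,3,3))
predict.on <- data.table(a = 4:13,
                         z = c(1,1,1,1,1,2,2,2,2,2))

# Fit one smooth spline per training group; groups with fewer than
# 4 rows get NULL. The result is a plain list named "1", "2", "3".
fits <- lapply(split(train, by = "z"), function(d) {
  if (nrow(d) >= 4) smooth.spline(d$x, d$y) else NULL
})

# For each prediction group, look up the matching fit by name and
# predict on that group's `a` values. Groups without a fit return
# NULL from j and are dropped from the result.
preds <- predict.on[, {
  ss <- fits[[as.character(z)]]
  if (!is.null(ss)) list(a = a, pred = predict(ss, a)$y)
}, by = z]

print(preds)
```

On the toy data this should give ten rows, five for each of groups 1 and 2; group 3 has only two training rows, so no spline is fitted for it. Keeping the fits in an ordinary named list avoids list columns and a keyed join, at the cost of a lookup by name inside `j`.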
