Hi,

Comments inline:

On Fri, Sep 23, 2011 at 11:01 AM, djmuseR <[email protected]> wrote:
> Hi:
>
> I'm playing around with some baseball data and ran into an error whose cause
> I don't quite understand.
> A subset of the data is here, consisting of all season batting records of
> five players:

[cut out data]

> # Variables I want to sum over each player:
> vars <- c('G', 'AB', 'R', 'H', 'X2B', 'X3B',
>           'HR', 'RBI', 'SB', 'CS', 'BB', 'SO', 'IBB', 'HBP',
>           'SH', 'SF', 'GIDP', 'G_old')
>
> # library('data.table')
> DTtst <- data.table(tst, key = 'playerID')
>
> The following works as I want:
> DT1 <- DTtst[, list(beginYear = min(yearID), endYear = max(yearID),
>                     nyears = sum(stint == 1L), nteams = length(unique(teamID))),
>              by = 'playerID']
> DT2 <- DTtst[, lapply(.SD, sum), by = playerID, .SDcols = vars]
> DT1[DT2]
>
> # Combining the two into one call doesn't:
>
> DTtst[, list( beginYear = min(yearID),
>               endYear = max(yearID),
>               nyears = sum(stint == 1L),
>               nteams = length(unique(teamID)),
>               lapply(.SD, sum)),
>       by = playerID,
>       .SDcols = vars]
> # Error in eval(expr, envir, enclos) : object 'yearID' not found
>
> What am I missing? Is it the lapply() call within list()?

Using .SDcols restricts the columns/vars that are injected into the scope
of your j-expression (where your `list(...)` is) to the columns of .SD.
yearID isn't in `vars`, and therefore isn't in .SD.

To convince yourself, consider this:

R> DTtst[, {
  xx <- .SD
  browser()
}, by='playerID', .SDcols=vars]
Called from: eval(expr, envir, enclos)
Browse[1]> xx
      G AB R H X2B X3B HR RBI SB CS BB SO IBB HBP SH SF GIDP G_old
[1,] 11  0 0 0   0   0  0   0  0  0  0  0   0   0  0  0    0    11
[2,] 45  2 0 0   0   0  0   0  0  0  0  0   0   0  1  0    0    45
[3,] 25  0 0 0   0   0  0   0  0  0  0  0   0   0  0  0    0     2
[4,] 47  1 0 0   0   0  0   0  0  0  0  1   0   0  0  0    0     5
[5,] 73  0 0 0   0   0  0   0  0  0  0  0   0   0  0  0    0    NA
[6,] 53  0 0 0   0   0  0   0  0  0  0  0   0   0  0  0    0    NA

See? No yearID.

Just make sure all the vars you reference in your j-expression are in
your .SDcols.

> Second question, more out of curiosity than anything else: is there an
> analogue in data.table to within() or plyr::mutate, where one can define new
> variables within a call and use them to create other variables? An example
> of what I have in mind is
>
> DT[, list(..., PA = AB + BB + HBP + SH + SF,
>           OBP = ifelse(PA > 0,
>                        round((H + BB + HBP)/(PA - SH - SF), 3),
>                        NA)),
>    by = playerID]
>
> I have a fairly strong prior on the answer to this question, but I'll let
> others weigh in first.

Matthew is fixing `within` in the development version (SVN from r-forge).
There is also the recently introduced `:=`, but it adds the new columns to
the data.table you are iterating over, which doesn't sound like what you
want.

Note that your j-expression isn't restricted to being a list. Look at
the example I gave above for starters, but you can also do:

DTtst[, {
  PA <- AB + BB + HBP + SH + SF
  list(PA=PA,
       OBP=ifelse(PA > 0,
                  round((H + BB + HBP)/(PA - SH - SF), 3),
                  NA))
}, by='playerID']

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
