Thanks, Steve. The clarification about .SDcols was sufficient. In that context, everything else was clear.
Best regards,
Dennis

On Fri, Sep 23, 2011 at 8:59 AM, Steve Lianoglou <[email protected]> wrote:
> Hi,
>
> Comments in line
>
> On Fri, Sep 23, 2011 at 11:01 AM, djmuseR <[email protected]> wrote:
>> Hi:
>>
>> I'm playing around with some baseball data and ran into an error whose cause
>> I don't quite understand.
>> A subset of the data is here, consisting of all season batting records of
>> five players:
>
> [cut out data]
>
>> # Variables I want to sum over each player:
>> vars <- c('G', 'AB', 'R', 'H', 'X2B', 'X3B',
>>           'HR', 'RBI', 'SB', 'CS', 'BB', 'SO', 'IBB', 'HBP',
>>           'SH', 'SF', 'GIDP', 'G_old')
>>
>> # library('data.table')
>> DTtst <- data.table(tst, key = 'playerID')
>>
>> The following works as I want:
>> DT1 <- DTtst[, list(beginYear = min(yearID), endYear = max(yearID),
>>                     nyears = sum(stint == 1L), nteams = length(unique(teamID))),
>>              by = 'playerID']
>> DT2 <- DTtst[, lapply(.SD, sum), by = playerID, .SDcols = vars]
>> DT1[DT2]
>>
>> # Combining the two into one call doesn't:
>>
>> DTtst[, list( beginYear = min(yearID),
>>               endYear = max(yearID),
>>               nyears = sum(stint == 1L),
>>               nteams = length(unique(teamsID)),
>>               lapply(.SD, sum)),
>>       by = playerID,
>>       .SDcols = vars]
>> # Error in eval(expr, envir, enclos) : object 'yearID' not found
>>
>> What am I missing? Is it the lapply() call within list()?
>
> Using .SDcols restricts the columns/vars that are injected in the
> scope of your j-statement (where your `list(...)` is) which are the
> same as the columns of .SD.
>
> yearID isn't in `vars`, and therefore isn't in .SD.
> To convince yourself, consider this:
>
> R> DTtst[, {
>      xx <- .SD
>      browser()
>    }, by='playerID', .SDcols=vars]
>
> Called from: eval(expr, envir, enclos)
> Browse[1]> xx
>       G AB R H X2B X3B HR RBI SB CS BB SO IBB HBP SH SF GIDP G_old
> [1,] 11  0 0 0   0   0  0   0  0  0  0  0   0   0  0  0    0    11
> [2,] 45  2 0 0   0   0  0   0  0  0  0  0   0   0  1  0    0    45
> [3,] 25  0 0 0   0   0  0   0  0  0  0  0   0   0  0  0    0     2
> [4,] 47  1 0 0   0   0  0   0  0  0  0  1   0   0  0  0    0     5
> [5,] 73  0 0 0   0   0  0   0  0  0  0  0   0   0  0  0    0    NA
> [6,] 53  0 0 0   0   0  0   0  0  0  0  0   0   0  0  0    0    NA
>
> See? No yearID.
>
> Just make sure all the vars you reference in your j-expression are in
> your .SDcols
>
>
>> Second question, more out of curiosity than anything else: is there an
>> analogue in data.table to within() or plyr::mutate, where one can define new
>> variables within a call and use them to create other variables? An example
>> of what I have in mind is
>>
>> DT[, list(..., PA = AB + BB + HBP + SH + SF,
>>           OBP = ifelse(PA > 0,
>>                        round((H + BB + HBP)/(PA - SH - SF), 3),
>>                        NA)),
>>    by = playerID]
>>
>> I have a fairly strong prior on the answer to this question, but I'll let
>> others weigh in first.
>
> Matthew is fixing `within` in the development version (SVN from
> r-forge), but there is the recently introduced `:=` -- but this will
> add these columns to the data.table you are iterating over, which
> doesn't sound like what you want.
>
> Note that your `j-expression` isn't restricted to being a list.
> Look at the example I gave above for starters, but also you can do:
>
> DTtst[, {
>   PA <- AB + BB + HBP + SH + SF
>   list(PA=PA, OBP=ifelse(PA > 0, round((H + BB + HBP)/(PA - SH - SF), 3), NA))
> }, by='playerID']
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
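For readers arriving at this thread with a current data.table, here is a minimal sketch of the single-call summary Dennis was after. The rows, the shortened `vars`, and the name `res` are invented for illustration. Two things differ from the failing call: the `lapply()` result is spliced in with `c()` rather than nested inside `list()`, and `teamsID` is spelled `teamID`. The sketch also assumes a newer data.table release in which columns named in j (here `yearID`, `stint`, `teamID`) are made visible even when they are not in `.SDcols`; under the 2011 behaviour Steve describes, it may fail the same way the original did.

```r
library(data.table)

# Hypothetical toy rows standing in for the real batting subset
DTtst <- data.table(
  playerID = c("a", "a", "a", "b", "b"),
  yearID   = c(2001L, 2002L, 2002L, 2005L, 2006L),
  stint    = c(1L, 1L, 2L, 1L, 1L),
  teamID   = c("X", "X", "Y", "Z", "Z"),
  G        = c(10L, 20L, 5L, 30L, 40L),
  AB       = c(30L, 60L, 12L, 100L, 120L),
  H        = c(8L, 15L, 3L, 25L, 33L),
  key = "playerID"
)
vars <- c("G", "AB", "H")  # hypothetical shortened stand-in for the thread's vars

# c() splices the list returned by lapply() into the outer list, so each
# summed column becomes its own result column instead of a nested element.
res <- DTtst[, c(list(beginYear = min(yearID),
                      endYear   = max(yearID),
                      nyears    = sum(stint == 1L),
                      nteams    = length(unique(teamID))),
                 lapply(.SD, sum)),
             by = playerID, .SDcols = vars]
print(res)
```

With the thread's full `vars` vector the call keeps the same shape; only the toy data here are made up.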
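Steve's `browser()` check of what `.SD` holds can also be done non-interactively. This sketch (toy data and the name `sdcols` invented for illustration) records, for each group, the column names carried by `.SD` when `.SDcols` is set. Neither the grouping column nor `yearID`, which is left out of `.SDcols`, appears.

```r
library(data.table)

# Hypothetical toy data; yearID is deliberately left out of .SDcols
DT <- data.table(playerID = c("a", "a", "b"),
                 yearID   = c(2001L, 2002L, 2005L),
                 G        = c(10L, 20L, 30L),
                 AB       = c(30L, 60L, 100L))
vars <- c("G", "AB")

# names(.SD) lists exactly the columns injected into .SD for each group
sdcols <- DT[, list(cols = paste(names(.SD), collapse = ",")),
             by = playerID, .SDcols = vars]
print(sdcols)
```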
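Steve mentions that `:=` adds the new columns to the table being iterated over. A minimal sketch of that behaviour, with invented counts and an illustrative name `DT2`: `copy()` is used so the original table is left alone, and the second `:=` call can refer to `PA` because the first call has already added it by reference. The OBP formula is taken unchanged from the thread.

```r
library(data.table)

# Hypothetical per-season counts for two players
DT <- data.table(playerID = c("a", "a", "b", "b"),
                 AB  = c(30, 20, 0, 50), BB = c(5, 3, 0, 4),
                 HBP = c(1, 0, 0, 2),    SH = c(2, 0, 0, 1),
                 SF  = c(1, 1, 0, 3),    H  = c(9, 6, 0, 12))

DT2 <- copy(DT)  # := modifies by reference, so work on a deep copy
DT2[, PA := AB + BB + HBP + SH + SF]
# PA now exists as a column, so OBP can be built from it
DT2[, OBP := ifelse(PA > 0, round((H + BB + HBP) / (PA - SH - SF), 3), NA)]
print(DT2)
```

Whether adding columns in place is acceptable depends on the use case, which is Steve's point about it not sounding like what Dennis wanted.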
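A runnable version of Steve's braced j-expression, using invented counts and an illustrative result name `res`. Inside the braces `PA` is an ordinary local variable, so `OBP` can be computed from it without touching the table. The value of the last expression in the braces (the `list(...)`) becomes the result, giving one row per input row within each `playerID` group.

```r
library(data.table)

# Hypothetical per-season counts for two players
DTtst <- data.table(playerID = c("a", "a", "b", "b"),
                    AB  = c(30, 20, 0, 50), BB = c(5, 3, 0, 4),
                    HBP = c(1, 0, 0, 2),    SH = c(2, 0, 0, 1),
                    SF  = c(1, 1, 0, 3),    H  = c(9, 6, 0, 12))

res <- DTtst[, {
  PA <- AB + BB + HBP + SH + SF  # local to this group's evaluation of j
  list(PA = PA,
       OBP = ifelse(PA > 0, round((H + BB + HBP) / (PA - SH - SF), 3), NA))
}, by = "playerID"]
print(res)
```

Two caveats with real data: the guard tests `PA > 0` rather than the denominator `PA - SH - SF`, so a season consisting only of sacrifices yields `NaN`; and a group where every row has `PA == 0` returns a logical `NA`, which data.table may reject as inconsistent with the other groups' double results, so `NA_real_` is the safer fallback.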
