On Sep 5, 2010, at 12:43 PM, David Winsemius wrote:


On Sep 5, 2010, at 11:38 AM, Damian Betebenner wrote:

Thanks for the invaluable help on my previous questions. The speed-up in creating summary tables has been immense, and I’m enthused about all the possibilities going forward.

I’m currently stuck trying to put together syntax for a “long” format table. In the example below, each case is a unique Student by Year combination. What I’m trying to do is take such a table, aggregate on the student’s current-year (i.e., 2009 in this data) SCHOOL_NUMBER, and calculate the mean score from the previous year (i.e., 2008 in this data).

If the file were “wide”, with each case representing a unique student with separate variables for the year, then it would be easy to break on the 2009 SCHOOL_NUMBER and take the mean of the 2008 SCORE.

But there is only one 2008 SCORE for each student???


Is conversion of long to wide necessary to do this?

Probably not. Are you familiar with the "ave" function in base R?

I am having some difficulty understanding the structure of the desired output. I initially thought it might be something like:
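# Helper: read a whitespace-delimited table from a character string
rd.txt <- function(txt, header = TRUE, ...) {
    con <- textConnection(txt)
    on.exit(close(con))   # close only this connection, not every open one
    read.table(con, header = header, ...)
}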
txt <- rd.txt("STUDENT_ID SCHOOL_NUMBER YEAR SCORE
         1           100 2008    39
         1           200 2009    48
         2           100 2008    64
         2           200 2009    73
         3           100 2008    35
         3           200 2009    35
         4           100 2008    52
         4           200 2009    61
         5           100 2008    51
         5           200 2009    58
         6           300 2008    45
         6           400 2009    55
         7           300 2008    69
         7           400 2009    77
         8           300 2008    47
         8           400 2009    47
         9           300 2008    57
         9           400 2009    58
        10           300 2008    47
        10           400 2009    53")
library(data.table)   # needed for data.table()
dtxt <- data.table(txt)

> dtxt$avScr <- dtxt[ , ave(SCORE, list(STUDENT_ID))] # returns a vector as long as its input
> dtxt
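
To make the behaviour of ave() concrete, here is a minimal base R sketch using only the first two students from the example data. The names "scores", "id" and "score" are made up for illustration; the point is only that the group mean is repeated on every row of its group.

```r
# First two students from the thread's example data (column names are illustrative)
scores <- data.frame(id    = c(1, 1, 2, 2),
                     score = c(39, 48, 64, 73))

# ave() defaults to FUN = mean and returns one value per input row,
# so each student's mean appears on both of that student's rows
scores$avScr <- ave(scores$score, scores$id)
scores$avScr   # 43.5 43.5 68.5 68.5
```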

But now I am wondering if you wanted:

> dtxt[ , tapply(SCORE, list(STUDENT_ID), mean)] # returns vector only as long as product of category levels.
   1    2    3    4    5    6    7    8    9   10
43.5 68.5 35.0 56.5 54.5 50.0 73.0 47.0 57.5 50.0
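
If the goal is instead the mean 2008 SCORE grouped by each student's 2009 SCHOOL_NUMBER, as the original question describes, one possible approach that avoids reshaping to wide is sketched below in base R. This is only a sketch under the assumption that every student has exactly one 2009 row; "long", "cur", "SCHOOL_2009", "prior" and "prior_mean" are names invented here, and the data is a subset of the example (students 1, 2, 6 and 7).

```r
# Subset of the thread's example data: students 1, 2, 6 and 7
long <- data.frame(
  STUDENT_ID    = c(1, 1, 2, 2, 6, 6, 7, 7),
  SCHOOL_NUMBER = c(100, 200, 100, 200, 300, 400, 300, 400),
  YEAR          = c(2008, 2009, 2008, 2009, 2008, 2009, 2008, 2009),
  SCORE         = c(39, 48, 64, 73, 45, 55, 69, 77)
)

# Look up each student's 2009 school (assumes exactly one 2009 row per student)
cur <- long[long$YEAR == 2009, c("STUDENT_ID", "SCHOOL_NUMBER")]

# Attach that 2009 school to every row belonging to the same student
long$SCHOOL_2009 <- cur$SCHOOL_NUMBER[match(long$STUDENT_ID, cur$STUDENT_ID)]

# Mean of the 2008 scores, grouped by the 2009 school
prior <- long[long$YEAR == 2008, ]
prior_mean <- tapply(prior$SCORE, prior$SCHOOL_2009, mean)
prior_mean   # school 200: 51.5, school 400: 57
```

The same lookup-then-group idea should carry over to a data.table, but the version above sticks to base R so that it does not depend on any particular data.table release.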




      STUDENT_ID SCHOOL_NUMBER YEAR SCORE
 [1,]          1           100 2008    39
 [2,]          1           200 2009    48
 [3,]          2           100 2008    64
 [4,]          2           200 2009    73
 [5,]          3           100 2008    35
 [6,]          3           200 2009    35
 [7,]          4           100 2008    52
 [8,]          4           200 2009    61
 [9,]          5           100 2008    51
[10,]          5           200 2009    58
[11,]          6           300 2008    45
[12,]          6           400 2009    55
[13,]          7           300 2008    69
[14,]          7           400 2009    77
[15,]          8           300 2008    47
[16,]          8           400 2009    47
[17,]          9           300 2008    57
[18,]          9           400 2009    58
[19,]         10           300 2008    47
[20,]         10           400 2009    53


Thanks,

Damian



David Winsemius, MD
West Hartford, CT

_______________________________________________
datatable-help mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
