Re: [datatable-help] Data table syntax

David Winsemius Sun, 05 Sep 2010 16:12:08 -0700


On Sep 5, 2010, at 6:58 PM, Damian Betebenner wrote:

David,
Thanks!!! Your previous example, which I'm just playing with nowseems to work. Given that the file I'm working with has, usually,over a million cases, I want to make sure it works on somethingsmaller before setting it to work on the bigger table. I'll tryoptimizing later.

The problem with my first solution is that there is no method toassociate or test for the prior or next school attended. The secondmethod should allow looking at the cross-tabulation of 2008 and 2009schools if you constructed a more realistic test table and also used aby= argument that had both years in the list().


--
David.


Thanks a TON,

Damian

Damian Betebenner
Center for Assessment
PO Box 351
Dover, NH   03821-0351

Phone (office): (603) 516-7900
Phone (cell): (857) 234-2474
Fax: (603) 516-7910

[email protected]
www.nciea.org




-----Original Message-----
From: David Winsemius [mailto:[email protected]]
Sent: Sunday, September 05, 2010 6:56 PM
To: [email protected]
Cc: Damian Betebenner; David Winsemius
Subject: Re: [datatable-help] Data table syntax

I went back and reviewed the merge.data.frame method and then
(re-)reviewed the data.table documents. I was disappointed to find the
there were relatively few worked examples of data.table merges, but
finally got something that seems to make sense:

dftxt <- as.data.frame(dtxt)
# The dataframe method

merge(dftxt[dftxt$YEAR==2008, c(1,2,4)], dftxt[dftxt$YEAR==2009,

c(1,2,4)], by="STUDENT_ID")
   STUDENT_ID SCHOOL_NUMBER.x SCORE.x SCHOOL_NUMBER.y SCORE.y
1           1             100      39             200      48
2           2             100      64             200      73
3           3             100      35             200      35
4           4             100      52             200      61
5           5             100      51             200      58
6           6             300      45             400      55
7           7             300      69             400      77
8           8             300      47             400      47
9           9             300      57             400      58
10         10             300      47             400      53

#The datatable method:

setkey(dtxt) <- "STUDENT_ID"   #as far as I can tell this is needed

to repalce the by= argument in merge.data.frame()

#( I tried several failed efforts at incorporating the key= statement
within the data.table calls.)

merge(dtxt[YEAR==2008,  ], dtxt[YEAR==2009,  ])

      STUDENT_ID SCHOOL_NUMBER YEAR SCORE SCHOOL_NUMBER.1 YEAR.1
SCORE.1
 [1,]          1           100 2008    39             200   2009
48
 [2,]          2           100 2008    64             200   2009
73
 [3,]          3           100 2008    35             200   2009
35
 [4,]          4           100 2008    52             200   2009
61
 [5,]          5           100 2008    51             200   2009
58
 [6,]          6           300 2008    45             400   2009
55
 [7,]          7           300 2008    69             400   2009
77
 [8,]          8           300 2008    47             400   2009
47
 [9,]          9           300 2008    57             400   2009
58

[10,] 10 300 2008 47 4002009 53

mtxt <- merge(dtxt[YEAR==2008,  ], dtxt[YEAR==2009,  ])

mtxt[, mean(SCORE) , by=list("2009_SCH"=SCHOOL_NUMBER.1)]

     X2009_SCH   V1
[1,]       200 48.2
[2,]       400 53.0


On Sep 5, 2010, at 4:43 PM, David Winsemius wrote:


On Sep 5, 2010, at 3:02 PM, Damian Betebenner wrote:

Hi David,

Thanks for the quick and thoughtful reply.

Sorry for not being clearer.

There are two schools that the students attended in 2009 (200 and
400). I'd like to break on those, and calculate the mean for all
the students in those two schools but for their 2008 scores.

Thus, the output would have 2 rows:

YEAR, 2009_SCHOOL_NUMBER, 2008_SCORE_MEAN

2009  200    54.4
2009  400    53


I dont see what sort of linkage you have between the 2008 and 2009
school-numbers but see if this satisfies:

dtxt[YEAR==2008 & STUDENT_ID %in% dtxt[YEAR==2009, STUDENT_ID],

mean(SCORE), by=SCHOOL_NUMBER]
   SCHOOL_NUMBER   V1
[1,]           100 48.2
[2,]           300 53.0


Thanks for considering this,

Damian


Damian Betebenner
Center for Assessment
PO Box 351
Dover, NH   03821-0351

Phone (office): (603) 516-7900
Phone (cell): (857) 234-2474
Fax: (603) 516-7910

[email protected]
www.nciea.org




-----Original Message-----
From: David Winsemius [mailto:[email protected]]
Sent: Sunday, September 05, 2010 1:03 PM
To: David Winsemius
Cc: Damian Betebenner; [email protected]
Subject: Re: [datatable-help] Data table syntax


On Sep 5, 2010, at 12:43 PM, David Winsemius wrote:


On Sep 5, 2010, at 11:38 AM, Damian Betebenner wrote:

Thanks for the invaluable help on my previous questions. The speed

up in create summary tables has been immense and I'm enthusedabout

all the possibilities going forward.

I'm currently stuck in trying to put together syntax for a "long"
for table. In the example below, each case is a unique Student by
Year combination. What I'm trying to do is take
such a table, aggregate on the student's  current year (i.e., 2009
in this data) SCHOOL_NUMBER, and calculate their mean score in the
previous year (i.e., 2008 in this data).

If the file were "wide", with each case representing a unique

student with separate variables for the year, then it would beeasy

to break on the 2009 SCHOOL_NUMBER and take the
mean of the 2008 SCORE.


But there is only one 2008 SCORE for each student???


Is conversion of long to wide necessary to do this?


Probably not. Are you familiar with the "ave" function in base R?


I am having some difficulty understanding the structure of the
desired
output. I initially thought it might be something like:
rd.txt <-
function(txt, header=TRUE, ...) {
   rd <- read.table(textConnection(txt), header=header, ...)
     closeAllConnections()
   rd }
txt <- rd.txt("STUDENT_ID SCHOOL_NUMBER YEAR SCORE
       1           100 2008    39
       1           200 2009    48
       2           100 2008    64
       2           200 2009    73
       3           100 2008    35
       3           200 2009    35
       4           100 2008    52
       4           200 2009    61
       5           100 2008    51
       5           200 2009    58
       6           300 2008    45
       6           400 2009    55
       7           300 2008    69
       7           400 2009    77
       8           300 2008    47
       8           400 2009    47
       9           300 2008    57
       9           400 2009    58
      10           300 2008    47
      10           400 2009    53")
dtxt <- data.table(txt)

dtxt$avScr <- dtxt[ , ave(SCORE, list(STUDENT_ID))]  # returns a

vector as long as its input

dtxt


But now I am wondering if you wanted:

dtxt[ , tapply(SCORE, list(STUDENT_ID), mean)]  # returns vector

only as long as product of category levels.
 1    2    3    4    5    6    7    8    9   10
43.5 68.5 35.0 56.5 54.5 50.0 73.0 47.0 57.5 50.0



  STUDENT_ID SCHOOL_NUMBER YEAR SCORE
[1,]          1           100 2008    39
[2,]          1           200 2009    48
[3,]          2           100 2008    64
[4,]          2           200 2009    73
[5,]          3           100 2008    35
[6,]          3           200 2009    35
[7,]          4           100 2008    52
[8,]          4           200 2009    61
[9,]          5           100 2008    51
[10,]          5           200 2009    58
[11,]          6           300 2008    45
[12,]          6           400 2009    55
[13,]          7           300 2008    69
[14,]          7           400 2009    77
[15,]          8           300 2008    47
[16,]          8           400 2009    47
[17,]          9           300 2008    57
[18,]          9           400 2009    58
[19,]         10           300 2008    47
[20,]         10           400 2009    53


Thanks,

Damian



David Winsemius, MD
West Hartford, CT


David Winsemius, MD
West Hartford, CT

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help


David Winsemius, MD
West Hartford, CT


David Winsemius, MD
West Hartford, CT

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

Re: [datatable-help] Data table syntax

Reply via email to