Hi, As an aside -- in the future, please provide some data in a form that we can just copy and paste from your email into an R session so that we can get a working object up quickly.
For example: R> dt <- data.table(coursecode=c(NA, NA, NA, 101, 102, 101, 102, 103), student_id=c(1, 1, 1, 1, 1, 2, 2, 2), key='student_id') On Thu, May 5, 2011 at 10:54 PM, Steve Harman <[email protected]> wrote: > Hello > > I have a data table called dt in which each student can have multiple > records (created using data.table) > > coursecode student_id > ---------------- ---------------- > NA 1 > NA 1 > NA 1 > .... 1 > .... 1 > NA 2 > 101 2 > 102 2 > NA 2 > 103 2 > > I am trying to group by student id and concatenate the coursecode > strings in > student records. This string is mostly NA but it can also be real > course code > (because of messy real life data coursecode was not always entered) > There are 999999 records. > > So, I thought I would get results like > > 1 NA NA NA ..... > 2 NA 101 102 NA 123 .... What type of object are you expecting that result to be? > However, as seen below, it brings me a result with 999999 rows > and it fails to concatenate the coursecode's. > >> codes <- dt[,paste(coursecode),by=student_id] >> codes > student_id V1 > [1,] 1 NA > [2,] 1 NA > [3,] 1 NA > [4,] 1 NA > [5,] 1 NA > [6,] 1 NA > [7,] 1 NA > [8,] 1 NA > [9,] 1 NA > [10,] 1 NA > First 10 rows of 999999 printed. > > If I repeat the same example for a numeric attribute and use some math > aggregation functions such as sum, mean, etc., then the number of rows > returned is correct, it is indeed equal to the number of students. > > I was wondering if the problem is with NA's or with the use of paste > as the aggregation function. I can alternatively use RMySQL with MySQL > to concatenate those strings but I would like to use data.table if > possible. What if you try this (using my `dt` example from above): R> dt[, paste(coursecode, collapse=","), by=student_id] student_id V1 [1,] 1 NA,NA,NA,101,102 [2,] 2 101,102,103 Note that each element in the $V1 column is a character vector of length 1 and not individual course codes. Without using the `collapse` argument to your call to paste, you just get a character vector which is the same length as you passed in, eg: R> paste(c('A', 'B', NA, 'C')) [1] "A" "B" "NA" "C" vs. R> paste(c('A', 'B', NA, 'C'), collapse=",") [1] "A,B,NA,C" HTH, -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact _______________________________________________ datatable-help mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
