Re: [R] Using by() and stacking back sub-data frames to one data frame
One thing you might consider when working with large data frames is that, instead of partitioning the data frame into smaller ones, you create a list of row indices and use that to access each subset. This works especially well when using 'lapply' to chomp through many segments of a data frame:

    > y
          suid month esr
    1  1074034    12   6
    2  1074034     1   2
    3  1074034     2   2
    4  1074034     3   2
    5  1074034    12   1
    6  1074034     1   1
    7  1074034     2   1
    8  1074034     3   1
    9  1074034    12   2
    10 1074034     1   2
    11 1074034     2   2
    12 1074034     3   2
    13 1074034    12   9
    14 1074034     1   9
    15 1074034     2   9
    16 1074034     3   9
    17 1123003    12   2
    18 1123003     1   2
    19 1123003     2   2
    20 1123003     3   2
    > y.ind <- split(seq(nrow(y)), y$month)
    > y.ind
    $`1`
    [1]  2  6 10 14 18

    $`2`
    [1]  3  7 11 15 19

    $`3`
    [1]  4  8 12 16 20

    $`12`
    [1]  1  5  9 13 17

    > # a subset
    > y[y.ind[['12']],]
          suid month esr
    1  1074034    12   6
    5  1074034    12   1
    9  1074034    12   2
    13 1074034    12   9
    17 1123003    12   2

On Wed, Jun 24, 2009 at 11:34 PM, Stephan Lindner <lindn...@umich.edu> wrote:
> [...]
> What I would like to do is to stack these four data frames back into one
> data frame, which in this simple example would just be y. I tried
> unlist(), unclass() and rbind(), but none of them would work.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
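As a rough illustration of the index-list idea in the message above, the sketch below computes something per month and writes it back into the original rows, so no stacking step is needed at all. The per-month mean of esr and the names esr.mean and m are assumptions made up for this example; they do not come from the thread.

```r
# Example data from the original post.
y <- data.frame(suid  = c(rep(1074034, 16), rep(1123003, 4)),
                month = rep(c(12, 1, 2, 3), 5),
                esr   = c(6, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 9, 9, 9, 9, 2, 2, 2, 2))

# One integer vector of row numbers per month; the data frame is not split.
y.ind <- split(seq_len(nrow(y)), y$month)

# Hypothetical per-segment work: the mean of esr within each month.
esr.mean <- sapply(y.ind, function(i) mean(y$esr[i]))

# Write each group's result back into that group's original row positions.
y$esr.mean <- NA_real_
for (m in names(y.ind)) {
  y$esr.mean[y.ind[[m]]] <- esr.mean[[m]]
}

head(y)
```

Because the results are assigned by row index, y keeps its original row order, which sidesteps the question of how to put the pieces back together.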
Re: [R] Using by() and stacking back sub-data frames to one data frame
Have a look at ddply from the plyr package, http://had.co.nz/plyr. It's made for exactly this type of operation.

Hadley

On Wed, Jun 24, 2009 at 10:34 PM, Stephan Lindner <lindn...@umich.edu> wrote:
> [...]
> What I would like to do is to stack these four data frames back into one
> data frame, which in this simple example would just be y. I tried
> unlist(), unclass() and rbind(), but none of them would work.

--
http://had.co.nz/
Re: [R] Using by() and stacking back sub-data frames to one data frame
Your request for a more general approach is precisely the reason that Hadley Wickham wrote the plyr package. He describes a split-apply-combine strategy for a variety of data structures, and tools to implement those strategies, here:

http://had.co.nz/plyr/plyr-intro-090510.pdf

The argument for the "by" step is a column name rather than a list or object as it would be in tapply or split. I is just the identity function, which stands in for function(x) return(x) in your code.

    > library(plyr)
    > ddply(y, "month", .fun = I)
          suid month esr
    1  1074034     1   2
    2  1074034     1   1
    3  1074034     1   2
    4  1074034     1   9
    5  1123003     1   2
    6  1074034     2   2
    7  1074034     2   1
    8  1074034     2   2
    9  1074034     2   9
    10 1123003     2   2
    11 1074034     3   2
    12 1074034     3   1
    13 1074034     3   2
    14 1074034     3   9
    15 1123003     3   2
    16 1074034    12   6
    17 1074034    12   1
    18 1074034    12   2
    19 1074034    12   9
    20 1123003    12   2

On Jun 24, 2009, at 11:34 PM, Stephan Lindner wrote:
> [...]
> What I would like to do is to stack these four data frames back into one
> data frame, which in this simple example would just be y. I tried
> unlist(), unclass() and rbind(), but none of them would work.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
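To make the split-apply-combine idea a little more concrete than the identity function, here is a minimal sketch that uses ddply with a summary function. It assumes the plyr package is installed (the code is skipped otherwise); the summary columns n and esr.mean and the name per.month are hypothetical, chosen only for illustration.

```r
# Example data from the original post.
y <- data.frame(suid  = c(rep(1074034, 16), rep(1123003, 4)),
                month = rep(c(12, 1, 2, 3), 5),
                esr   = c(6, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 9, 9, 9, 9, 2, 2, 2, 2))

# Only run when plyr is available (an assumption about the local setup).
if (requireNamespace("plyr", quietly = TRUE)) {
  # Split y by month, apply the function to each piece, and let ddply
  # combine the returned one-row data frames into a single data frame.
  per.month <- plyr::ddply(y, "month", function(piece) {
    data.frame(n = nrow(piece), esr.mean = mean(piece$esr))
  })
  print(per.month)
}
```

The function receives one sub-data frame per month, so anything that returns a data frame can take the place of the summary used here.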
[R] Using by() and stacking back sub-data frames to one data frame
Dear all,

I have code in which I subset a data frame to match entries within levels of a factor (actually, the full script uses three different factors to do that). I'm very happy with the precision with which I can work with R, but since I loop over factor levels, and the data frame is big, the process is slow. So I've been trying to speed up the process using by(), but I got stuck at the point where I want to stack back the sub-data frames, and I was wondering whether someone could help me out. Here is an example:

    > y <- data.frame(suid = c(rep(1074034,16),rep(1123003,4)),
    +                 month = rep(c(12,1,2,3),5),
    +                 esr = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))
    > by(y,y$month,function(x)return(x))
    y$month: 1
          suid month esr
    2  1074034     1   2
    6  1074034     1   1
    10 1074034     1   2
    14 1074034     1   9
    18 1123003     1   2

    y$month: 2
          suid month esr
    3  1074034     2   2
    7  1074034     2   1
    11 1074034     2   2
    15 1074034     2   9
    19 1123003     2   2

    y$month: 3
          suid month esr
    4  1074034     3   2
    8  1074034     3   1
    12 1074034     3   2
    16 1074034     3   9
    20 1123003     3   2

    y$month: 12
          suid month esr
    1  1074034    12   6
    5  1074034    12   1
    9  1074034    12   2
    13 1074034    12   9
    17 1123003    12   2

What I would like to do is to stack these four data frames back into one data frame, which in this simple example would just be y. I tried unlist(), unclass() and rbind(), but none of them would work.

Thanks a lot,

Stephan

--
Stephan Lindner
University of Michigan
Re: [R] Using by() and stacking back sub-data frames to one data frame
try

    do.call(rbind, yourByList)

hth,

Kingsford Jones

On Wed, Jun 24, 2009 at 9:34 PM, Stephan Lindner <lindn...@umich.edu> wrote:
> [...]
> What I would like to do is to stack these four data frames back into one
> data frame, which in this simple example would just be y. I tried
> unlist(), unclass() and rbind(), but none of them would work.
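For readers following along, the suggestion above can be sketched on the example data from the original post. The names pieces and stacked are placeholders introduced here. Calling rbind() directly on the by() result passes it as a single argument, whereas do.call() hands each list element to rbind() as its own argument.

```r
# Example data from the original post.
y <- data.frame(suid  = c(rep(1074034, 16), rep(1123003, 4)),
                month = rep(c(12, 1, 2, 3), 5),
                esr   = c(6, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 9, 9, 9, 9, 2, 2, 2, 2))

# by() returns a list-like object of class "by": one data frame per month.
pieces <- by(y, y$month, function(x) x)

# Equivalent to rbind(pieces[[1]], pieces[[2]], ...): stacks the pieces.
stacked <- do.call(rbind, pieces)

# Drop the row names that rbind() generated for the stacked rows.
rownames(stacked) <- NULL

str(stacked)
```

Note that the rows come back grouped by month rather than in the original order of y, so an explicit sort would be needed if the original order matters.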