Re: [R] Using by() and stacking back sub-data frames to one data frame

2009-06-25 Thread jim holtman
One thing you might consider when working with large data frames is that,
instead of partitioning the data frame into smaller ones, you can create a
list of row indices and use it to access each subset.  This works especially
well when using 'lapply' to chomp through many segments of a data frame:

> y
      suid month esr
1  1074034    12   6
2  1074034     1   2
3  1074034     2   2
4  1074034     3   2
5  1074034    12   1
6  1074034     1   1
7  1074034     2   1
8  1074034     3   1
9  1074034    12   2
10 1074034     1   2
11 1074034     2   2
12 1074034     3   2
13 1074034    12   9
14 1074034     1   9
15 1074034     2   9
16 1074034     3   9
17 1123003    12   2
18 1123003     1   2
19 1123003     2   2
20 1123003     3   2
> y.ind <- split(seq(nrow(y)), y$month)
> y.ind
$`1`
[1]  2  6 10 14 18

$`2`
[1]  3  7 11 15 19

$`3`
[1]  4  8 12 16 20

$`12`
[1]  1  5  9 13 17

> # a subset
> y[y.ind[['12']],]
      suid month esr
1  1074034    12   6
5  1074034    12   1
9  1074034    12   2
13 1074034    12   9
17 1123003    12   2
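
As a rough sketch of how the index list might then be used: each segment can be processed by indexing into the full data frame, and concatenating the index vectors gives the rows back in group order. The per-group mean of esr is only a placeholder for whatever the real loop does, and esr.mean and y.stacked are names made up for this illustration.

```r
# Example data from the original question
y <- data.frame(suid  = c(rep(1074034, 16), rep(1123003, 4)),
                month = rep(c(12, 1, 2, 3), 5),
                esr   = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))

# One integer vector of row numbers per month; no sub-data frames are created
y.ind <- split(seq_len(nrow(y)), y$month)

# Apply a function per group by indexing into the full data frame.
# mean(esr) is a placeholder (assumption) for the real per-group work.
esr.mean <- sapply(y.ind, function(i) mean(y$esr[i]))

# "Stack back": concatenating the index vectors yields the rows in group order
y.stacked <- y[unlist(y.ind, use.names = FALSE), ]
```

Because only integer vectors are stored in the list, the full data frame is never copied into pieces, which is where the saving on large data would be expected to come from.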



On Wed, Jun 24, 2009 at 11:34 PM, Stephan Lindner <lindn...@umich.edu> wrote:

> Dear all,
>
> I have code in which I subset a data frame to match entries within
> levels of a factor (actually, the full script uses three different
> factors to do that). I'm very happy with the precision with which I can
> work with R, but since I loop over factor levels, and the data frame is
> big, the process is slow. So I've been trying to speed up the process
> using by(), but I got stuck at the point where I want to stack the
> sub-data frames back together, and I was wondering whether someone
> could help me out.
>
> Here is an example:
>
> --
>
> > y <- data.frame(suid  = c(rep(1074034,16),rep(1123003,4)),
>                   month = rep(c(12,1,2,3),5),
>                   esr   = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))
>
> > by(y,y$month,function(x)return(x))
>
> y$month: 1
>       suid month esr
> 2  1074034     1   2
> 6  1074034     1   1
> 10 1074034     1   2
> 14 1074034     1   9
> 18 1123003     1   2
>
> y$month: 2
>       suid month esr
> 3  1074034     2   2
> 7  1074034     2   1
> 11 1074034     2   2
> 15 1074034     2   9
> 19 1123003     2   2
>
> y$month: 3
>       suid month esr
> 4  1074034     3   2
> 8  1074034     3   1
> 12 1074034     3   2
> 16 1074034     3   9
> 20 1123003     3   2
>
> y$month: 12
>       suid month esr
> 1  1074034    12   6
> 5  1074034    12   1
> 9  1074034    12   2
> 13 1074034    12   9
> 17 1123003    12   2
>
> --
>
> What I would like to do is stack these four data frames back into one
> data frame, which in this simple example would just be y. I tried
> unlist(), unclass() and rbind(), but none of them worked.
>
> Thanks a lot,
>
> Stephan
>
> --
> ---
> Stephan Lindner
> University of Michigan
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Using by() and stacking back sub-data frames to one data frame

2009-06-25 Thread hadley wickham
Have a look at ddply from the plyr package, http://had.co.nz/plyr.
It's made for exactly this type of operation.

Hadley





-- 
http://had.co.nz/



Re: [R] Using by() and stacking back sub-data frames to one data frame

2009-06-25 Thread David Winsemius
Your request for a more general approach is precisely the reason that
Hadley Wickham wrote the plyr package. He describes a split-apply-combine
strategy for a variety of data structures, and tools to implement those
strategies, here:


http://had.co.nz/plyr/plyr-intro-090510.pdf

The argument for the "by" step is a column name rather than a list or
object, as it would be in tapply or split. I is just the identity
function, which stands in for function(x) return(x) in your code.


> library(plyr)
> ddply(y, "month", .fun = I)
      suid month esr
1  1074034     1   2
2  1074034     1   1
3  1074034     1   2
4  1074034     1   9
5  1123003     1   2
6  1074034     2   2
7  1074034     2   1
8  1074034     2   2
9  1074034     2   9
10 1123003     2   2
11 1074034     3   2
12 1074034     3   1
13 1074034     3   2
14 1074034     3   9
15 1123003     3   2
16 1074034    12   6
17 1074034    12   1
18 1074034    12   2
19 1074034    12   9
20 1123003    12   2
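
To go beyond the identity function, a per-group function can be passed in the same way. The sketch below assumes the plyr package is installed and uses transform() to add a hypothetical esr_mean column holding each month's mean esr; the column name and the choice of summary are illustrative only, not something taken from your script.

```r
library(plyr)  # assumption: plyr is installed

# Example data from the original question
y <- data.frame(suid  = c(rep(1074034, 16), rep(1123003, 4)),
                month = rep(c(12, 1, 2, 3), 5),
                esr   = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))

# Split y by month, apply transform() to each piece, and combine the
# results back into a single data frame. esr_mean is a made-up column name.
y2 <- ddply(y, "month", transform, esr_mean = mean(esr))
```

The result keeps one row per original row, ordered by month, so the splitting and the stacking back are both handled inside ddply.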



David Winsemius, MD
Heritage Laboratories
West Hartford, CT



[R] Using by() and stacking back sub-data frames to one data frame

2009-06-24 Thread Stephan Lindner
Dear all,


I have code in which I subset a data frame to match entries within
levels of a factor (actually, the full script uses three different
factors to do that). I'm very happy with the precision with which I can
work with R, but since I loop over factor levels, and the data frame is
big, the process is slow. So I've been trying to speed up the process
using by(), but I got stuck at the point where I want to stack the
sub-data frames back together, and I was wondering whether someone
could help me out.

Here is an example:

-- 

> y <- data.frame(suid  = c(rep(1074034,16),rep(1123003,4)),
                  month = rep(c(12,1,2,3),5),
                  esr   = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))

> by(y,y$month,function(x)return(x))

y$month: 1
      suid month esr
2  1074034     1   2
6  1074034     1   1
10 1074034     1   2
14 1074034     1   9
18 1123003     1   2

y$month: 2
      suid month esr
3  1074034     2   2
7  1074034     2   1
11 1074034     2   2
15 1074034     2   9
19 1123003     2   2

y$month: 3
      suid month esr
4  1074034     3   2
8  1074034     3   1
12 1074034     3   2
16 1074034     3   9
20 1123003     3   2

y$month: 12
      suid month esr
1  1074034    12   6
5  1074034    12   1
9  1074034    12   2
13 1074034    12   9
17 1123003    12   2

-- 

What I would like to do is stack these four data frames back into one
data frame, which in this simple example would just be y. I tried
unlist(), unclass() and rbind(), but none of them worked.


Thanks a lot,



Stephan










-- 
---
Stephan Lindner
University of Michigan



Re: [R] Using by() and stacking back sub-data frames to one data frame

2009-06-24 Thread Kingsford Jones
try

do.call(rbind, yourByList)
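
For the example data, that could look like the sketch below. by.list and stacked are names chosen here for illustration; yourByList above stands for whatever by() returned in your session.

```r
# Example data from the original question
y <- data.frame(suid  = c(rep(1074034, 16), rep(1123003, 4)),
                month = rep(c(12, 1, 2, 3), 5),
                esr   = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))

# by() returns a list-like object holding one data frame per month
by.list <- by(y, y$month, function(x) x)

# do.call() passes every element of the list to rbind() as a separate argument
stacked <- do.call(rbind, by.list)

# rbind() rebuilds the row names; reset them if plain 1..n is preferred
rownames(stacked) <- NULL
```

Calling rbind(by.list) directly hands rbind() the whole list as a single argument, which is why do.call() is needed to spread the pieces out as separate arguments.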


hth,
Kingsford Jones


