I'm running into a problem I can't find a solution for. I'm trying to
add sequence (running counter) columns to an existing data frame, where
each sequence restarts within subsets of the data. I've done this with a
for loop on a small subset, but the same process on the real data (200k
rows) is taking far too long.

Here is some sample data and my ultimate goal:

> row1<-c(0,1,2,3,4,5,1,2,3,4)
> row2<-c(1,1,1,1,1,1,2,2,2,2)
> stuff<-data.frame(row1=row1,row2=row2)
> stuff
   row1 row2
1     0    1
2     1    1
3     2    1
4     3    1
5     4    1
6     5    1
7     1    2
8     2    2
9     3    2
10    4    2

 

 

I need to derive two columns. The first (row3) is a sequence that
restarts for each unique value of row2. The second (row4) is a sequence
that restarts for each unique row2 and also restarts based on a cutoff
value for row1. The following table is what it should look like, using
a cutoff of 3 for row4:

 

   row1 row2 row3 row4
1     0    1    1    1
2     1    1    2    2
3     2    1    3    3
4     3    1    4    1
5     4    1    5    2
6     5    1    6    3
7     1    2    1    1
8     2    2    2    2
9     3    2    3    1
10    4    2    4    2
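
Reading the table, row4 seems to count within each combination of row2
and the condition row1 >= cutoff, so within each row2 group it restarts
at the first row where row1 reaches 3. A minimal base-R sketch of that
reading follows. The cutoff variable and the choice of ave() are
illustrative assumptions, and this is untimed on the full 200k rows:

```r
# Rebuild the sample data so the sketch runs on its own
stuff <- data.frame(row1 = c(0,1,2,3,4,5,1,2,3,4),
                    row2 = c(1,1,1,1,1,1,2,2,2,2))
cutoff <- 3  # assumed name for the row1 cutoff

# Count rows within each (row2, row1 >= cutoff) combination
stuff$row4 <- ave(seq_len(nrow(stuff)), stuff$row2, stuff$row1 >= cutoff,
                  FUN = seq_along)
```

This relies on row1 being in increasing order within each row2 group, as
it is in the sample data.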

 

I need something like row3<-sequence(nrow(unique(stuff$row2))) that
actually works :-) Here is the for loop that functions properly for
row3:

 

stuff$row3 <- 1  # every row starts at 1
for (i in 2:nrow(stuff)) {
  # same row2 as the previous row: continue counting
  if (stuff$row2[i] == stuff$row2[i-1]) stuff$row3[i] <- stuff$row3[i-1] + 1
}
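
For comparison, the same row3 can probably be produced without a loop.
The sketch below assumes, as the loop does, that rows sharing a row2
value sit next to each other. rle() and sequence() are base R functions,
but this is untimed on the full 200k rows:

```r
# Rebuild the sample data so the sketch runs on its own
stuff <- data.frame(row1 = c(0,1,2,3,4,5,1,2,3,4),
                    row2 = c(1,1,1,1,1,1,2,2,2,2))

# rle() gives the length of each run of identical row2 values;
# sequence() turns the lengths c(6, 4) into 1:6 followed by 1:4
stuff$row3 <- sequence(rle(stuff$row2)$lengths)
```

If equal row2 values were not contiguous,
ave(seq_along(stuff$row2), stuff$row2, FUN = seq_along) would count per
unique row2 value rather than per run.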

Thanks!

 

Jason Baucom
Ateb, Inc.
919.882.4992 O
919.872.1645 F
www.ateb.com <http://www.ateb.com/>

 



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
