I'm running into a problem I can't seem to find a solution for. I'm attempting to add sequences into an existing data set based on subsets of the data. I've done this using a for loop with a small subset of data, but attempting the same process using real data (200k rows) is taking way too long.
Here is some sample data and my ultimate goal:

> row1<-c(0,1,2,3,4,5,1,2,3,4)
> row2<-c(1,1,1,1,1,1,2,2,2,2)
> stuff<-data.frame(row1=row1,row2=row2)
> stuff
   row1 row2
1     0    1
2     1    1
3     2    1
4     3    1
5     4    1
6     5    1
7     1    2
8     2    2
9     3    2
10    4    2

I need to derive 2 columns. First, I need a sequence for each unique value of row2 (row3). Second, I need a sequence that restarts based on a cutoff value for row1 within each unique value of row2 (row4). The following table is what it should look like, using a cutoff of 3 for row4:

   row1 row2 row3 row4
1     0    1    1    1
2     1    1    2    2
3     2    1    3    3
4     3    1    4    1
5     4    1    5    2
6     5    1    6    3
7     1    2    1    1
8     2    2    2    2
9     3    2    3    1
10    4    2    4    2

I need something like row3<-sequence(nrow(unique(stuff$row2))) that actually works :-)

Here is the for loop that functions properly for row3:

stuff$row3<-c(1)
for (i in 2:nrow(stuff)) {
  if (stuff$row2[i] == stuff$row2[i-1]) {
    stuff$row3[i] = stuff$row3[i-1]+1
  }
}

Thanks!

Jason Baucom
Ateb, Inc.
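
For reference, one possible vectorized approach to the two columns described above is sketched here using base R's ave(). This is only a sketch under some assumptions: the name `cutoff` is introduced purely for illustration, row4 is assumed to restart at the first row where row1 reaches the cutoff (row1 >= 3), and rows are grouped by the value of row2 rather than by consecutive runs, which gives the same result as the loop when equal row2 values are contiguous, as in the sample data.

```r
# Sample data from the post
row1 <- c(0, 1, 2, 3, 4, 5, 1, 2, 3, 4)
row2 <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2)
stuff <- data.frame(row1 = row1, row2 = row2)

# Hypothetical name for the cutoff value; not from the original post
cutoff <- 3

# row3: running count within each value of row2
stuff$row3 <- ave(seq_along(stuff$row2), stuff$row2, FUN = seq_along)

# row4: running count within each (row2, row1 >= cutoff) combination,
# so the count restarts once row1 reaches the cutoff (assumed interpretation)
stuff$row4 <- ave(seq_along(stuff$row2), stuff$row2, stuff$row1 >= cutoff,
                  FUN = seq_along)

print(stuff)
```

On the sample data this should reproduce the row3 and row4 columns in the table above. Because ave() works on whole groups at once instead of row by row, it would be expected to scale much better than the loop on 200k rows, though that has not been timed here.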