[R] Expand a contingency table based on the value in one column
Hi R-helpers, I have the following (dummy) dataframe: test DATE LOCATION KIND CLASS COUNT 111 CAR A 2 211 TRUCK D 3 311 BUS E 4 412 CAR E 2 512 TRUCK A 7 612 BUS F 1 That I would like to turn into this: test2 DATE LOCATION KIND CLASS 1 11 CAR A 2 11 CAR A 3 11 TRUCK D 4 11 TRUCK D 5 11 TRUCK D 6 11 BUS E 7 11 BUS E 8 11 BUS E 9 11 BUS E 1012 CAR E 1112 CAR E 1212 TRUCK A 1312 TRUCK A 1412 TRUCK A 1512 TRUCK A 1612 TRUCK A 1712 TRUCK A 1812 TRUCK A 1912 BUS F So, basically it's a case of expanding (adding rows to) the first dataframe by the value in the COUNT column. I have solved this problem with the following code: test2-with(test, data.frame(DATE=rep(DATE,COUNT), LOCATION=rep(LOCATION,COUNT), KIND=rep(KIND,COUNT), CLASS=rep(CLASS,COUNT))) but I'm unsatisfied with that solution because it's verbose and I think there must a more elegant way. If I had more variables than 4 (which I do in my real data) it would be a nuisance to repeat each column within the rep function. I would prefer to do this with Base R or package(reshape) than relying on another package. Any ideas? Thanks! Mark Na [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Expand a contingency table based on the value in one column
On Jun 11, 2009, at 1:13 PM, Mark Na wrote: Hi R-helpers, I have the following (dummy) dataframe: test DATE LOCATION KIND CLASS COUNT 111 CAR A 2 211 TRUCK D 3 311 BUS E 4 412 CAR E 2 512 TRUCK A 7 612 BUS F 1 That I would like to turn into this: test2 DATE LOCATION KIND CLASS 1 11 CAR A 2 11 CAR A 3 11 TRUCK D 4 11 TRUCK D 5 11 TRUCK D 6 11 BUS E 7 11 BUS E 8 11 BUS E 9 11 BUS E 1012 CAR E 1112 CAR E 1212 TRUCK A 1312 TRUCK A 1412 TRUCK A 1512 TRUCK A 1612 TRUCK A 1712 TRUCK A 1812 TRUCK A 1912 BUS F test2 - test[rep(1:nrow(test), test$COUNT),] rownames(test2) - 1:nrow(test2) test2 So, basically it's a case of expanding (adding rows to) the first dataframe by the value in the COUNT column. I have solved this problem with the following code: test2-with(test, data.frame(DATE=rep(DATE,COUNT), LOCATION=rep(LOCATION,COUNT), KIND=rep(KIND,COUNT), CLASS=rep(CLASS,COUNT))) but I'm unsatisfied with that solution because it's verbose and I think there must a more elegant way. If I had more variables than 4 (which I do in my real data) it would be a nuisance to repeat each column within the rep function. I would prefer to do this with Base R or package(reshape) than relying on another package. Any ideas? Thanks! Mark Na [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD Heritage Laboratories West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Expand a contingency table based on the value in one column
On Jun 11, 2009, at 12:13 PM, Mark Na wrote: Hi R-helpers, I have the following (dummy) dataframe: test DATE LOCATION KIND CLASS COUNT 111 CAR A 2 211 TRUCK D 3 311 BUS E 4 412 CAR E 2 512 TRUCK A 7 612 BUS F 1 That I would like to turn into this: test2 DATE LOCATION KIND CLASS 1 11 CAR A 2 11 CAR A 3 11 TRUCK D 4 11 TRUCK D 5 11 TRUCK D 6 11 BUS E 7 11 BUS E 8 11 BUS E 9 11 BUS E 1012 CAR E 1112 CAR E 1212 TRUCK A 1312 TRUCK A 1412 TRUCK A 1512 TRUCK A 1612 TRUCK A 1712 TRUCK A 1812 TRUCK A 1912 BUS F So, basically it's a case of expanding (adding rows to) the first dataframe by the value in the COUNT column. I have solved this problem with the following code: test2-with(test, data.frame(DATE=rep(DATE,COUNT), LOCATION=rep(LOCATION,COUNT), KIND=rep(KIND,COUNT), CLASS=rep(CLASS,COUNT))) but I'm unsatisfied with that solution because it's verbose and I think there must a more elegant way. If I had more variables than 4 (which I do in my real data) it would be a nuisance to repeat each column within the rep function. I would prefer to do this with Base R or package(reshape) than relying on another package. Any ideas? Thanks! Mark Na Mark, A quick and dirty solution: test[rep(1:nrow(test), test$COUNT), -ncol(test)] DATE LOCATION KIND CLASS 1 11 CAR A 1.111 CAR A 2 11 TRUCK D 2.111 TRUCK D 2.211 TRUCK D 3 11 BUS E 3.111 BUS E 3.211 BUS E 3.311 BUS E 4 12 CAR E 4.112 CAR E 5 12 TRUCK A 5.112 TRUCK A 5.212 TRUCK A 5.312 TRUCK A 5.412 TRUCK A 5.512 TRUCK A 5.612 TRUCK A 6 12 BUS F For a more general solution to taking a tabulated data frame and converting it back to the raw data see my expand.dft() function: https://stat.ethz.ch/pipermail/r-help/2009-January/185561.html For example: expand.dft(test, freq = COUNT) DATE LOCATION KIND CLASS 1 11 CAR A 2 11 CAR A 3 11 TRUCK D 4 11 TRUCK D 5 11 TRUCK D 6 11 BUS E 7 11 BUS E 8 11 BUS E 9 11 BUS E 1012 CAR E 1112 CAR E 1212 TRUCK A 1312 TRUCK A 1412 TRUCK A 1512 TRUCK A 1612 TRUCK A 1712 TRUCK A 1812 TRUCK A 1912 BUS F HTH, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.