[R] Expand a contingency table based on the value in one column

2009-06-11 Thread Mark Na
Hi R-helpers,

I have the following (dummy) dataframe:

 test
  DATE LOCATION  KIND CLASS COUNT
111   CAR A 2
211 TRUCK D 3
311   BUS E 4
412   CAR E 2
512 TRUCK A 7
612   BUS F 1

That I would like to turn into this:

 test2
   DATE LOCATION  KIND CLASS
1 11   CAR A
2 11   CAR A
3 11 TRUCK D
4 11 TRUCK D
5 11 TRUCK D
6 11   BUS E
7 11   BUS E
8 11   BUS E
9 11   BUS E
1012   CAR E
1112   CAR E
1212 TRUCK A
1312 TRUCK A
1412 TRUCK A
1512 TRUCK A
1612 TRUCK A
1712 TRUCK A
1812 TRUCK A
1912   BUS F

So, basically it's a case of expanding (adding rows to) the first dataframe
by the value in the COUNT column.

I have solved this problem with the following code:

test2-with(test, data.frame(DATE=rep(DATE,COUNT),
LOCATION=rep(LOCATION,COUNT), KIND=rep(KIND,COUNT), CLASS=rep(CLASS,COUNT)))

but I'm unsatisfied with that solution because it's verbose and I think
there must a more elegant way. If I had more variables than 4 (which I do in
my real data) it would be a nuisance to repeat each column within the rep
function.

I would prefer to do this with Base R or package(reshape) than relying on
another package.

Any ideas? Thanks!

Mark Na

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Expand a contingency table based on the value in one column

2009-06-11 Thread David Winsemius


On Jun 11, 2009, at 1:13 PM, Mark Na wrote:


Hi R-helpers,

I have the following (dummy) dataframe:


test

 DATE LOCATION  KIND CLASS COUNT
111   CAR A 2
211 TRUCK D 3
311   BUS E 4
412   CAR E 2
512 TRUCK A 7
612   BUS F 1

That I would like to turn into this:


test2

  DATE LOCATION  KIND CLASS
1 11   CAR A
2 11   CAR A
3 11 TRUCK D
4 11 TRUCK D
5 11 TRUCK D
6 11   BUS E
7 11   BUS E
8 11   BUS E
9 11   BUS E
1012   CAR E
1112   CAR E
1212 TRUCK A
1312 TRUCK A
1412 TRUCK A
1512 TRUCK A
1612 TRUCK A
1712 TRUCK A
1812 TRUCK A
1912   BUS F



 test2 - test[rep(1:nrow(test), test$COUNT),]
 rownames(test2) - 1:nrow(test2)
 test2


So, basically it's a case of expanding (adding rows to) the first  
dataframe

by the value in the COUNT column.

I have solved this problem with the following code:

test2-with(test, data.frame(DATE=rep(DATE,COUNT),
LOCATION=rep(LOCATION,COUNT), KIND=rep(KIND,COUNT),  
CLASS=rep(CLASS,COUNT)))


but I'm unsatisfied with that solution because it's verbose and I  
think
there must a more elegant way. If I had more variables than 4 (which  
I do in
my real data) it would be a nuisance to repeat each column within  
the rep

function.

I would prefer to do this with Base R or package(reshape) than  
relying on

another package.

Any ideas? Thanks!

Mark Na

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Expand a contingency table based on the value in one column

2009-06-11 Thread Marc Schwartz


On Jun 11, 2009, at 12:13 PM, Mark Na wrote:


Hi R-helpers,

I have the following (dummy) dataframe:


test

 DATE LOCATION  KIND CLASS COUNT
111   CAR A 2
211 TRUCK D 3
311   BUS E 4
412   CAR E 2
512 TRUCK A 7
612   BUS F 1

That I would like to turn into this:


test2

  DATE LOCATION  KIND CLASS
1 11   CAR A
2 11   CAR A
3 11 TRUCK D
4 11 TRUCK D
5 11 TRUCK D
6 11   BUS E
7 11   BUS E
8 11   BUS E
9 11   BUS E
1012   CAR E
1112   CAR E
1212 TRUCK A
1312 TRUCK A
1412 TRUCK A
1512 TRUCK A
1612 TRUCK A
1712 TRUCK A
1812 TRUCK A
1912   BUS F

So, basically it's a case of expanding (adding rows to) the first  
dataframe

by the value in the COUNT column.

I have solved this problem with the following code:

test2-with(test, data.frame(DATE=rep(DATE,COUNT),
LOCATION=rep(LOCATION,COUNT), KIND=rep(KIND,COUNT),  
CLASS=rep(CLASS,COUNT)))


but I'm unsatisfied with that solution because it's verbose and I  
think
there must a more elegant way. If I had more variables than 4 (which  
I do in
my real data) it would be a nuisance to repeat each column within  
the rep

function.

I would prefer to do this with Base R or package(reshape) than  
relying on

another package.

Any ideas? Thanks!

Mark Na



Mark,

A quick and dirty solution:

 test[rep(1:nrow(test), test$COUNT), -ncol(test)]
DATE LOCATION  KIND CLASS
1  11   CAR A
1.111   CAR A
2  11 TRUCK D
2.111 TRUCK D
2.211 TRUCK D
3  11   BUS E
3.111   BUS E
3.211   BUS E
3.311   BUS E
4  12   CAR E
4.112   CAR E
5  12 TRUCK A
5.112 TRUCK A
5.212 TRUCK A
5.312 TRUCK A
5.412 TRUCK A
5.512 TRUCK A
5.612 TRUCK A
6  12   BUS F


For a more general solution to taking a tabulated data frame and  
converting it back to the raw data see my expand.dft() function:


  https://stat.ethz.ch/pipermail/r-help/2009-January/185561.html

For example:

 expand.dft(test, freq = COUNT)
   DATE LOCATION  KIND CLASS
1 11   CAR A
2 11   CAR A
3 11 TRUCK D
4 11 TRUCK D
5 11 TRUCK D
6 11   BUS E
7 11   BUS E
8 11   BUS E
9 11   BUS E
1012   CAR E
1112   CAR E
1212 TRUCK A
1312 TRUCK A
1412 TRUCK A
1512 TRUCK A
1612 TRUCK A
1712 TRUCK A
1812 TRUCK A
1912   BUS F



HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.