[R] Replace NAs in one column with data from another column

2010-09-08 Thread Jakob Hedegaard
Hi list,

I have a data frame (m) with 169221 rows and 10 columns and would like to make 
a new column containing the content of column 3 but replace the NAs in column 3 
with the data in column 1 (from the same row as the NA in column 3). Column 1 
has data in all rows.

My first attempt was:

# Copy column 3 into a new column 11, falling back to column 1 where column 3 is NA
for (i in 1:169221){
  if (is.na(m[i,3])==TRUE){
    m[i,11] <- as.character(m[i,1])
  } else {
    m[i,11] <- as.character(m[i,3])
  }
}

It works, but it takes too long.
I would appreciate alternative solutions.

Best regards, Jakob

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Replace NAs in one column with data from another column

2010-09-08 Thread Dimitris Rizopoulos

One way is the following:

# simulated data with 20 missing values in z
m <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
m$z[sample(100, 20)] <- NA

# take x where z is NA, otherwise keep z
m$z.new <- ifelse(is.na(m$z), m$x, m$z)


I hope it helps.

Best,
Dimitris


--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
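A quick check, not part of the original reply: the sketch below reruns Dimitris's simulated example and compares his vectorized ifelse() line with a row-by-row loop in the style of the original question. The set.seed() call and the name z.loop are additions for illustration only.

```r
# Dimitris's simulated data; set.seed() added so runs are reproducible
set.seed(1)
m <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
m$z[sample(100, 20)] <- NA

# Vectorized: one ifelse() call over the whole column
m$z.new <- ifelse(is.na(m$z), m$x, m$z)

# Row-by-row loop in the style of the original question, for comparison
z.loop <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  z.loop[i] <- if (is.na(m$z[i])) m$x[i] else m$z[i]
}

# Both approaches give exactly the same values
stopifnot(identical(m$z.new, z.loop))
```

The loop calls is.na() once per row, whereas ifelse() handles the whole column in one call, which is where the vectorized version is expected to save time.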


Re: [R] Replace NAs in one column with data from another column

2010-09-08 Thread jim holtman
?ifelse

df$newCol <- ifelse(is.na(df$col3), df$col1, df$col3)


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
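Jim's line uses placeholder column names. Applied to the positional columns in the original question, a minimal sketch might look like the following; the small data frame and its column names (id, y, val) are hypothetical stand-ins for Jakob's m. The as.character() calls mirror the original loop and matter if columns 1 and 3 are factors (data.frame() and read.table() converted strings to factors by default before R 4.0.0), because ifelse() on a factor returns its integer codes rather than its labels.

```r
# Hypothetical stand-in for Jakob's data frame m: column 1 is always filled,
# column 3 has gaps. stringsAsFactors = TRUE mimics the pre-4.0.0 default.
m <- data.frame(id  = c("a", "b", "c", "d"),
                y   = 1:4,
                val = c("x", NA, "z", NA),
                stringsAsFactors = TRUE)

# Pitfall: called on factors directly, ifelse() returns integer codes
codes <- ifelse(is.na(m[, 3]), m[, 1], m[, 3])

# Convert to character first, as the original loop does, then vectorize
col1 <- as.character(m[, 1])
col3 <- as.character(m[, 3])
m$new <- ifelse(is.na(col3), col1, col3)

m$new  # "x" "b" "z" "d"
```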


Re: [R] Replace NAs in one column with data from another column

2010-09-08 Thread Joshua Wiley
Hi Jakob,

You can use is.na() to create an index of which rows in column 3 are
missing data, and then select these from column 1.  Here is a simple
example:

dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4, NA))
dat$new <- dat$V3                   # start from a copy of column V3
my.na <- is.na(dat$V3)              # TRUE where V3 is missing
dat$new[my.na] <- dat$V1[my.na]     # fill only those rows from V1

dat

This should be quite fast.  I broke the steps up to be explicit, but
you can readily simplify them.

HTH,

Josh


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
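Josh notes that the steps can be simplified. One possible condensation, sketched here with base R's replace(x, list, values), does the same logical-index assignment in a single call; the name miss is illustrative.

```r
dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4, NA))

# Logical index of the rows where V3 is missing
miss <- is.na(dat$V3)

# replace(x, list, values) returns x with x[list] set to values,
# so this is the copy-then-overwrite idea in one expression
dat$new <- replace(dat$V3, miss, dat$V1[miss])

dat$new  # 1 2 3 4 5
```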


Re: [R] Replace NAs in one column with data from another column

2010-09-08 Thread David Winsemius


I was about to post something similar to Joshua's suggestion, except that
I was going to avoid the $ operator, thinking (incorrectly, as it turned
out) that this would be faster. I have also included the Holtman/Rizopoulos
suggestion of ifelse(). I was also surprised that ifelse() is the winning
strategy:


dat[4] <- dat[3]; idx <- is.na(dat[, 3])
dat[is.na(dat[, 3]), 4] <- dat[is.na(dat[, 3]), 1]

library(rbenchmark)  # benchmark() is assumed to come from the rbenchmark package
benchmark(
  meth.ifelse = {dat$z.new <- ifelse(is.na(dat$V3), dat$V1, dat$V3)},
  meth.dlr.sign = {dat$new <- dat$V3
    my.na <- is.na(dat$V3)
    dat$new[my.na] <- dat$V1[my.na]},
  meth.index = {dat[4] <- dat[3]; idx <- is.na(dat[, 3])
    dat[idx, 4] <- dat[idx, 1]},
  meth.forloop = {for (i in 1:nrow(dat)) {
      if (is.na(dat[i, 3]) == TRUE) {
        dat[i, 4] <- dat[i, 1]
      } else {
        dat[i, 4] <- dat[i, 3]
      }
    }},
  replications = 5000,
  columns = c("test", "replications", "elapsed", "relative", "user.self"))
           test replications elapsed  relative user.self
2 meth.dlr.sign         5000   0.502  1.081897     0.501
4  meth.forloop         5000   6.419 13.834052     6.409
1   meth.ifelse         5000   0.464  1.000000     0.463
3    meth.index         5000   2.908  6.267241     2.904

--
David.


--

David Winsemius, MD
West Hartford, CT
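The benchmarks in this thread use a 5-row data frame, so fixed per-call overhead probably weighs heavily in them. A minimal sketch for timing the two vectorized approaches at the 169221 rows mentioned in the original question, using base R's system.time() on hypothetical random data, follows. No timings are claimed here, since they depend on the machine and R version; the row loop is left out because at this size it can be very slow.

```r
# Hypothetical data at the size given in the original question;
# V1 is always filled, about 10% of V3 is set to NA
set.seed(42)
n <- 169221
m <- data.frame(V1 = rnorm(n), V2 = rnorm(n), V3 = rnorm(n))
m$V3[sample(n, n %/% 10)] <- NA

# Approach 1: ifelse() over the whole column
t.ifelse <- system.time(
  new1 <- ifelse(is.na(m$V3), m$V1, m$V3)
)

# Approach 2: copy the column, then overwrite only the NA positions
t.index <- system.time({
  new2 <- m$V3
  miss <- is.na(new2)
  new2[miss] <- m$V1[miss]
})

# Elapsed seconds for each approach (machine-dependent)
print(c(ifelse = t.ifelse[["elapsed"]], index = t.index[["elapsed"]]))

# Both approaches must produce identical results
stopifnot(identical(new1, new2))
```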


Re: [R] Replace NAs in one column with data from another column

2010-09-08 Thread Bert Gunter
with() would seem to be useful with the ifelse() approach:

m$z <- with(m, ifelse(is.na(z), x, z))

(I believe the timing is similar, but I haven't checked.)

-- Bert
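Note that Bert's line assigns the result back to m$z, overwriting the original column rather than adding a new one. The sketch below, on Dimitris's simulated data with an added set.seed(), checks that the with() form gives the same values as the m$ form, and uses base R's within() to add a separate z.new column instead.

```r
# Dimitris's simulated data; set.seed() added for reproducibility
set.seed(1)
m <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
m$z[sample(100, 20)] <- NA

# with() evaluates the expression with m's columns visible as variables,
# so the m$ prefixes can be dropped
via.with   <- with(m, ifelse(is.na(z), x, z))
via.dollar <- ifelse(is.na(m$z), m$x, m$z)
stopifnot(identical(via.with, via.dollar))

# within() returns a modified copy of m, here with a new z.new column,
# leaving the original z (and its NAs) untouched
m <- within(m, z.new <- ifelse(is.na(z), x, z))
```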


Re: [R] Replace NAs in one column with data from another column

2010-09-08 Thread Joshua Wiley
On Wed, Sep 8, 2010 at 12:02 PM, David Winsemius dwinsem...@comcast.net wrote:

> I was about to post something similar except I was going to avoid the $
> operator thinking, incorrectly as it turned out, that it would be faster. I
> also include the Holtman/Rizopoulos suggestion of ifelse(). I was also
> surprised that ifelse is the winning strategy:

That surprises me too. What I find really curious is the (relatively)
large difference between the dlr.sign and index methods. Some of the
difference is gained back if dat[, 4] <- dat[, 3] is used instead of
dat[4] <- dat[3], but on my old clunker that variant (inventively named
index2) still lags noticeably behind dlr.sign:

# after failed attempts with benchmark::benchmark()
# I decided this is what you used
library(rbenchmark)
dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4, NA))
rbenchmark::benchmark(
  meth.ifelse = {dat$z.new <- ifelse(is.na(dat$V3), dat$V1, dat$V3)},
  meth.dlr.sign = {dat$new <- dat$V3
    my.na <- is.na(dat$V3)
    dat$new[my.na] <- dat$V1[my.na]},
  meth.index = {dat[4] <- dat[3]; idx <- is.na(dat[, 3])
    dat[idx, 4] <- dat[idx, 1]},
  meth.index2 = {dat[, 4] <- dat[, 3]; idx <- is.na(dat[, 3])
    dat[idx, 4] <- dat[idx, 1]},
  meth.forloop = {for (i in 1:nrow(dat)) {
      if (is.na(dat[i, 2]) == TRUE) {
        dat[i, 3] <- dat[i, 1]
      } else {
        dat[i, 3] <- dat[i, 2]
      }
    }},
  replications = 5000,
  columns = c("test", "replications", "elapsed", "relative", "user.self"))
           test replications elapsed  relative user.self
2 meth.dlr.sign         5000   1.337  1.206679     1.216
5  meth.forloop         5000  16.941 15.289711    14.997
1   meth.ifelse         5000   1.108  1.000000     1.061
3    meth.index         5000   8.868  8.003610     7.164
4   meth.index2         5000   6.099  5.504513     5.136



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
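Josh's index and index2 variants differ only in how the existing column is copied into the new one. The sketch below, on the 5-row example from the thread, illustrates the structural difference (it does not by itself explain the timing gap): single-bracket, list-style indexing such as dat[2] returns a one-column data frame, while matrix-style dat[, 2] returns a plain vector. Both forms of assignment give the new column the same contents; the names a and b are illustrative.

```r
dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4, NA))

# Single-bracket, list-style indexing keeps the data frame wrapper
class(dat[2])    # "data.frame"

# Matrix-style indexing of a single column drops to a plain vector
class(dat[, 2])  # "numeric"

# Either form of assignment fills the new third column with the same values
a <- dat; a[3] <- a[2]      # same pattern as meth.index
b <- dat; b[, 3] <- b[, 2]  # same pattern as meth.index2
stopifnot(identical(a[[3]], b[[3]]))
```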
