[R] Replace NAs in one column with data from another column
Hi list,

I have a data frame (m) with 169221 rows and 10 columns and would like to make a new column containing the content of column 3, but with the NAs in column 3 replaced by the data in column 1 (from the same row as the NA in column 3). Column 1 has data in all rows.

My first attempt was:

    for (i in 1:169221) {
      if (is.na(m[i, 3]) == TRUE) {
        m[i, 11] <- as.character(m[i, 1])
      } else {
        m[i, 11] <- as.character(m[i, 3])
      }
    }

It works, but takes too long. I would appreciate alternative solutions.

Best regards,
Jakob

__
R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Replace NAs in one column with data from another column
One way is the following:

    m <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
    m$z[sample(100, 20)] <- NA
    m$z.new <- ifelse(is.na(m$z), m$x, m$z)

I hope it helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
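One caveat that may matter for the original data: the as.character() calls in Jakob's loop suggest that columns 1 and 3 could be factors (an assumption; the thread does not say). ifelse() drops attributes, so when given factors it returns the underlying integer codes rather than the labels. A minimal sketch with a made-up data frame (the column names id and alias are hypothetical):

```r
# Hypothetical data: both columns are factors, and alias has missing values.
m <- data.frame(id    = factor(c("a", "b", "c", "d")),
                alias = factor(c("x", NA, "z", NA)))

# Applied directly to factors, ifelse() returns integer codes, not labels.
codes <- ifelse(is.na(m$alias), m$id, m$alias)

# Converting to character first keeps the labels.
m$filled <- ifelse(is.na(m$alias), as.character(m$id), as.character(m$alias))

print(codes)
print(m$filled)
```

If the columns are already numeric or character, the conversion is unnecessary and the plain ifelse() call is enough.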
Re: [R] Replace NAs in one column with data from another column
?ifelse

    df$newCol <- ifelse(is.na(df$col3), df$col1, df$col3)

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
Re: [R] Replace NAs in one column with data from another column
Hi Jakob,

You can use is.na() to create an index of which rows in column 3 are missing data, and then select these from column 1. Here is a simple example:

    dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4, NA))
    dat$new <- dat$V3
    my.na <- is.na(dat$V3)
    dat$new[my.na] <- dat$V1[my.na]
    dat

This should be quite fast. I broke the steps up to be explicit, but you can readily simplify them.

HTH,
Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
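The same indexing idea can be wrapped in a small helper that takes column positions, which is how the original question refers to the columns (3 and 1). This is only a sketch: fill_na_from is a hypothetical name, not a function from base R or any package, and the extra column V2 exists only so that V3 sits in position 3.

```r
# Hypothetical helper: return column `target` with its NAs filled
# from the same rows of column `source`.
fill_na_from <- function(df, target, source) {
  out <- df[[target]]
  is_missing <- is.na(out)
  out[is_missing] <- df[[source]][is_missing]
  out
}

dat <- data.frame(V1 = 1:5, V2 = letters[1:5], V3 = c(1, NA, 3, 4, NA))

# Fill the NAs in column 3 from column 1 and store the result in a new column.
dat$new <- fill_na_from(dat, target = 3, source = 1)
print(dat)
```

Because the helper uses [[ ]], it accepts column names as well as positions.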
Re: [R] Replace NAs in one column with data from another column
On Sep 8, 2010, at 2:24 PM, Joshua Wiley wrote:

> You can use is.na() to create an index of which rows in column 3 are
> missing data, and then select these from column 1.
> [...]
> This should be quite fast. I broke the steps up to be explicit, but you
> can readily simplify them.

I was about to post something similar, except that I was going to avoid the $ operator, thinking (incorrectly, as it turned out) that it would be faster. I also include the Holtman/Rizopoulos suggestion of ifelse(). I was also surprised that ifelse() is the winning strategy:

    dat[4] <- dat[3]; idx <- is.na(dat[, 3])
    dat[is.na(dat[, 3]), 4] <- dat[is.na(dat[, 3]), 1]

    benchmark(
      meth.ifelse = {dat$z.new <- ifelse(is.na(dat$V3), dat$V1, dat$V3)},
      meth.dlr.sign = {dat$new <- dat$V3
                       my.na <- is.na(dat$V3)
                       dat$new[my.na] <- dat$V1[my.na]},
      meth.index = {dat[4] <- dat[3]; idx <- is.na(dat[, 3])
                    dat[idx, 4] <- dat[idx, 1]},
      meth.forloop = {for (i in 1:nrow(dat)) {
                        if (is.na(dat[i, 3]) == TRUE) {
                          dat[i, 4] <- dat[i, 1]
                        } else {
                          dat[i, 4] <- dat[i, 3]
                        }
                      }},
      replications = 5000,
      columns = c("test", "replications", "elapsed", "relative", "user.self"))

               test replications elapsed  relative user.self
    2 meth.dlr.sign         5000   0.502  1.081897     0.501
    4  meth.forloop         5000   6.419 13.834052     6.409
    1   meth.ifelse         5000   0.464  1.000000     0.463
    3    meth.index         5000   2.908  6.267241     2.904

--
David.

David Winsemius, MD
West Hartford, CT
Re: [R] Replace NAs in one column with data from another column
with() would seem to be useful here:

    m$z <- with(m, ifelse(is.na(z), x, z))

(I believe the timing is similar, but haven't checked.)

-- Bert

On Wed, Sep 8, 2010 at 11:22 AM, Dimitris Rizopoulos d.rizopou...@erasmusmc.nl wrote:

> one way is the following:
>
> m <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
> m$z[sample(100, 20)] <- NA
> m$z.new <- ifelse(is.na(m$z), m$x, m$z)
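Note that the with() line above assigns back to m$z, which overwrites the original column. Since the question asks for a new column, a variant that writes to z.new instead may be closer to what was wanted. A minimal sketch, with an arbitrary seed and arbitrarily chosen NA positions:

```r
set.seed(1)  # arbitrary seed so the example is repeatable
m <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
m$z[c(2, 5, 9)] <- NA  # arbitrary rows set to missing

# with() evaluates the expression using the columns of m as variables.
m$z.new <- with(m, ifelse(is.na(z), x, z))

# The original z keeps its NAs; z.new has none.
print(sum(is.na(m$z)))
print(sum(is.na(m$z.new)))
```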
Re: [R] Replace NAs in one column with data from another column
On Wed, Sep 8, 2010 at 12:02 PM, David Winsemius dwinsem...@comcast.net wrote:

> I was about to post something similar except I was going to avoid the $
> operator thinking, incorrectly as it turned out, that it would be
> faster. I also include the Holtman/Rizopoulos suggestion of ifelse(). I
> was also surprised that ifelse is the winning strategy:

That surprises me too. What I find really curious is the (relatively) large difference between the dlr.sign and index methods. Some of the difference is gained back if dat[, 4] <- dat[, 3] is used instead of dat[4] <- dat[3]. But that variant (with the inventive name, index2) still lags noticeably behind dlr.sign on my old clunker:

    # after failed attempts with benchmark::benchmark()
    # I decided this is what you used
    library(rbenchmark)
    dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4, NA))
    rbenchmark::benchmark(
      meth.ifelse = {dat$z.new <- ifelse(is.na(dat$V3), dat$V1, dat$V3)},
      meth.dlr.sign = {dat$new <- dat$V3
                       my.na <- is.na(dat$V3)
                       dat$new[my.na] <- dat$V1[my.na]},
      meth.index = {dat[4] <- dat[3]; idx <- is.na(dat[, 3])
                    dat[idx, 4] <- dat[idx, 1]},
      meth.index2 = {dat[, 4] <- dat[, 3]; idx <- is.na(dat[, 3])
                     dat[idx, 4] <- dat[idx, 1]},
      meth.forloop = {for (i in 1:nrow(dat)) {
                        if (is.na(dat[i, 2]) == TRUE) {
                          dat[i, 3] <- dat[i, 1]
                        } else {
                          dat[i, 3] <- dat[i, 2]
                        }
                      }},
      replications = 5000,
      columns = c("test", "replications", "elapsed", "relative", "user.self"))

               test replications elapsed  relative user.self
    2 meth.dlr.sign         5000   1.337  1.206679     1.216
    5  meth.forloop         5000  16.941 15.289711    14.997
    1   meth.ifelse         5000   1.108  1.000000     1.061
    3    meth.index         5000   8.868  8.003610     7.164
    4   meth.index2         5000   6.099  5.504513     5.136

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
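The benchmarks in this thread repeat each method 5000 times on a 5-row data frame, so the rankings may not carry over to a data frame with 169221 rows. One way to check is to time the two vectorised approaches once on data of that size. The sketch below uses only base R; the seed, the roughly 20% share of missing values, and the column contents are assumptions made for illustration, not properties of the original data.

```r
set.seed(42)                      # arbitrary seed
n <- 169221                       # row count from the original question
dat <- data.frame(V1 = seq_len(n), V3 = rnorm(n))
dat$V3[sample(n, n %/% 5)] <- NA  # assumption: about 20% missing

# ifelse() approach
t.ifelse <- system.time(
  new.ifelse <- ifelse(is.na(dat$V3), dat$V1, dat$V3)
)

# logical-index approach
t.index <- system.time({
  new.index <- dat$V3
  my.na <- is.na(dat$V3)
  new.index[my.na] <- dat$V1[my.na]
})

# Both approaches should give the same values.
stopifnot(isTRUE(all.equal(new.ifelse, new.index)))

# Elapsed seconds for each approach; the numbers depend on the machine.
print(c(ifelse = t.ifelse[["elapsed"]], index = t.index[["elapsed"]]))
```

A single system.time() call is noisy, so repeating the run a few times (or using rbenchmark as above, with fewer replications) would give a steadier comparison.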