Re: [R] substitute NA values
Sergio Della Franca wrote:
> Dear R-Helpers,
>
> I have the following data set (y):
>
>   Test_Result  #_Test
>   t            10
>   f            14
>   f            25
>   f            NA
>   f            40
>   t            45
>   t            44
>   NA           47
>   t            NA
>
> I want to replace the NA values with the following method:
> - for the numeric variable, replace NA with the median
> - for the character variable, replace NA with the most frequent level
>
> If I use x <- na.roughfix(y) the NA values are correctly replaced.
> But if I use x <- na.roughfix(y$Test_Result) I obtain the following error:
>
>   roughfix can only deal with numeric data.
>
> How can I solve this problem, which I meet every time I want to replace
> only the NA values of a column (type character)?

Hi Sergio,

In the prettyR package is the Mode function. This returns the mode of a vector as a character string. So I think this would do what you want:

    library(prettyR)
    testvec <- c(sample(LETTERS[1:4], 20, TRUE), NA, NA)
    testvec[is.na(testvec)] <- Mode(testvec)

You could do the same trick with the median function for the numeric values.

Jim

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
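A minimal sketch of the median variant Jim mentions, in base R only. The vector `numvec` is an assumption for illustration (it reuses the numbers from the question); it does not appear in Jim's message.

```r
# Hypothetical numeric vector with two missing values
numvec <- c(10, 14, 25, NA, 40, 45, 44, 47, NA)

# Median of the observed values only; na.rm = TRUE drops the NAs first
med <- median(numvec, na.rm = TRUE)

# Overwrite the missing entries with that median
numvec[is.na(numvec)] <- med
numvec
```

Without `na.rm = TRUE`, `median()` returns NA as soon as the vector contains a missing value, so the replacement would silently do nothing useful.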
[R] substitute NA values
Dear R-Helpers,

I have the following data set (y):

    Test_Result  #_Test
    t            10
    f            14
    f            25
    f            NA
    f            40
    t            45
    t            44
    NA           47
    t            NA

I want to replace the NA values with the following method:
- for the numeric variable, replace NA with the median
- for the character variable, replace NA with the most frequent level

If I use x <- na.roughfix(y) the NA values are correctly replaced.
But if I use x <- na.roughfix(y$Test_Result) I obtain the following error:

    roughfix can only deal with numeric data.

How can I solve this problem, which I meet every time I want to replace only the NA values of a column (type character)?

Thank you in advance.

Sergio Della Franca
Re: [R] substitute NA values
I assume you are referring to na.roughfix in randomForest. I don't think it works for logical vectors or for factors outside of data frames:

    > library(randomForest)
    > DF <- data.frame(a = c(TRUE, FALSE, TRUE, NA, TRUE), b = c(1:3, NA, 5))
    > na.roughfix(DF)
    Error in na.roughfix.data.frame(DF) :
      na.roughfix only works for numeric or factor
    > DF$a <- factor(DF$a)
    > na.roughfix(DF$a)
    Error in na.roughfix.default(DF$a) :
      roughfix can only deal with numeric data.
    > na.roughfix(DF)
          a   b
    1  TRUE 1.0
    2 FALSE 2.0
    3  TRUE 3.0
    4  TRUE 2.5
    5  TRUE 5.0

On 3/30/07, Sergio Della Franca wrote:
> [...]
> If I use x <- na.roughfix(y) the NA values are correctly replaced.
> But if I use x <- na.roughfix(y$Test_Result) I obtain the following error:
> roughfix can only deal with numeric data.
> [...]
Re: [R] substitute NA values
This is what I obtained. Is there no method to replace the NA values only for a character variable?

2007/3/30, Gabor Grothendieck wrote:
> I assume you are referring to na.roughfix in randomForest. I don't think
> it works for logical vectors or for factors outside of data frames:
> [...]
Re: [R] substitute NA values
Not as part of na.roughfix. You could convert your character strings to factors and back again:

    library(randomForest)
    DF <- data.frame(a = c(TRUE, FALSE, TRUE, NA, TRUE),
                     b = c(1:3, NA, 5),
                     c = c("b", "b", NA, "d", "e"),
                     d = factor(c("a", "a", NA, "d", "e")),
                     stringsAsFactors = FALSE)
    DF$a <- factor(DF$a)          # logical -> factor
    DF$c <- factor(DF$c)          # character -> factor
    DF <- na.roughfix(DF)
    DF$c <- as.character(DF$c)    # back to character
    DF

On 3/30/07, Sergio Della Franca wrote:
> This is what I obtained. Is there no method to replace the NA values only
> for a character variable?
Re: [R] substitute NA values
On Fri, 2007-03-30 at 16:25 +0200, Sergio Della Franca wrote:
> This is what I obtained. Is there no method to replace the NA values only
> for a character variable?

This is R, there is always a way (paraphrasing an R-Helper the name of whom I forget just now). If you mean a canned function, not that I'm aware of. Here is one way:

    ## some example data - not exactly like yours
    set.seed(1234)
    dat <- data.frame(test = sample(c("t", "f"), 9, replace = TRUE),
                      num = c(10, 14, 25, NA, 40, 45, 44, 47, NA))
    ## add an NA to dat$test to match your example
    dat$test[8] <- NA
    ## print out dat
    dat
    ## count the various options in $test and return the name of
    ## the most frequent
    freq <- names(which.max(table(dat$test)))
    ## replace NA in $test with most frequent
    dat$test[is.na(dat$test)] <- freq
    ## print out dat again to show this worked
    dat

There may be better ways - the names(which.max(table(...))) seems a bit clunky to me but it is Friday afternoon and it's been a long week...

And, as this /is/ R, you could wrap that into a function for your use on other data sets, but I'll leave that bit up to you.

HTH

G

--
Gavin Simpson                     [t] +44 (0)20 7679 0522
ECRC, UCL Geography,              [f] +44 (0)20 7679 0565
Pearson Building,                 [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London              [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                     [w] http://www.freshwaters.org.uk
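One possible way to wrap Gavin's names(which.max(table(...))) idiom into a reusable function, as he suggests, might look like the sketch below. The helper names `most_frequent` and `fill_na` are hypothetical and do not come from the thread, and the example data frame simply retypes the data set from the original question. It uses base R only.

```r
# Hypothetical helper (not from the thread): most frequent non-NA value,
# returned as a character string. On a tie, which.max() takes the first
# entry in table()'s sorted order.
most_frequent <- function(x) {
  names(which.max(table(x)))
}

# Hypothetical helper: fill the NAs of a single column.
# Numeric columns get the median; character and factor columns get the
# most frequent value. Other types are returned unchanged.
fill_na <- function(x) {
  if (all(is.na(x))) return(x)  # nothing to estimate from
  if (is.numeric(x)) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
  } else if (is.character(x) || is.factor(x)) {
    x[is.na(x)] <- most_frequent(x)
  }
  x
}

# The data set from the question, retyped for illustration
dat <- data.frame(
  test = c("t", "f", "f", "f", "f", "t", "t", NA, "t"),
  num  = c(10, 14, 25, NA, 40, 45, 44, 47, NA),
  stringsAsFactors = FALSE
)

# Apply column by column; dat[] keeps the data frame structure
dat[] <- lapply(dat, fill_na)
dat
```

Note that in this particular data set "t" and "f" are tied at four occurrences each, so the fill value is "f" only because it sorts first; a different tie-breaking rule would be an equally defensible choice.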
Re: [R] substitute NA values
On Mar 30, 2007, at 10:56 AM, Gavin Simpson wrote:
> This is R, there is always a way (paraphrasing an R-Helper the name of
> whom I forget just now).

Can't resist, it's one of my favorite fortunes ;) That would be Simon 'Yoda' Blomberg:

    library(fortunes)
    fortune(109)

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College
[R] substitute NA values
Dear R-Helpers,

I have the following data set:

    YEAR  PRODUCTS
    1990  2478
    1995  3192
    2000  NA
    2005  1594

I want to replace the NA values in the PRODUCTS column with 0. How can I obtain this?

Thank you in advance.

Sergio Della Franca
Re: [R] substitute NA values
On Wed, 2007-03-28 at 16:21 +0200, Sergio Della Franca wrote:
> [...]
> I want to replace the NA values in the PRODUCTS column with 0. How can I
> obtain this?

Several ways:

1. Using replace():

    DF$PRODUCTS <- replace(DF$PRODUCTS, is.na(DF$PRODUCTS), 0)

2. Using regular indexing:

    DF$PRODUCTS[is.na(DF$PRODUCTS)] <- 0

See ?replace and ?is.na

That being said, be very cautious about doing this. Most R functions are designed to handle NA values in very predictable ways, but not so with 0 values. See ?NA for more information.

For example:

    > DF
      YEAR PRODUCTS
    1 1990     2478
    2 1995     3192
    3 2000       NA
    4 2005     1594
    > mean(DF$PRODUCTS)
    [1] NA
    > mean(DF$PRODUCTS, na.rm = TRUE)
    [1] 2421.333

Now with:

    > DF
      YEAR PRODUCTS
    1 1990     2478
    2 1995     3192
    3 2000        0
    4 2005     1594
    > mean(DF$PRODUCTS)
    [1] 1816

HTH,

Marc Schwartz
Re: [R] substitute NA values
See ?is.na and use its result for indexing.

Uwe Ligges

Sergio Della Franca wrote:
> [...]
> I want to replace the NA values in the PRODUCTS column with 0. How can I
> obtain this?
Re: [R] substitute NA values
This could work, but not for a big matrix!

    year <- c(1990, 1995, 2000, 2005)
    Prod <- c(2478, 3192, NA, 1594)
    matrix <- data.frame(cbind(year, Prod))
    ## loop over the rows and zero out missing values in column 2
    for (i in seq_len(nrow(matrix))) {
      if (is.na(matrix[i, 2])) {
        matrix[i, 2] <- 0
      }
    }
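For larger data, a vectorised form of the same replacement avoids the row-by-row loop. The sketch below assumes the same toy data; the names `products` and `products_df` are hypothetical, chosen so that base functions such as prod() and matrix() are not masked.

```r
year <- c(1990, 1995, 2000, 2005)
products <- c(2478, 3192, NA, 1594)
products_df <- data.frame(year, products)

# is.na() on a data frame returns a logical matrix of the same shape,
# which can be used to index the data frame directly
products_df[is.na(products_df)] <- 0
products_df
```

This replaces NA in every column at once; to restrict it to a single column, index that column alone, for example products_df$products[is.na(products_df$products)] <- 0.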