Re: [R] substitute NA values

2007-03-31 Thread Jim Lemon
Sergio Della Franca wrote:
> Dear R-Helpers,
>
> I have the following data set (y):
>
> Test_Result   #_Test
> t             10
> f             14
> f             25
> f             NA
> f             40
> t             45
> t             44
> NA            47
> t             NA
>
> I want to replace the NA values with the following method:
> - for the numeric variable, replace NA with the median
> - for the character variable, replace NA with the most frequent level
>
> If I use x <- na.roughfix(y), the NA values are correctly replaced.
> But if I use x <- na.roughfix(y$Test_Result), I obtain the following error:
>
> roughfix can only deal with numeric data.
>
> How can I solve this problem, which I run into every time I want to replace
> only the NA values of a character column?
 
Hi Sergio,
In the prettyR package is the Mode function. This returns the mode of a 
vector as a character string. So I think this would do what you want:

library(prettyR)
testvec <- c(sample(LETTERS[1:4], 20, TRUE), NA, NA)
testvec[is.na(testvec)] <- Mode(testvec)

You could do the same trick with the median function for the numeric values.
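
As a minimal sketch of that median variant, using only base R (the vector
name testnum and its values are made up for illustration):

```r
# Hypothetical numeric vector with two missing values
testnum <- c(10, 14, 25, NA, 40, 45, 44, 47, NA)

# median() returns NA if any value is missing, so drop NAs with na.rm = TRUE
testnum[is.na(testnum)] <- median(testnum, na.rm = TRUE)
testnum
```

Note that the median is computed from the observed values only, before any
NA is filled in.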

Jim

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] substitute NA values

2007-03-30 Thread Sergio Della Franca
Dear R-Helpers,


I have the following data set (y):

Test_Result   #_Test
t             10
f             14
f             25
f             NA
f             40
t             45
t             44
NA            47
t             NA


I want to replace the NA values with the following method:
- for the numeric variable, replace NA with the median
- for the character variable, replace NA with the most frequent level

If I use x <- na.roughfix(y), the NA values are correctly replaced.
But if I use x <- na.roughfix(y$Test_Result), I obtain the following error:

roughfix can only deal with numeric data.

How can I solve this problem, which I run into every time I want to replace
only the NA values of a character column?

Thank you in advance.


Sergio Della Franca



Re: [R] substitute NA values

2007-03-30 Thread Gabor Grothendieck
I assume you are referring to na.roughfix in randomForest.  I don't think it
works for logical vectors or for factors outside of data frames:

> library(randomForest)
> DF <- data.frame(a = c(T, F, T, NA, T), b = c(1:3, NA, 5))
> na.roughfix(DF)
Error in na.roughfix.data.frame(DF) : na.roughfix only works for
numeric or factor
> DF$a <- factor(DF$a)
> na.roughfix(DF$a)
Error in na.roughfix.default(DF$a) : roughfix can only deal with numeric data.
> na.roughfix(DF)
      a   b
1  TRUE 1.0
2 FALSE 2.0
3  TRUE 3.0
4  TRUE 2.5
5  TRUE 5.0




Re: [R] substitute NA values

2007-03-30 Thread Sergio Della Franca
This is what I obtained.

Isn't there a method to replace the NA values only for a character variable?








Re: [R] substitute NA values

2007-03-30 Thread Gabor Grothendieck
Not as part of na.roughfix.

You could convert your character strings to factors
and back again:

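library(randomForest)
DF <- data.frame(a = c(T, F, T, NA, T),
  b = c(1:3, NA, 5),
  c = c("b", "b", NA, "d", "e"),
  d = factor(c("a", "a", NA, "d", "e")),
  stringsAsFactors = FALSE)
## na.roughfix on a data frame needs numeric or factor columns,
## so convert the logical and character columns to factors first
DF$a <- factor(DF$a)
DF$c <- factor(DF$c)
DF <- na.roughfix(DF)
## convert the character column back again
DF$c <- as.character(DF$c)
DF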






Re: [R] substitute NA values

2007-03-30 Thread Gavin Simpson
On Fri, 2007-03-30 at 16:25 +0200, Sergio Della Franca wrote:
> This is what I obtained.
>
> Isn't there a method to replace the NA values only for a character variable?

This is R, there is always a way (paraphrasing an R-Helper the name of
whom I forget just now). If you mean a canned function, not that I'm
aware of.

Here is one way:

## some example data - not exactly like yours
set.seed(1234)
dat <- data.frame(test = sample(c("t", "f"), 9, replace = TRUE),
                  num = c(10, 14, 25, NA, 40, 45, 44, 47, NA))

## add an NA to dat$test to match your example
dat$test[8] <- NA

## print out dat
dat

## count the various options in $test and return the name of
## the most frequent
freq <- names(which.max(table(dat$test)))

## replace NA in $test with most frequent
dat$test[is.na(dat$test)] <- freq

## print out dat again to show this worked
dat

There may be better ways - the names(which.max(table(...))) seems a bit
clunky to me but it is Friday afternoon and it's been a long week...

And, as this /is/ R, you could wrap that into a function for use on
other data sets, but I'll leave that bit up to you.
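
One possible shape for such a wrapper, as a rough sketch only: the function
name impute_na and the example data frame are hypothetical and not part of
any package. It fills numeric columns with the median and character or
factor columns with the most frequent value, and leaves other column types
and all-NA columns untouched.

```r
# Hypothetical helper: fill NAs column by column in a data frame.
impute_na <- function(df) {
  # name of the most frequent value; ties go to the first entry in table()
  most_frequent <- function(x) names(which.max(table(x)))
  for (col in names(df)) {
    x <- df[[col]]
    # nothing to do if no NAs; nothing to learn from if all NAs
    if (!anyNA(x) || all(is.na(x))) next
    if (is.numeric(x)) {
      x[is.na(x)] <- median(x, na.rm = TRUE)
    } else if (is.character(x) || is.factor(x)) {
      x[is.na(x)] <- most_frequent(x)
    }
    df[[col]] <- x
  }
  df
}

# Made-up example data, smaller than the data set in the question
y <- data.frame(Test_Result = c("t", "f", "f", NA, "f", "t"),
                N_Test = c(10, NA, 25, 40, 45, 44),
                stringsAsFactors = FALSE)
x <- impute_na(y)
x
```

When two values are equally frequent, which.max() picks the first one in
the order used by table(), so the choice is arbitrary in that case.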

HTH

G

 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] substitute NA values

2007-03-30 Thread Charilaos Skiadas
On Mar 30, 2007, at 10:56 AM, Gavin Simpson wrote:

> This is R, there is always a way (paraphrasing an R-Helper the name of
> whom I forget just now).

Can't resist, it's one of my favorite fortunes ;)

That would be Simon 'Yoda' Blomberg:

library(fortunes)
fortune(109)

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College



[R] substitute NA values

2007-03-28 Thread Sergio Della Franca
Dear R-Helpers,


I have the following data set:

YEAR   PRODUCTS
1990   2478
1995   3192
2000   NA
2005   1594


I want to replace the NA values in the PRODUCTS column with 0.


How can I do this?


Thank you in advance.


Sergio Della Franca



Re: [R] substitute NA values

2007-03-28 Thread Marc Schwartz

Several ways:

1. Using replace():

  DF$PRODUCTS <- replace(DF$PRODUCTS, is.na(DF$PRODUCTS), 0)


2. Using regular indexing:

  DF$PRODUCTS[is.na(DF$PRODUCTS)] <- 0


See ?replace and ?is.na
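
A self-contained sketch of the two approaches side by side, rebuilding the
data frame from the question (the names via_replace and via_index are made
up for this illustration). Neither call modifies DF here, so the original
NA is still available for comparison:

```r
# Rebuild the example data from the question
DF <- data.frame(YEAR = c(1990, 1995, 2000, 2005),
                 PRODUCTS = c(2478, 3192, NA, 1594))

# 1. replace() returns a modified copy of the vector
via_replace <- replace(DF$PRODUCTS, is.na(DF$PRODUCTS), 0)

# 2. logical indexing modifies a copy in place
via_index <- DF$PRODUCTS
via_index[is.na(via_index)] <- 0

# both routes should give the same vector
identical(via_replace, via_index)
```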

That being said, be very cautious about doing this. Most R functions are
designed to handle NA values in very predictable ways, but not so with 0
values.  See ?NA for more information.

For example:

> DF
  YEAR PRODUCTS
1 1990     2478
2 1995     3192
3 2000       NA
4 2005     1594

> mean(DF$PRODUCTS)
[1] NA

> mean(DF$PRODUCTS, na.rm = TRUE)
[1] 2421.333


Now with the NA replaced by 0:

> DF
  YEAR PRODUCTS
1 1990     2478
2 1995     3192
3 2000        0
4 2005     1594


> mean(DF$PRODUCTS)
[1] 1816


HTH,

Marc Schwartz



Re: [R] substitute NA values

2007-03-28 Thread Uwe Ligges
See ?is.na and use its result for indexing.

Uwe Ligges



Re: [R] substitute NA values

2007-03-28 Thread Jorge Cornejo-Donoso
This could work, but not with a big matrix!

year <- c(1990, 1995, 2000, 2005)
Prod <- c(2478, 3192, NA, 1594)

matrix <- data.frame(cbind(year, Prod))
## loop over the rows and set any NA in the second column to 0
for (i in 1:dim(matrix)[1])
{
  if (is.na(matrix[i, 2])) {matrix[i, 2] <- 0}
}
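
The loop above visits one row at a time, which is why it scales poorly. A
vectorized sketch of the same replacement, assuming the same two columns
(the names dat and dat_all are made up for this illustration), could look
like this:

```r
# Same data as in the loop version, built directly as a data frame
dat <- data.frame(year = c(1990, 1995, 2000, 2005),
                  Prod = c(2478, 3192, NA, 1594))

# Replace NAs in one column with a single indexed assignment
dat$Prod[is.na(dat$Prod)] <- 0

# Or replace NAs in every column at once: is.na() on a data frame
# gives a logical matrix that can be used as an index
dat_all <- data.frame(year = c(1990, 1995, 2000, 2005),
                      Prod = c(2478, 3192, NA, 1594))
dat_all[is.na(dat_all)] <- 0
```

Both forms avoid the explicit loop, so they should behave the same on a
large data frame as on this small one.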
