Re: [R] How can I get predict.lm results with manual calculations ? (a floating point problem)

2009-09-14 Thread Tony Plate

Results can be slightly different when matrix algebra routines are called.  
Here's your example again.  When the prediction is computed directly using 
matrix multiplication, the result is the same as 'predict' produces (at least 
in this case.)


set.seed(1)
n - 100
x - rnorm(n)
y - rnorm(n)
aa - lm(y ~ x)
all.equal(as.numeric(predict(aa, new)), as.numeric(aa$coef[1] + aa$coef[2] * 
new$x), tol=0)

[1] Mean relative difference: 1.840916e-16

all.equal(as.numeric(predict(aa, new)), as.numeric(cbind(1, new$x) %*% 
aa$coef), tol=0)

[1] TRUE




These types of small differences are often not indicative of lower precision in 
one method, but rather just random floating-point inaccuracies that can depend 
on things like the order numbers are summed in (e.g., ((bigNegNum + bigPosNum) 
+ smallPosNum) will often be slightly different to ((bigPosNum + smallPosNum) + 
bigNegNum).  They can also depend on whether intermediate results are kept in 
CPU registers, which sometimes have higher precision than 64 bits.  Usually, 
they're nothing to worry about, which is one of the major reasons that 
all.equal() has a non-zero default for the tol= argument.

-- Tony Plate

Tal Galili wrote:

Hello dear r-help group

I am turning for you for help with FAQ number 7.31: Why doesn't R think
these numbers are equal?
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f


*My story* is this:
I wish to run many lm predictions and need to have them run fast.
Using predict.lm is relatively slow, so I tried having it run faster by
doing the prediction calculations manually.
But doing that gave me problematic results (I won't go into the details of
how I found that out).

I then discovered that the problem was that the manual calculations I used
for the lm predictions yielded different results than that of predict.lm,

*here is an example*:

predict.lm.diff.from.manual.compute - function(sample.size = 100)
{
x - rnorm(sample.size)
y - x + rnorm(sample.size)
 new - data.frame(x = seq(-3, 3, length.out = sample.size))
aa - lm(y ~ x)

predict.lm.result - sum(predict(aa, new, se.fit = F))
manual.lm.compute.result - sum(aa$coeff[1]+ new * aa$coeff[2])

# manual.lm.compute.result == predict.lm.result
return(all.equal(manual.lm.compute.result , predict.lm.result, tol=0))
}

# and here are the results of running the code several times:


predict.lm.diff.from.manual.compute(100)

[1] Mean relative difference: 1.046407e-15

predict.lm.diff.from.manual.compute(1000)

[1] Mean relative difference: 4.113951e-16

predict.lm.diff.from.manual.compute(1)

[1] Mean relative difference: 2.047455e-14

predict.lm.diff.from.manual.compute(10)

[1] Mean relative difference: 1.294251e-14

predict.lm.diff.from.manual.compute(100)

[1] Mean relative difference: 5.508314e-13



And that leaves me with *the question*:
Can I reproduce more accurate results from the manual calculations (as the
ones I might have gotten from predict.lm) ?
Maybe some parameter to increase the precision of the computation ?

Many thanks,
Tal











__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How can I get predict.lm results with manual calculations ? (a floating point problem)

2009-09-14 Thread Tal Galili
Hi Tony,

It appears you (and David) where right, and my own problem came from another
issue which wasn't the floating point. (the problem was misspelling of the
colname of the data.frame I used as the newdata in the predict.lm)

Yet I am still happy with my posting, only to get your informed reply -
thank you very much for the interesting info!

Best regards,
Tal






On Mon, Sep 14, 2009 at 8:31 PM, Tony Plate tpl...@acm.org wrote:

 Results can be slightly different when matrix algebra routines are called.
  Here's your example again.  When the prediction is computed directly using
 matrix multiplication, the result is the same as 'predict' produces (at
 least in this case.)

  set.seed(1)
 n - 100
 x - rnorm(n)
 y - rnorm(n)
 aa - lm(y ~ x)
 all.equal(as.numeric(predict(aa, new)), as.numeric(aa$coef[1] + aa$coef[2]
 * new$x), tol=0)

 [1] Mean relative difference: 1.840916e-16

 all.equal(as.numeric(predict(aa, new)), as.numeric(cbind(1, new$x) %*%
 aa$coef), tol=0)

 [1] TRUE



 These types of small differences are often not indicative of lower
 precision in one method, but rather just random floating-point inaccuracies
 that can depend on things like the order numbers are summed in (e.g.,
 ((bigNegNum + bigPosNum) + smallPosNum) will often be slightly different to
 ((bigPosNum + smallPosNum) + bigNegNum).  They can also depend on whether
 intermediate results are kept in CPU registers, which sometimes have higher
 precision than 64 bits.  Usually, they're nothing to worry about, which is
 one of the major reasons that all.equal() has a non-zero default for the
 tol= argument.

 -- Tony Plate


 Tal Galili wrote:

 Hello dear r-help group

 I am turning for you for help with FAQ number 7.31: Why doesn't R think
 these numbers are equal?

 http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f


 *My story* is this:
 I wish to run many lm predictions and need to have them run fast.
 Using predict.lm is relatively slow, so I tried having it run faster by
 doing the prediction calculations manually.
 But doing that gave me problematic results (I won't go into the details of
 how I found that out).

 I then discovered that the problem was that the manual calculations I used
 for the lm predictions yielded different results than that of predict.lm,

 *here is an example*:

 predict.lm.diff.from.manual.compute - function(sample.size = 100)
 {
 x - rnorm(sample.size)
 y - x + rnorm(sample.size)
  new - data.frame(x = seq(-3, 3, length.out = sample.size))
 aa - lm(y ~ x)

 predict.lm.result - sum(predict(aa, new, se.fit = F))
 manual.lm.compute.result - sum(aa$coeff[1]+ new * aa$coeff[2])

 # manual.lm.compute.result == predict.lm.result
 return(all.equal(manual.lm.compute.result , predict.lm.result, tol=0))
 }

 # and here are the results of running the code several times:

  predict.lm.diff.from.manual.compute(100)

 [1] Mean relative difference: 1.046407e-15

 predict.lm.diff.from.manual.compute(1000)

 [1] Mean relative difference: 4.113951e-16

 predict.lm.diff.from.manual.compute(1)

 [1] Mean relative difference: 2.047455e-14

 predict.lm.diff.from.manual.compute(10)

 [1] Mean relative difference: 1.294251e-14

 predict.lm.diff.from.manual.compute(100)

 [1] Mean relative difference: 5.508314e-13



 And that leaves me with *the question*:
 Can I reproduce more accurate results from the manual calculations (as the
 ones I might have gotten from predict.lm) ?
 Maybe some parameter to increase the precision of the computation ?

 Many thanks,
 Tal













-- 
--


My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
http://www.r-statistics.com/
http://www.talgalili.com
http://www.biostatistics.co.il

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How can I get predict.lm results with manual calculations ? (a floating point problem)

2009-09-13 Thread Tal Galili
Hello dear r-help group

I am turning for you for help with FAQ number 7.31: Why doesn't R think
these numbers are equal?
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f


*My story* is this:
I wish to run many lm predictions and need to have them run fast.
Using predict.lm is relatively slow, so I tried having it run faster by
doing the prediction calculations manually.
But doing that gave me problematic results (I won't go into the details of
how I found that out).

I then discovered that the problem was that the manual calculations I used
for the lm predictions yielded different results than that of predict.lm,

*here is an example*:

predict.lm.diff.from.manual.compute - function(sample.size = 100)
{
x - rnorm(sample.size)
y - x + rnorm(sample.size)
 new - data.frame(x = seq(-3, 3, length.out = sample.size))
aa - lm(y ~ x)

predict.lm.result - sum(predict(aa, new, se.fit = F))
manual.lm.compute.result - sum(aa$coeff[1]+ new * aa$coeff[2])

# manual.lm.compute.result == predict.lm.result
return(all.equal(manual.lm.compute.result , predict.lm.result, tol=0))
}

# and here are the results of running the code several times:

 predict.lm.diff.from.manual.compute(100)
[1] Mean relative difference: 1.046407e-15
 predict.lm.diff.from.manual.compute(1000)
[1] Mean relative difference: 4.113951e-16
 predict.lm.diff.from.manual.compute(1)
[1] Mean relative difference: 2.047455e-14
 predict.lm.diff.from.manual.compute(10)
[1] Mean relative difference: 1.294251e-14
 predict.lm.diff.from.manual.compute(100)
[1] Mean relative difference: 5.508314e-13




And that leaves me with *the question*:
Can I reproduce more accurate results from the manual calculations (as the
ones I might have gotten from predict.lm) ?
Maybe some parameter to increase the precision of the computation ?

Many thanks,
Tal









-- 
--


My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
http://www.r-statistics.com/
http://www.talgalili.com
http://www.biostatistics.co.il

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How can I get predict.lm results with manual calculations ? (a floating point problem)

2009-09-13 Thread David Winsemius


On Sep 13, 2009, at 1:19 PM, Tal Galili wrote:


Hello dear r-help group

I am turning for you for help with FAQ number 7.31: Why doesn't R  
think

these numbers are equal?
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f


*My story* is this:
I wish to run many lm predictions and need to have them run fast.
Using predict.lm is relatively slow, so I tried having it run faster  
by

doing the prediction calculations manually.
But doing that gave me problematic results (I won't go into the  
details of

how I found that out).

I then discovered that the problem was that the manual calculations  
I used
for the lm predictions yielded different results than that of  
predict.lm,


*here is an example*:

predict.lm.diff.from.manual.compute - function(sample.size = 100)
{
x - rnorm(sample.size)
y - x + rnorm(sample.size)
new - data.frame(x = seq(-3, 3, length.out = sample.size))
aa - lm(y ~ x)

predict.lm.result - sum(predict(aa, new, se.fit = F))
manual.lm.compute.result - sum(aa$coeff[1]+ new * aa$coeff[2])

# manual.lm.compute.result == predict.lm.result
return(all.equal(manual.lm.compute.result , predict.lm.result, tol=0))
}

# and here are the results of running the code several times:


predict.lm.diff.from.manual.compute(100)

[1] Mean relative difference: 1.046407e-15

predict.lm.diff.from.manual.compute(1000)

[1] Mean relative difference: 4.113951e-16

predict.lm.diff.from.manual.compute(1)

[1] Mean relative difference: 2.047455e-14

predict.lm.diff.from.manual.compute(10)

[1] Mean relative difference: 1.294251e-14

predict.lm.diff.from.manual.compute(100)

[1] Mean relative difference: 5.508314e-13


Are these meaningfully different? You are using different datasets and  
didn't even set a seed for your RNG. They seem really quite close. Why  
should we care?


 .Machine$double.eps
[1] 2.220446e-16




And that leaves me with *the question*:
Can I reproduce more accurate results from the manual calculations  
(as the

ones I might have gotten from predict.lm) ?
Maybe some parameter to increase the precision of the computation ?

Many thanks,
Tal


--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.