Re: [R] Rép : ANOVA and permutation tests : beware of traps

Duncan Murdoch Wed, 24 Sep 2014 05:28:01 -0700

On 24/09/2014 8:00 AM, Stéphane Adamowicz wrote:

Many thanks to J. Wiley and B. Ripley for their quick answers.
I suppose that many end users are aware of problems in calculation accuracy 
with computers. However, I would like to comment that it was not that obvious 
for me that the data order matters. First, I do not find any clear mention of 
this particular problem in the == help page, but perhaps I am experiencing 
difficulties with the English. Second, I do not encounter this problem neither 
with the piece of code I proposed to replace the dubious one, or in the 
following experiments :

I suspect that there would be situations where your code failed aswell. Now that I know about this potential error, I would not trustcode that doesn't defend against it, even if I could not find an examplewhere it failed: it depends on tests of exact equality in floatingpoint values, and that is known to be a source of errors in numericalmethods.


> # experiment 1 : comparing total variances
> var(dat1$Y) == var(dat2$Y)
[1] TRUE
> # experiment 2 : comparing bilateral T tests
> abs(t.test(Y~F, dat1)$statistic) == abs(t.test(Y~F, dat2)$statistic)
    t
TRUE
> # experiment 3 : applying permutations to T tests
> nperm <- 10000
> T <- abs(t.test(Y~F, dat1)$statistic)
> Tperm <- replicate(n=nperm, abs(t.test(sample(Y)~F, dat1)$statistic))
[1] 0.1018898   # that's nice !

Thus, why a naive end user as I am should expect such pitfalls with F values 
given by the lm function ? Furthermore, codes similar to the one I criticized 
can be found in the teaching documents of various Universities and thus are 
spreading out. I would not be surprised that some scientific papers already 
rely on it ...

In fact, even in R web pages, under Books ("This page gives a partially annotated list of books that are 
related to S or R and may be useful to the R user community"), I found only one book clearly devoted to 
randomization methods : "[32] Laura Chihara and Tim Hesterberg. Mathematical Statistics with Resampling 
and R. Wiley, 1st edition, 2011. ISBN 978-1-1180-2985-5". Looking at the author's profiles, I would say 
that "Beware of the trap of listening to people with no knowledge of basic numerical methods!" does 
not apply to them. Here is their recommended R code for one-way anova (chapter 12, but adapted to my example 
data):

You should tell the authors that their code doesn't work. Many bookscontain errors, but they only get corrected when authors are informedabout them.


> observed <- anova(lm(Y ~ F, dat1))$F[1]
> n <- length(dat1$Y)
> B <- 10^5 - 1
> results <- numeric(B)
> for (i in 1:B)
+ {
+       index <- sample(n)
+       Y.perm <- dat1$Y[index]
+       results[i] <- anova(lm(Y.perm ~ dat1$F))$F[1]
+ }
> (sum(results> observed) + 1) / (B + 1)  # P value
[1] 0.03969

Well, it seems that I am not the only guy who does not find the trap obvious ...

Thus, my final question remains : How can we evaluate the reliability of CRAN 
packages that propose randomization (or bootstrap) methods ?

The same way as you evaluate the reliability of any software: youexamine the code for common errors, you apply tests to problems that youknow to be difficult, etc. At least all CRAN packages provide you withsource code.


Duncan Murdoch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Rép : ANOVA and permutation tests : beware of traps

Reply via email to