[R] methodology question : is anova appropriate for these data?

Hamilton-Green, Matthew J Wed, 06 Oct 2010 07:48:29 -0700

Representative small sample of data:

algorithmID <- factor(c(rep('alg1',4),rep('alg2',4),rep('alg3',4)))
threshold <- factor(rep(c(.45,.50,.55,.60),times=3))
score <- c(30,32,31,30,10,12,13,14,22,21,20,24)
d <- data.frame(algorithmID,threshold,score)


algorithmID is the name of each algorithm; threshold is the value of a 
parameter the algorithm uses when producing the score; score is a number 
that can take any integer value between 0 and 40.

I'd like to know whether different algorithms reliably produce different 
scores. Each score comes from one run of an algorithm at the specified value 
of 'threshold'. The value of threshold is fixed for a given run of each 
algorithm, so I think (though I'm not sure) that it should be treated as a 
fixed factor rather than a random factor.
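
To make the layout concrete, here is a small sketch that cross-tabulates the 
sample above. The names 'design' and 'cell_counts' are illustrative only. It 
shows that every algorithm is run exactly once at every threshold, so there is 
one score per cell and an algorithm-by-threshold interaction cannot be 
separated from residual error with these data.

```r
# Same small sample as in the post.
d <- data.frame(
  algorithmID = factor(rep(c("alg1", "alg2", "alg3"), each = 4)),
  threshold   = factor(rep(c(.45, .50, .55, .60), times = 3)),
  score       = c(30, 32, 31, 30, 10, 12, 13, 14, 22, 21, 20, 24)
)

# Scores laid out as algorithm (rows) by threshold (columns).
design <- xtabs(score ~ algorithmID + threshold, data = d)
print(design)

# Number of runs in each algorithm-by-threshold cell.
cell_counts <- table(d$algorithmID, d$threshold)
print(cell_counts)
```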

I am tempted to try:

d.aov <- aov(score ~ algorithmID + Error(threshold/algorithmID), data = d)

but I am doubtful whether it is appropriate to treat 'threshold' in this way.
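
For comparison, a minimal sketch of two simpler ways to account for threshold: 
as an error stratum (a blocking factor) and as an ordinary fixed factor in an 
additive model. The names 'fit_block', 'fit_fixed', 'f_block' and 'f_fixed' 
are illustrative. In a balanced layout like this one, both fits give the same 
F test for algorithmID, so the fixed-versus-random choice for threshold does 
not change that particular test. With one score per cell, the nested form in 
the attempt above should reduce to the same comparison, though that is worth 
checking rather than assuming.

```r
# Same small sample as in the post.
d <- data.frame(
  algorithmID = factor(rep(c("alg1", "alg2", "alg3"), each = 4)),
  threshold   = factor(rep(c(.45, .50, .55, .60), times = 3)),
  score       = c(30, 32, 31, 30, 10, 12, 13, 14, 22, 21, 20, 24)
)

# Threshold as a blocking factor: algorithmID is tested within thresholds.
fit_block <- aov(score ~ algorithmID + Error(threshold), data = d)

# Threshold as an ordinary fixed factor in an additive model.
fit_fixed <- aov(score ~ threshold + algorithmID, data = d)

# Pull out the F statistic for algorithmID from each fit.
# Rows are indexed by position because summary() pads the row names.
f_block <- summary(fit_block)[["Error: Within"]][[1]][1, "F value"]
f_fixed <- summary(fit_fixed)[[1]][2, "F value"]

print(c(block = f_block, fixed = f_fixed))
```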

I have two queries:

1. How should I determine whether ANOVA is an appropriate test of the null 
hypothesis that mean score does not differ by algorithmID?

2. If values for threshold were randomly sampled from the range 0.01 to 0.99, 
rather than being fixed, which is an option, would that make any difference to 
whether ANOVA would be suitable?
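
As a starting point for both queries, here is a hedged sketch rather than a 
recommendation. For query 1 it looks at the residuals of the additive model 
and runs a Friedman test, a rank-based alternative that blocks on threshold 
and does not assume normal errors. For query 2 it treats threshold as a 
numeric covariate, which is one possible approach if thresholds were sampled 
from a continuous range; it assumes a linear effect of threshold on score. 
The names 'fit', 'res', 'sw', 'ft', 'threshold_num' and 'fit_cov' are 
illustrative.

```r
# Same small sample as in the post.
d <- data.frame(
  algorithmID = factor(rep(c("alg1", "alg2", "alg3"), each = 4)),
  threshold   = factor(rep(c(.45, .50, .55, .60), times = 3)),
  score       = c(30, 32, 31, 30, 10, 12, 13, 14, 22, 21, 20, 24)
)

# Query 1: fit the additive model and inspect its residuals.
fit <- aov(score ~ threshold + algorithmID, data = d)
res <- residuals(fit)
sw  <- shapiro.test(res)   # rough normality check; little power at n = 12
print(sw)
# plot(fit, which = 1:2)   # residuals vs fitted and normal Q-Q, if interactive

# Rank-based alternative: algorithms compared within each threshold (block).
ft <- friedman.test(score ~ algorithmID | threshold, data = d)
print(ft)

# Query 2: threshold as a numeric covariate instead of a factor.
# Convert via character so the numeric values, not the level codes, are used.
d$threshold_num <- as.numeric(as.character(d$threshold))
fit_cov <- lm(score ~ threshold_num + algorithmID, data = d)
print(anova(fit_cov))      # algorithmID is tested after adjusting for threshold
```

With only 12 scores, none of these checks can be very informative on its own; 
they are meant to be run on the full data set.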

Any advice gratefully received,
Matt
Research Assistant, University of Aberdeen
