On Tue, 5 Oct 2010, "Ingo Althöfer" wrote:
> Hello Don,
>
>> There were a couple of experiments that were far from scientific,
>> which involved manually changing parameters and pathetically small
>> samples.
>
> Doubly wrong. I am one of the persons you refer to.
> I ran "my" games manually, but according to a fixed rule ("rule 42"),
> without changing parameters.
>
> You are a strong programmer, but in statistics you seem to have
> deficits: statistics can also draw conclusions from small samples
> when the results are clear.
>
> In my experiments I had three programs:
> A (strong, with dynamic komi)
> B (strong, without dynamic komi)
> C (weak).
>
> The scores were
> A vs C  3-1
> B vs C  0-4
>
> With respect to the 5 % level this refutes the hypothesis that
> A is not stronger than B.
> To prove this you have to find an upper bound for
> (1-p)^3 * [(1-p) + 4p] * p^4, where p ranges over the interval [0,1].
> This maximum is clearly below 0.05.
I do not completely agree with this calculation.

What you have done amounts to multiplying p-values, which is not valid.
Specifically, you have computed the probability, under a common win
probability p for C, of having both:
A-C 3-1 or better for A, and
B-C 0-4 or worse for B.
The problem is that you are comparing A to B, so you have to take all
the cases where the apparent difference in strength is at least as big.
This notably includes the case:
A-C 4-0
B-C 1-3
In fact, that is the only extra case that needs to be added here.
Including it, we still get a p-value below 0.04, so the result is
still significant.
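Both maxima can be checked numerically. A minimal sketch, assuming the
null hypothesis that A and B beat C with the same per-game probability
(the function names and the grid search are mine, not from the thread):

```python
from math import comb

def binom_pmf(k, n, q):
    """P(exactly k wins in n games, each won with probability q)."""
    return comb(n, k) * q**k * (1 - q)**(n - k)

def ingo_product(q):
    # Ingo's product: P(A scores 3-1 or better) * P(B scores 0-4),
    # i.e. (1-p)^3 * [(1-p) + 4p] * p^4 with p = 1 - q.
    p_a = binom_pmf(3, 4, q) + binom_pmf(4, 4, q)
    p_b = binom_pmf(0, 4, q)
    return p_a * p_b

def corrected_tail(q):
    # All outcomes where A's score exceeds B's by at least 3 wins:
    # (3-1, 0-4), (4-0, 0-4), plus the extra case (4-0, 1-3).
    return ingo_product(q) + binom_pmf(4, 4, q) * binom_pmf(1, 4, q)

grid = [i / 10000 for i in range(10001)]
max_ingo = max(ingo_product(q) for q in grid)
max_corrected = max(corrected_tail(q) for q in grid)
print(max_ingo)       # about 0.023, below Ingo's 0.05 bound
print(max_corrected)  # about 0.035, still below 0.04
```

So the extra case roughly adds a third to the p-value, but the result
stays under the 5 % level either way.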
To convince you that you have made a mistake, imagine you had played a
very large number N of games per match and obtained the results
A-C  (N/2 + 1) - (N/2 - 1)
B-C  N/2 - N/2
It should be obvious that the p-value is almost .5. Your methodology
yields almost .25: each tail probability is close to .5, and
multiplying them gives about .25.
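The same discrepancy can be checked exactly for a finite N (N = 100 is
my choice for the sketch), under the null hypothesis that both programs
win each game against C with probability 1/2:

```python
from math import comb

N = 100  # games per match; any large even N shows the effect

def pmf(k):
    """P(k wins out of N fair games)."""
    return comb(N, k) / 2**N

# Ingo-style product: P(A scores >= N/2 + 1) * P(B scores <= N/2).
p_a = sum(pmf(k) for k in range(N // 2 + 1, N + 1))
p_b = sum(pmf(k) for k in range(0, N // 2 + 1))
product = p_a * p_b

# Correct p-value: P(A's score - B's score >= 1) for two independent
# fair records; by symmetry this is (1 - P(equal scores)) / 2.
p_equal = sum(pmf(k) ** 2 for k in range(N + 1))
correct = (1 - p_equal) / 2

print(product)  # close to 0.25
print(correct)  # close to 0.5
```

A one-win difference over a hundred games is no evidence at all, yet
the multiplied-p-values method reports it as if it were borderline
suggestive.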
Jonas
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go