On Tue, 5 Oct 2010, "Ingo Althöfer" wrote:

> Hello Don,
>
>> There were a couple of experiments that were far from scientific,
>> which involved manually changing parameters and pathetically small
>> samples.
>
> Doubly wrong. I am one of the persons you refer to.
> I ran "my" games manually, but according to a fixed rule ("rule 42"),
> without changing parameters.
>
> You are a strong programmer, but in statistics you seem to have
> deficits: statistics can also draw conclusions from small samples
> when the results are clear.
>
> In my experiments I had three programs:
> A (strong, with dynamic komi)
> B (strong, without dynamic komi)
> C (weak).
> The scores were
> A vs C  3-1
> B vs C  0-4
> With respect to the 5 % level this refutes the hypothesis that
> A is not stronger than B.
>
> To prove this you have to find an upper bound for
> (1-p)^3 * [(1-p) + 4p] * p^4,  where p ranges over [0,1] and is the
> probability that C wins a single game; the first two factors give
> the probability that A scores at least 3-1, the last factor the
> probability that B scores 0-4.
> This maximum is clearly below 0.05.
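Ingo's arithmetic itself is easy to check numerically. A quick sketch (my own, not part of the original mail), with p taken as C's per-game win probability:

```python
# Ingo's quantity is the maximum over p in [0, 1] of
#   P(A scores at least 3-1) * P(B scores 0-4)
#     = (1-p)^3 * (1 + 3p) * p^4      (note (1-p) + 4p = 1 + 3p)
def f(p):
    return (1 - p) ** 3 * (1 + 3 * p) * p ** 4

best = max(f(i / 10000) for i in range(10001))
print(best)  # roughly 0.0232, comfortably below 0.05
```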

I do not completely agree with this calculation. What you have done
could be seen as multiplying p-values, which is forbidden. Specifically,
you have computed the probability, under a given p, of having both
A-C 3-1 or better for A, and
B-C 0-4 or worse for B.

The problem is that you are comparing A to B. So you have to take all
the cases where the apparent difference in strength is at least as big,
i.e. where A finishes at least three wins ahead of B. Besides the
observed outcome, this notably includes the case:
A-C 4-0
B-C 1-3

In fact that is the only case that needs to be added, since A-C 4-0
with B-C 0-4 is already covered by "3-1 or better". We still get a
p-value below 0.04, so the result is still significant.
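The corrected region can be checked the same way. In the sketch below (mine, not from the thread), q is the common per-game win probability of A and B against C under the null hypothesis, and the p-value sums the three outcomes where A's score is at least three wins ahead of B's: (3,0), (4,0), and (4,1):

```python
from math import comb

def binom(k, n, q):
    """P(exactly k wins in n games, per-game win probability q)."""
    return comb(n, k) * q ** k * (1 - q) ** (n - k)

def p_value(q):
    # Outcomes out of 4 games each where A's score exceeds B's by >= 3:
    # (A=3, B=0), (A=4, B=0), (A=4, B=1).
    return (binom(3, 4, q) * binom(0, 4, q)
            + binom(4, 4, q) * binom(0, 4, q)
            + binom(4, 4, q) * binom(1, 4, q))

worst = max(p_value(i / 10000) for i in range(10001))
print(worst)  # 9/256, about 0.0352, still below 0.04
```

The maximum sits at q = 1/2 (the expression is symmetric in q and 1-q), which gives exactly 9/256.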

To convince you that you have made a mistake, imagine you had played a
very large number N of games in each match and obtained the results
A-C (N/2 + 1) - (N/2 - 1)
B-C N/2 - N/2

A finishes only one win ahead of B, so it should be obvious that the
p-value is almost 0.5. Your methodology multiplies two probabilities
that are each close to 0.5, and so yields almost 0.25.
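The counterexample can be made concrete with exact binomial arithmetic. A sketch with a hypothetical N = 100 and a per-game win probability of 1/2 against C (both numbers are my own choices, for illustration only):

```python
from math import comb

N = 100  # hypothetical number of games per match
# Distribution of one program's score against C under the null
# hypothesis, taking the per-game win probability to be 1/2.
prob = [comb(N, k) / 2 ** N for k in range(N + 1)]

# True p-value: P(A's score strictly exceeds B's score), i.e. a score
# difference of at least 1 win, as in the observed result.
true_p = sum(prob[a] * prob[b] for a in range(N + 1) for b in range(a))

# The multiply-the-tails method: P(A >= N/2 + 1) * P(B <= N/2).
tails = sum(prob[N // 2 + 1:]) * sum(prob[:N // 2 + 1])
print(true_p, tails)  # roughly 0.47 versus roughly 0.25
```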

Jonas
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
