I have not matched up Valkyria and Fuego exactly by time. But with
fixed numbers of simulations one can match them up closely in
processor time, and when I do that Valkyria wins something like 41-45%
of the time.
I never intentionally designed Valkyria to work well with the small
numbers of simulations used in these tests, but in principle you have
to do that no matter how many simulations you run per move, because
you will always have few simulations in the leaves of the tree. And if
the leaves are evaluated strongly, then the nodes nearer to the root
also benefit.
Magnus
Quoting Christian Nentwich <[email protected]>:
Magnus,
along the lines of the argument I am trying to make: did you try your
experiments with time limits from 30 seconds per game to five minutes
per game (say), rather than playouts? Using gogui-twogtp this is
relatively easy to achieve.
I am obviously not associated with Fuego, but I guess it is reasonable
to assume that Fuego's architecture was not designed to operate at
limits like 2, 8 or 32 simulations in the same way Valkyria was. It is
an interesting study in its own right for scalability purposes; but to
go on to infer strength from it would seem like comparing apples and
oranges.
Christian
Magnus Persson wrote:
Quoting Darren Cook <[email protected]>:
* The scaling behavior might be different. E.g. if Fuego and Valkyria
are both run with 10 times more playouts the win rate might change. Just
to dismiss an algorithm that loses at time limits that happen to suit
rapid testing on today's hardware could mean we miss out on the ideal
algorithm for tomorrow's hardware. (*)
I just happened to have experimental data on exactly this topic.
This is Valkyria vs Fuego where I scale the number of playouts
(Sims) x4 in each row.
Sims   Winrate  Err  N    WR     EloDiff
2      99.2     0.4  500  0.992  837
8      98.2     0.6  500  0.982  696
32     94.2     1.0  500  0.942  484
128    88.8     1.4  500  0.888  360
512    82.0     1.7  500  0.820  263
2048   83.2     1.7  499  0.832  278
8192   81.3     1.7  497  0.813  255
32768  75.5     3.6  139  0.755  196
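(For reference, the EloDiff column above is consistent with the
standard logistic Elo model, and Err with the binomial standard error
of the win rate. A minimal sketch; the function names are mine, not
from any program mentioned here:)

```python
import math

def elo_diff(win_rate):
    """Elo difference implied by a win rate, standard logistic model."""
    return 400 * math.log10(win_rate / (1 - win_rate))

def stderr_pct(win_rate, n_games):
    """Binomial standard error of the win rate, in percent."""
    return 100 * math.sqrt(win_rate * (1 - win_rate) / n_games)

# First row of the table: 2 sims, 500 games, win rate 0.992
print(round(elo_diff(0.992)))            # -> 837
print(round(stderr_pct(0.992, 500), 1))  # -> 0.4
```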
The data shows clearly that the 0.3.2 version of Fuego I use probably
plays really bad moves with high frequency in the playouts. With more
playouts, I guess, a lot of these blunders can be avoided, and the win
rate goes down from 99% towards 80%. The question here is whether it
goes asymptotically towards 80%, or perhaps towards 50%, with more
simulations. Unfortunately I cannot extend this plot, because I run
out of memory and it takes ages to finish the games.
So the question is then: is there a fixed gain from the heavy playouts
beyond 512 simulations, or does the effect of the heavy playouts
become less and less important with larger tree sizes?
Note also that this is not only a matter of having heavy playouts or
not. There is also a difference in tree search, since Valkyria and
Fuego probably search their trees differently, and it could be that
Valkyria searches deep trees inefficiently.
Maybe I should run a similar test against a light version of
Valkyria to control for the search algorithm.
-Magnus
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
--
Christian Nentwich
Director, Model Two Zero Ltd.
+44-(0)7747-061302
http://www.modeltwozero.com
--
Magnus Persson
Berlin, Germany