Heya,
I've lately been thinking about whether it would be possible to combine the
strengths of different bots, or at least of different parameter sets/bias
systems for one bot, in some way. They may shine in different
situations/phases of the game, but how do you figure out which one is
currently the better one?

What I came up with is the following:
For simplicity, assume for now that our different bots use the same
playouts but different approaches during the tree phase. So maybe they use
different ways to bias nodes, different selection formulas, etc. I'll
focus on them using different bias systems.
Now you split your playouts into buckets:
25%: Bot1 selects the white moves, Bot2 the black ones
25%: the other way around
x% (with 0 < x < 50): Bot1 selects for both sides
(50-x)%: Bot2 selects for both sides
You track the win rates of those first two quarters separately and from
them calculate the winrate of Bot1 vs Bot2 (Bot1 "wins" a mixed playout
when the side it controlled wins).
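
To make the bookkeeping concrete, here is a rough Python sketch (all names
are placeholders, and run_playout is just a stub for the real tree descent
plus playout):

    import random

    WHITE, BLACK = 0, 1
    NUM_PLAYOUTS = 10000

    def bot1_select(node, colour):
        pass  # placeholder: Bot1's bias/selection

    def bot2_select(node, colour):
        pass  # placeholder: Bot2's bias/selection

    def run_playout(select_white, select_black):
        # Stub: descend the tree using select_white for White's moves
        # and select_black for Black's, run the playout, return winner.
        return random.choice((WHITE, BLACK))

    x = 0.25             # Bot1's share of the "pure" playouts, adapted later
    mixed_games = 0      # playouts where the two bots face each other
    bot1_mixed_wins = 0  # mixed playouts won by the side Bot1 controlled

    for _ in range(NUM_PLAYOUTS):
        r = random.random()
        if r < 0.25:                # Bot1 plays White, Bot2 plays Black
            winner = run_playout(bot1_select, bot2_select)
            mixed_games += 1
            bot1_mixed_wins += (winner == WHITE)
        elif r < 0.50:              # the other way around
            winner = run_playout(bot2_select, bot1_select)
            mixed_games += 1
            bot1_mixed_wins += (winner == BLACK)
        elif r < 0.50 + x:          # Bot1 selects for both sides
            winner = run_playout(bot1_select, bot1_select)
        else:                       # Bot2 selects for both sides
            winner = run_playout(bot2_select, bot2_select)

    bot1_winrate = bot1_mixed_wins / max(1, mixed_games)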
Now if the bots are identical, obviously both should win 50%. But if the
bots are different, you may see different results. E.g. when Bot1 wins 55%
of its games, its move selection is probably better than Bot2's. You have
to be careful about wrong conclusions here, though: if you set up a
depth-first bot against a width-first one, you would certainly also get
win rates heavily in favor of the depth-first bot. But where this could
shine is with different bias systems, because there it actually tells you
which bias system is doing better in the current board situation.
Now you can use that knowledge to calculate x. E.g. if either bot wins 60%
or more, it gains all 50% of the remaining playouts, and the balance
slides linearly while the head-to-head win rate is between 40% and 60%.
(Use whatever formula here; it's open for testing.)
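
In code, that linear version would be something like this (nothing tuned,
just the rule as described):

    def bot1_share(winrate, lo=0.40, hi=0.60, budget=0.50):
        # Bot1's slice x of the remaining playouts: the whole budget
        # above hi, none of it below lo, linear in between.
        t = (winrate - lo) / (hi - lo)
        return budget * max(0.0, min(1.0, t))

    # e.g. bot1_share(0.55) == 0.375, bot1_share(0.62) == 0.50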

This should enable you to figure out on the fly, while doing playouts,
which bias system is doing the better job in the current situation. You
are just tracking some additional stats.

Of course, there are pros and cons to this method:
+ In general, switching the selection method in the tree should not cost
any time, and tracking the additional stats also costs close to nothing.
The only extra time comes from using a second bias function or similar
(because now you have to calculate the bias twice for most nodes).
- At the same time, the amount of data actually increases, because you
have to track the stats for the different bots; this may cause memory
issues! (But with a distributed-memory solution it creates no additional
data, if each memory/thread unit is assigned to one of those playout
buckets.) A sketch of the per-node bookkeeping follows this list.
+- In general, the cost increase depends on how different the two bots
are. If they are the same, there is basically no cost.
+ Allows you to figure out which bot/bias/selection is doing better right
now.
- But it may lead to false conclusions, as in the depth-vs-width example
above.
+- As long as both bots are of similar strength, you should not lose
anything by using this kind of system. The worst case is that you play the
wrong bot's move after one of the false conclusions mentioned above, and
when the bots are close in strength, that is no worse than using just one
bot all the time. Of course, if one bot dominates the other in all
situations, you lose quality by figuring that out again and again (because
in 50% of the playouts, half of the moves were selected by the worse bot).
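
For the memory point, what I have in mind is roughly the following
per-node layout (just a sketch; whether every node really needs all four
buckets depends on the implementation):

    from dataclasses import dataclass, field

    @dataclass
    class NodeStats:
        # one (visits, wins) counter per playout bucket, so each bot's
        # selection formula can read just the playouts its bias steered
        visits: list = field(default_factory=lambda: [0, 0, 0, 0])
        wins: list = field(default_factory=lambda: [0, 0, 0, 0])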



So here are some quick ideas how it could be modified further:
- all percentages are obviously placeholders and could be adjusted (even
dynamically)
- assuming you have a low-cost and a very high-cost bias function, you can
actually check if it is worth using the high-cost bias function, or if the
board situation is simple enough to churn out more playouts using the
cheaper function (see the sketch after this list)
- identifying certain game situations: you may know that one bias does
much better in corner fights or liberty races, so if that one is
dominating, you are probably in such a situation and can adjust further
(e.g. modify playouts, add additional routines, whatever)
- using more than 2 routines, possibly in conjunction with the idea above
to identify situations.
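
For the cheap-vs-expensive bias idea, a crude check could look like this
(the cost model is completely made up; the point is only that the
expensive bias has to win by more than its slowdown costs in playouts):

    def expensive_bias_worth_it(winrate_expensive, cost_ratio):
        # winrate_expensive: head-to-head winrate of the expensive bias
        # cost_ratio: playout time with expensive bias / with cheap bias
        # Demand a winrate edge that grows with the extra cost,
        # e.g. 55% at 2x cost, 60% at 3x, capped at 75%.
        required = min(0.75, 0.50 + 0.05 * (cost_ratio - 1))
        return winrate_expensive >= required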

Of course, one could now also think about how to expand this idea to using
different playouts, but I'm not sure how to judge which playouts are the
better ones when they differ. Just because Bot1's playouts tell me I'm
winning 60%, it does not mean that its playouts are better than Bot2's,
which only give me 40%. (Even though I would wish so :D) So right now I
need the same playouts to judge my selection routines. Maybe someone else
has an idea how to judge playouts?


What do you guys think? Is there anything with potential in these ideas?
Or are the cost and the danger of wrong conclusions too high for the
possible gains? Sadly I can't test it with my own bot, as my versions
strictly dominate each other, and at its current strength the results
would probably not be conclusive anyway.


Marc