On Fri, Jun 24, 2011 at 8:27 AM, Don Dailey <[email protected]> wrote:
>
> On Fri, Jun 24, 2011 at 1:40 AM, Darren Cook <[email protected]> wrote:
>>
>> On 2011-06-21 02:53, Álvaro Begué wrote:
>> > I think when you have explored a node enough times, there is no point
>> > in considering the score of the node to be the average of all the
>> > tries, but it should really be just the score of the best child (i.e.,
>> > the minimax rule). UCT does converge to that value, but it does so by
>> > reducing exploration of inferior moves, which results in the
>> > long-lines behavior I just described.
>> >
>> > Perhaps there is a role for alpha-beta near the root of the tree when
>> > we have enough CPU, and that might scale better.
>
> Referring to what Álvaro Begué wrote,  I think the idea is sound,  MCTS
> really
> is mini-max and considering the score of the other nodes is just a noise
> reduction technique which is not strictly necessary.    When there are few
> samples it's surely a benefit but many not when there are many.   Perhaps
> the influence of sibling scores should be gradually removed?    I guess in
> practice that is what happens when one move is so popular others rarely
> get played.

You see the problem with that last sentence, don't you? I don't know
much about go, so perhaps I should use a queen sacrifice in chess as
an example... No, I'll try with go anyway. Imagine a situation where
my opponent just attacked a large group of mine. It turns out the move
doesn't need a response (gote), but it is not at all obvious that it
doesn't need a response. A good tenuki will not be explored much by
MCTS, because initially it looks like its score is much worse than
that of a more defensive response which clearly saves the group, and
therefore it would take MCTS an impractical amount of time to change
its mind. At the root we could just decide to be more exploratory and
give apparently weak moves more of a chance. But if we do this at
nodes that are not the root, all the exploration will pollute the
score that we return.

In minimax, we give every move a similar chance*, but then we don't
spend much time searching obviously bad moves because alpha-beta does
a great job of proving they are inferior to the best move we've found.
If a move returns a very bad score (queen sacrifice), we may still
spend a lot of time on it, if proving that it's a bad move is hard.
And the important feature missing in MCTS is that the score we return
doesn't suffer because we spent time on that move that turned out to
be bad.

I believe this is the main reason programs are not good at reading
semeais. Any situation in which a move's score might jump enormously
when we discover the correct follow-up moves will probably give MCTS
trouble. Of course fixing this will do nothing to help in situations
where the playouts misread the status of some group. That's more like
having a bad evaluation function with mistakes that persist for many
moves.

(*) - To be completely correct, bad moves should get penalized in
depth something proportional to -log(probability of being the right
move), and some techniques like LMR (late move reductions) approximate
that, but as the search depth increases they will get explored deeper
and deeper anyway, and eventually (in a practical amount of time)
minimax will see their merit.

>
>>
>> One of my favourite subjects. I noticed, about 3-4 years ago now, from
>> my sm9 (human-computer team 9x9 go) experiments that an MCTS program
>> would usually have chosen a strong move after say 50,000 playouts, but
>> when it was wrong it typically would still be wrong after a million
>> playouts. (Very subjective, sorry Don ;-)
>
> I have no problem with empirical and subject observations when it's used to
> for
> hypothesis building.   But once you view something as a fact then you need
> to
> be correct  because now new ideas are built upon it and then you could have
> a
> mess!     For example  once you accept as fact that the earth is the center
> of
> the universe you cannot make further progress in understanding the universe.
>
>
>>
>> Hence the proposal to use alpha-beta as the top-level search, using MCTS
>> with about 50K playouts at the nodes. I've done a few experiments in
>> this direction, and I still think it is very promising. Technically the
>> current state of sm9 automation is minimax on top of 4 MCTS and one
>> traditional go program. (But very few nodes in the minimax tree as I
>> give each program a few minutes of CPU time for every move.)
>>
>> Darren
>>
>>
>> --
>> Darren Cook, Software Researcher/Developer
>>
>> http://dcook.org/work/ (About me and my work)
>> http://dcook.org/blogs.html (My blogs and articles)
>> _______________________________________________
>> Computer-go mailing list
>> [email protected]
>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Reply via email to