Alpha-beta rollouts are like MCTS without playouts (as in AlphaZero), but
with the ability to do alpha-beta pruning as well.

With standard MCTS, the tree converges to a minimax tree, not an alpha-beta
tree, so as you know there is a huge difference in effective branching
factor between the two.

For MCTS to become competitive with alpha-beta, one has to either use
alpha-beta rollouts OR, as in AlphaZero's case, accurately pick, say, the
best 3 moves with the PUCT bias and then search those with an MCTS that
still converges to minimax (not alpha-beta).
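
To be concrete, the PUCT-style selection looks roughly like this. This is a
minimal sketch; the constant and the exact form of the exploration term are
my own simplification, not AlphaZero's published formula. The point is that
a sharp prior concentrates visits on a handful of moves, which is what cuts
the effective branching factor:

import math
from dataclasses import dataclass

@dataclass
class Child:
    prior: float            # policy-network probability for this move
    visits: int = 0
    value_sum: float = 0.0  # sum of backed-up values

def puct_select(children, c_puct=1.5):
    # Q is the running average value; U is the prior-weighted exploration
    # bonus, which decays as a child accumulates visits.
    total = sum(ch.visits for ch in children)
    def score(ch):
        q = ch.value_sum / ch.visits if ch.visits else 0.0
        u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)
        return q + u
    return max(children, key=score)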

I went for the former: recasting alpha-beta in rollout form, which can be
combined with standard MCTS as well as with a standard recursive alpha-beta
implementation.

The paper is here
https://www.microsoft.com/en-us/research/wp-content/uploads/2014/11/huang_rollout.pdf
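
For a feel of the idea, here is a toy sketch of game-tree search by
rollouts. It is my own much-simplified version, not the paper's algorithm
(the paper shows how to match alpha-beta's pruning exactly); this one only
illustrates the bound-tightening mechanism. Each iteration is one
root-to-leaf descent, and a sibling whose proven interval can no longer
affect its parent is simply never selected again:

import math

INF = math.inf

class Node:
    """Internal nodes have children; a leaf's evaluation stays hidden
    until a rollout actually reaches it."""
    def __init__(self, children=None, value=None):
        self.children = children or []
        self.value = value            # leaf evaluation
        self.lo, self.hi = -INF, INF  # proven bounds on the minimax value

def rollout(node, maximizing):
    """One root-to-leaf descent; bounds tighten on the way back up."""
    if not node.children:
        node.lo = node.hi = node.value      # "evaluate" the leaf
        return
    if maximizing:                          # most optimistic child first
        child = max(node.children, key=lambda c: c.hi)
    else:
        child = min(node.children, key=lambda c: c.lo)
    rollout(child, not maximizing)
    agg = max if maximizing else min
    node.lo = agg(c.lo for c in node.children)
    node.hi = agg(c.hi for c in node.children)

def solve(root):
    while root.lo < root.hi:                # stop once the value is proven
        rollout(root, True)
    return root.lo

# max(5, min(3, 9)) == 5; the 9-leaf is never evaluated, i.e. it is pruned.
tree = Node([Node(value=5), Node([Node(value=3), Node(value=9)])])
print(solve(tree))  # -> 5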

If it were up to me, I would have used standard alpha-beta search with
policy/value networks to improve move ordering and evaluation,
respectively.
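
Something like this hypothetical sketch, where Pos, policy_net, and
value_net are placeholders of my own, not any real engine's API:

import random

class Pos:
    """Stand-in for a real position class."""
    def legal_moves(self): return ["a", "b", "c"]
    def play(self, move):  return Pos()
    def is_terminal(self): return False

def value_net(pos):          # placeholder: a value net replaces static eval
    return random.uniform(-1.0, 1.0)

def policy_net(pos, move):   # placeholder: a policy net scores the moves
    return random.random()

def alphabeta(pos, depth, alpha, beta):
    """Plain negamax alpha-beta with network-based ordering and eval."""
    if depth == 0 or pos.is_terminal():
        return value_net(pos)               # side-to-move perspective
    moves = sorted(pos.legal_moves(),
                   key=lambda m: policy_net(pos, m), reverse=True)
    for m in moves:
        score = -alphabeta(pos.play(m), depth - 1, -beta, -alpha)
        if score >= beta:
            return beta     # cutoff: good ordering makes these come early
        alpha = max(alpha, score)
    return alpha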

It seems AlphaZero's MCTS approach hinged on the policy network accurately
picking the best n moves (while still converging to minimax) AND then, of
course, not making blunders. I think "shallow traps" are bad-looking moves
that turn out to be tactically brilliant later. How is the policy network
able to identify those moves? A full-width search like alpha-beta does not
miss them.

Daniel



On Tue, Mar 6, 2018 at 1:23 PM, Álvaro Begué <alvaro.be...@gmail.com> wrote:

> Sorry, I haven't been paying enough attention lately to know what
> "alpha-beta rollouts" means precisely. Can you either describe them or give
> me a reference?
>
> Thanks,
> Álvaro.
>
>
>
> On Tue, Mar 6, 2018 at 1:49 PM, Dan <dsha...@gmail.com> wrote:
>
>> I did a quick test with my MCTS chess engine with two different
>> implementations: standard MCTS with averaging, and MCTS with alpha-beta
>> rollouts. The result is roughly a 530 Elo difference:
>>
>> Finished game 44 (scorpio-pmcts vs scorpio-mcts): 1/2-1/2 {Draw by 3-fold
>> repetition}
>> Score of scorpio-mcts vs scorpio-pmcts: 41 - 1 - 2  [0.955] 44
>> Elo difference: 528.89 +/- nan
>>
>> scorpio-mcts uses alpha-beta rollouts.
>> scorpio-pmcts is "pure" MCTS with averaging and the UCB formula.
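>>
>> By "the UCB formula" I mean plain UCB1. A minimal sketch, with my own
>> exploration constant, for children that carry .visits and .value_sum:
>>
>> import math
>>
>> def ucb1_select(children, c=1.4):
>>     """Plain UCB1: average value plus an exploration bonus;
>>     unvisited children are tried first."""
>>     unvisited = [ch for ch in children if ch.visits == 0]
>>     if unvisited:
>>         return unvisited[0]
>>     n = sum(ch.visits for ch in children)
>>     return max(children, key=lambda ch:
>>                ch.value_sum / ch.visits
>>                + c * math.sqrt(math.log(n) / ch.visits))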
>>
>> Daniel
>>
>> On Tue, Mar 6, 2018 at 11:46 AM, Dan <dsha...@gmail.com> wrote:
>>
>>> I am pretty sure it is an MCTS problem, and I suspect it is not something
>>> that could be easily solved with a policy network (I could be wrong here).
>>> My opinion is that a DCNN is not a miracle worker (as somebody already
>>> mentioned here) and it is going to fail at resolving tactics. I would be
>>> more than happy if it had the same power as a qsearch, to be honest.
>>>
>>> Search traps are the major problem with games like chess, and they are
>>> what makes transferring the success of DCNNs from Go to chess non-trivial.
>>> The following paper discusses the shallow traps that are prevalent in
>>> chess:
>>> https://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/download/1458/1571
>>> They mention that traps make MCTS very inefficient. Even if the MCTS is
>>> given 50x more time than an exhaustive minimax search needs, it can fail
>>> to find a level-5 or level-7 trap. It will spend, for instance, 95% of its
>>> time searching an asymmetric tree of depth > 7 while a shallow trap of
>>> depth 7 exists, and thus miss the level-7 trap.
>>> This is very hard to solve even with unlimited computing power.
>>>
>>> The plain MCTS as used by AlphaZero is, in my opinion, the most ill-suited
>>> MCTS version for a game like chess, and I have a hard time seeing how it
>>> can be competitive with Stockfish tactically.
>>>
>>> My MCTS chess engine with AlphaZero-like MCTS (with averaging) was missing
>>> a lot of tactics. I don't use policy or eval networks; I use qsearch() for
>>> the eval, and the policy is basically to choose whichever move leads to a
>>> higher eval.
>>>
>>> a) My first improvement to the MCTS was to use minimax backups instead of
>>> averaging. This was an improvement, but not something that would solve the
>>> traps (a toy sketch contrasting the two backup rules follows after (b)).
>>>
>>> b) My second improvement was to use alpha-beta rollouts. This is a rollout
>>> version that can do null-move, LMR, etc. It was a huge improvement, and
>>> none of the other MCTS versions can match it. More on alpha-beta rollouts
>>> here:
>>> https://www.microsoft.com/en-us/research/wp-content/uploads/2014/11/huang_rollout.pdf
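>>>
>>> To make the difference in (a) concrete, here is a toy sketch of the two
>>> backup rules. This is my own simplification: it assumes negamax-style
>>> nodes whose children all carry a value already.
>>>
>>> from dataclasses import dataclass, field
>>>
>>> @dataclass
>>> class Node:
>>>     children: list = field(default_factory=list)
>>>     visits: int = 0
>>>     value_sum: float = 0.0  # running sum for the averaging backup
>>>     value: float = 0.0      # negamax value, side-to-move perspective
>>>
>>> def backup_average(path, leaf_value):
>>>     """Standard MCTS backup: every node on the root-to-leaf path
>>>     absorbs the leaf value into a running average."""
>>>     v = leaf_value
>>>     for node in reversed(path):
>>>         node.visits += 1
>>>         node.value_sum += v
>>>         v = -v  # flip perspective each ply
>>>
>>> def backup_minimax(path, leaf_value):
>>>     """Minimax backup: a parent takes the best negamax child value,
>>>     so a single refutation is enough to kill a move."""
>>>     path[-1].value = leaf_value
>>>     for node in reversed(path[:-1]):
>>>         node.value = max(-c.value for c in node.children)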
>>>
>>> So AlphaZero used neither of the above improvements and yet it seems to be
>>> tactically strong. Leela Zero suffered tactical failures left and right
>>> too, as I expected.
>>>
>>> So the only explanation left is the policy network being able to avoid
>>> traps, and I find it hard to believe it can identify tactics beyond
>>> qsearch level.
>>>
>>> All I am saying is that my experience (as well as that of many others)
>>> with MCTS in tactics-dominated games is bad, and there must be some
>>> breakthrough in that regard in AlphaZero for it to be able to compete with
>>> Stockfish on a tactical level.
>>>
>>> I am curious how Remi's attempt at shogi using AlphaZero's method will
>>> turn out.
>>>
>>> regards,
>>> Daniel
>>>
>>> On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go <
>>> computer-go@computer-go.org> wrote:
>>>
>>>> Training on Stockfish games is guaranteed to produce a blunder-fest,
>>>> because there are no blunders in the training set and therefore the policy
>>>> network never learns how to refute blunders.
>>>>
>>>>
>>>>
>>>> This is not a flaw in MCTS, but rather in the policy network. MCTS will
>>>> eventually search every move infinitely often, producing asymptotically
>>>> optimal play. But if the policy network does not provide the guidance
>>>> necessary to rapidly refute the blunders that occur in the search, then
>>>> convergence of MCTS to optimal play will be very slow.
>>>>
>>>>
>>>>
>>>> It is necessary for the network to train on self-play games using MCTS.
>>>> For instance, the AGZ approach samples next states during training games by
>>>> sampling from the distribution of visits in the search. Specifically: not
>>>> by choosing the most-visited play!
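>>>>
>>>> A minimal sketch of that sampling step (tau is the usual temperature
>>>> parameter; the code is my own illustration, not AGZ's):
>>>>
>>>> import random
>>>>
>>>> def sample_training_move(children, tau=1.0):
>>>>     """Pick a training-game move in proportion to visits ** (1/tau),
>>>>     rather than taking the argmax of visits."""
>>>>     weights = [ch.visits ** (1.0 / tau) for ch in children]
>>>>     return random.choices(children, weights=weights, k=1)[0]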
>>>>
>>>>
>>>>
>>>> You see how this policy trains both search and evaluation to be
>>>> internally consistent? The policy head is trained to refute the bad moves
>>>> that will come up in search, and the value head is trained to the value
>>>> observed by the full tree.
>>>>
>>>>
>>>>
>>>> *From:* Computer-go [mailto:computer-go-boun...@computer-go.org] *On
>>>> Behalf Of *Dan
>>>> *Sent:* Monday, March 5, 2018 4:55 AM
>>>> *To:* computer-go@computer-go.org
>>>> *Subject:* Re: [Computer-go] 9x9 is last frontier?
>>>>
>>>>
>>>>
>>>> Actually, prior to this it was trained with hundreds of thousands of
>>>> Stockfish games and didn't do well on tactics (the games were actually a
>>>> blunder-fest). I believe this is a problem with the MCTS used and not due
>>>> to lack of training.
>>>>
>>>>
>>>>
>>>> Go is a strategic game, so it is different from chess, which is full of
>>>> traps.
>>>>
>>>> I am not surprised Leela Zero did well in Go.
>>>>
>>>>
>>>>
>>>> On Mon, Mar 5, 2018 at 2:16 AM Gian-Carlo Pascutto <g...@sjeng.org>
>>>> wrote:
>>>>
>>>> On 02-03-18 17:07, Dan wrote:
>>>> > Leela-chess is not performing well enough
>>>>
>>>> I don't understand how one can say that, given that they started from a
>>>> random network only last week, with just a few clients. Of course it's
>>>> bad! That doesn't say anything about the approach.
>>>>
>>>> Leela Zero has gotten strong but it has been learning for *months* with
>>>> ~400 people. It also took a while to get to 30 kyu.
>>>>
>>>> --
>>>> GCP