Alpha-beta rollouts are like MCTS without playouts (as in AlphaZero), but with the ability to do alpha-beta pruning as well.
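For concreteness, here is a minimal sketch of the idea in the spirit of the Huang paper linked below (my paraphrase, not Scorpio's actual code): each "rollout" walks a single root-to-leaf path, descending only into children whose bound interval is still compatible with the current (alpha, beta) window, and then tightens bound intervals on the way back up. Irrelevant children are never visited, which is exactly the alpha-beta cutoff.

```python
import math

INF = math.inf

class Node:
    """Game-tree node. Leaves carry a static value; every node keeps a
    bound interval [lo, hi] on its minimax value, initially (-inf, +inf)."""
    def __init__(self, children=(), value=None):
        self.children = list(children)
        self.value = value            # leaf evaluation (None for internal nodes)
        self.lo, self.hi = -INF, INF

def rollout(node, alpha, beta, maximizing=True):
    """One alpha-beta rollout: walk one root-to-leaf path, choosing a child
    whose bounds can still affect the result, then back up the bounds."""
    alpha = max(alpha, node.lo)       # tighten the window with node's bounds
    beta = min(beta, node.hi)
    if not node.children:             # leaf: resolve it exactly
        node.lo = node.hi = node.value
        return
    if maximizing:
        # only children whose upper bound beats alpha can matter (pruning)
        feasible = [c for c in node.children if c.hi > alpha]
        rollout(max(feasible, key=lambda c: c.hi), alpha, beta, False)
        node.lo = max(c.lo for c in node.children)
        node.hi = max(c.hi for c in node.children)
    else:
        feasible = [c for c in node.children if c.lo < beta]
        rollout(min(feasible, key=lambda c: c.lo), alpha, beta, True)
        node.lo = min(c.lo for c in node.children)
        node.hi = min(c.hi for c in node.children)

def solve(root):
    """Repeat rollouts until the root's bounds meet at the minimax value."""
    while root.lo < root.hi:
        rollout(root, -INF, INF)
    return root.lo
```

Because each rollout is an independent pass from the root, this form composes naturally with MCTS-style tree traversal, which is the point of recasting alpha-beta this way.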
With standard MCTS, the tree converges to a minimax tree, not an alpha-beta tree, so as you know there is a huge branching-factor difference there. For MCTS to become competitive with alpha-beta, one has to either use alpha-beta rollouts OR, as in AlphaZero's case, pick accurately, say, the best 3 moves with its PUCT bias and then search those with an MCTS that converges to minimax (not alpha-beta). I went for the former, recasting alpha-beta in rollouts form, which can be combined with standard MCTS as well as with a standard recursive alpha-beta implementation. The paper is here:
https://www.microsoft.com/en-us/research/wp-content/uploads/2014/11/huang_rollout.pdf

If it was up to me, I would have used standard alpha-beta search with policy/value networks to improve move sorting and evaluation respectively. It seems AlphaZero's MCTS approach hinged on the policy network accurately picking the best n moves (while still using minimax) AND then of course not making blunders. I think "shallow traps" are bad-looking moves that turn out to be brilliant tactically later. How is the policy network able to identify those moves? A full-width search like alpha-beta does not miss those.

Daniel

On Tue, Mar 6, 2018 at 1:23 PM, Álvaro Begué <alvaro.be...@gmail.com> wrote:

> Sorry, I haven't been paying enough attention lately to know what
> "alpha-beta rollouts" means precisely. Can you either describe them or give
> me a reference?
>
> Thanks,
> Álvaro.
>
>
> On Tue, Mar 6, 2018 at 1:49 PM, Dan <dsha...@gmail.com> wrote:
>
>> I did a quick test with my MCTS chess engine with two different
>> implementations: a standard MCTS with averaging, and MCTS with
>> alpha-beta rollouts.
>> The result is something like a 600 Elo difference:
>>
>> Finished game 44 (scorpio-pmcts vs scorpio-mcts): 1/2-1/2 {Draw by 3-fold
>> repetition}
>> Score of scorpio-mcts vs scorpio-pmcts: 41 - 1 - 2 [0.955] 44
>> Elo difference: 528.89 +/- nan
>>
>> scorpio-mcts uses alpha-beta rollouts.
>> scorpio-pmcts is "pure" MCTS with averaging and the UCB formula.
>>
>> Daniel
>>
>> On Tue, Mar 6, 2018 at 11:46 AM, Dan <dsha...@gmail.com> wrote:
>>
>>> I am pretty sure it is an MCTS problem, and I suspect not something that
>>> could be easily solved with a policy network (could be wrong here). My
>>> opinion is that a DCNN is not a miracle worker (as somebody already
>>> mentioned here), and it is going to fail at resolving tactics. I would be
>>> more than happy with it if it had the same power as a qsearch, to be honest.
>>>
>>> Search traps are the major problem with games like chess, and they are what
>>> makes transferring the success of DCNNs from Go to chess non-trivial.
>>> The following paper discusses the shallow traps that are prevalent in chess:
>>> https://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/download/1458/1571
>>> They mention that traps make MCTS very inefficient. Even if MCTS is given
>>> 50x more time than is needed by an exhaustive minimax search, it can fail
>>> to find a level-5 or level-7 trap. It will spend, for instance, 95% of its
>>> time searching an asymmetric tree of depth > 7 while a shallow trap of
>>> depth 7 exists, thus failing to find the level-7 trap.
>>> This is very hard to solve even if you have unlimited computing power.
>>>
>>> The plain MCTS as used by AlphaZero is the most ill-suited MCTS version,
>>> in my opinion, and I have a hard time seeing how it can be competitive
>>> with Stockfish tactically.
>>>
>>> My MCTS chess engine with AlphaZero-like MCTS with averaging was missing
>>> a lot of tactics. I don't use policy or eval networks, but qsearch() for
>>> the eval, and the policy is basically choosing whichever move leads to a
>>> higher eval.
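[The "averaging and UCB formula" selection mentioned above is presumably the standard UCB1 rule; a minimal generic sketch, not Scorpio's actual implementation:]

```python
import math

def ucb1_select(children, c=1.4):
    """Standard UCB1: pick the child maximizing mean value plus an
    exploration bonus that shrinks as the child accumulates visits."""
    total = sum(ch.visits for ch in children)
    def score(ch):
        if ch.visits == 0:
            return math.inf                    # try unvisited moves first
        mean = ch.wins / ch.visits
        return mean + c * math.sqrt(math.log(total) / ch.visits)
    return max(children, key=score)
```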
>>> a) My first improvement to the MCTS was to use minimax backups instead of
>>> averaging. This was an improvement, but not something that would solve the
>>> traps.
>>>
>>> b) My second improvement was to use alpha-beta rollouts. This is a rollouts
>>> version that can do null-move, LMR, etc. This is a huge improvement, and
>>> none of the other MCTS versions can match it. More on alpha-beta rollouts here:
>>> https://www.microsoft.com/en-us/research/wp-content/uploads/2014/11/huang_rollout.pdf
>>>
>>> So AlphaZero used none of the above improvements and yet it seems to be
>>> tactically strong. Leela Zero suffered from tactical failures left and
>>> right too, as I expected.
>>>
>>> So the only explanation left is the policy network being able to avoid
>>> traps, and I find it hard to believe it can identify more than
>>> qsearch-level tactics.
>>>
>>> All I am saying is that my experience (as well as that of many others)
>>> with MCTS for tactics-dominated games is bad, and there must be some
>>> breakthrough in that regard in AlphaZero for it to be able to compete
>>> with Stockfish on a tactical level.
>>>
>>> I am curious how Remi's attempt at Shogi using AlphaZero's method will
>>> turn out.
>>>
>>> regards,
>>> Daniel
>>>
>>> On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go <
>>> computer-go@computer-go.org> wrote:
>>>
>>>> Training on Stockfish games is guaranteed to produce a blunder-fest,
>>>> because there are no blunders in the training set and therefore the policy
>>>> network never learns how to refute blunders.
>>>>
>>>> This is not a flaw in MCTS, but rather in the policy network. MCTS will
>>>> eventually search every move infinitely often, producing asymptotically
>>>> optimal play. But if the policy network does not provide the guidance
>>>> necessary to rapidly refute the blunders that occur in the search, then
>>>> convergence of MCTS to optimal play will be very slow.
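[Improvement (a) above — minimax backups instead of averaging — can be sketched as follows; a generic illustration under a negamax sign convention, not Scorpio's code:]

```python
class MNode:
    def __init__(self):
        self.visits, self.value, self.children = 0, 0.0, []

def backup_average(path, leaf_value):
    """Standard MCTS backup: every node on the leaf-to-root path absorbs
    the evaluation as a running mean, sign-flipped each ply (negamax)."""
    for node in reversed(path):
        node.visits += 1
        node.value += (leaf_value - node.value) / node.visits
        leaf_value = -leaf_value

def backup_minimax(path):
    """Minimax backup: each node takes the best negamax child value, so a
    single refutation kills a bad move instead of being diluted by the
    average over its siblings (which is what lets traps survive)."""
    for node in reversed(path):
        if node.children:
            node.value = max(-c.value for c in node.children)
```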
>>>> It is necessary for the network to train on self-play games using MCTS.
>>>> For instance, the AGZ approach samples next states during training games by
>>>> sampling from the distribution of visits in the search. Specifically: not
>>>> by choosing the most-visited play!
>>>>
>>>> You see how this policy trains both search and evaluation to be
>>>> internally consistent? The policy head is trained to refute the bad moves
>>>> that will come up in search, and the value head is trained to the value
>>>> observed by the full tree.
>>>>
>>>> *From:* Computer-go [mailto:computer-go-boun...@computer-go.org] *On
>>>> Behalf Of *Dan
>>>> *Sent:* Monday, March 5, 2018 4:55 AM
>>>> *To:* computer-go@computer-go.org
>>>> *Subject:* Re: [Computer-go] 9x9 is last frontier?
>>>>
>>>> Actually, prior to this it was trained with hundreds of thousands of
>>>> Stockfish games and didn't do well on tactics (the games were actually a
>>>> blunder-fest). I believe this is a problem of the MCTS used and not due to
>>>> lack of training.
>>>>
>>>> Go is a strategic game, so it is different from chess, which is full of
>>>> traps.
>>>> I'm not surprised Leela Zero did well in Go.
>>>>
>>>> On Mon, Mar 5, 2018 at 2:16 AM Gian-Carlo Pascutto <g...@sjeng.org>
>>>> wrote:
>>>>
>>>> On 02-03-18 17:07, Dan wrote:
>>>> > Leela-chess is not performing well enough
>>>>
>>>> I don't understand how one can say that given that they started with the
>>>> random network last week only and a few clients. Of course it's bad!
>>>> That doesn't say anything about the approach.
>>>>
>>>> Leela Zero has gotten strong, but it has been learning for *months* with
>>>> ~400 people. It also took a while to get to 30 kyu.
>>>> >>>> -- >>>> GCP >>>> _______________________________________________ >>>> Computer-go mailing list >>>> Computer-go@computer-go.org >>>> http://computer-go.org/mailman/listinfo/computer-go >>>> >>>> >>>> _______________________________________________ >>>> Computer-go mailing list >>>> Computer-go@computer-go.org >>>> http://computer-go.org/mailman/listinfo/computer-go >>>> >>> >>> >> >> _______________________________________________ >> Computer-go mailing list >> Computer-go@computer-go.org >> http://computer-go.org/mailman/listinfo/computer-go >> > > > _______________________________________________ > Computer-go mailing list > Computer-go@computer-go.org > http://computer-go.org/mailman/listinfo/computer-go >