Sorry, I haven't been paying enough attention lately to know what "alpha-beta rollouts" means precisely. Can you either describe them or give me a reference?
Thanks, Álvaro. On Tue, Mar 6, 2018 at 1:49 PM, Dan <dsha...@gmail.com> wrote: > I did a quick test with my MCTS chess engine wth two different > implementations. > A standard MCTS with averaging, and MCTS with alpha-beta rollouts. The > result is like a 600 elo difference > > Finished game 44 (scorpio-pmcts vs scorpio-mcts): 1/2-1/2 {Draw by 3-fold > repetition} > Score of scorpio-mcts vs scorpio-pmcts: 41 - 1 - 2 [0.955] 44 > Elo difference: 528.89 +/- nan > > scorpio-mcts uses alpha-beta rollouts > scorpio-pmcts is "pure" mcts with averaging and UCB formula. > > Daniel > > On Tue, Mar 6, 2018 at 11:46 AM, Dan <dsha...@gmail.com> wrote: > >> I am pretty sure it is an MCTS problem and I suspect not something that >> could be easily solved with a policy network (could be wrong hree). My >> opinon is that DCNN is not >> a miracle worker (as somebody already mentioned here) and it is going to >> fail resolving tactics. I would be more than happy with it if it has same >> power as a qsearch to be honest. >> >> Search traps are the major problem with games like Chess, and what makes >> transitioning the success of DCNN from Go to Chess non trivial. >> The following paper discusses shallow traps that are prevalent in chess. >> ( https://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/downl >> oad/1458/1571 ) >> They mention traps make MCTS very inefficient. Even if the MCTS is given >> 50x more time is needed by an exhaustive minimax tree, it could fail to >> find a level-5 or level-7 trap. >> It will spend, f.i, 95% of its time searching an asymetric tree of depth >> > 7 when a shallow trap of depth-7 exists, thus, missing to find the >> level-7 trap. >> This is very hard to solve even if you have unlimited power. >> >> The plain MCTS as used by AlphaZero is the most ill-suited MCTS version >> in my opinion and i have hard a hard time seeing how it can be competitive >> with Stockfish tactically. >> >> My MCTS chess engine with AlphaZero like MCTS was averaging was missing >> a lot of tactics. I don't use policy or eval networks but qsearch() for >> eval, and the policy is basically >> choosing which ever moves leads to a higher eval. >> >> a) My first improvement to the MCTS is to use minimax backups instead of >> averaging. This was an improvmenet but not something that would solve the >> traps >> >> b) My second improvment is to use alphabeta rollouts. This is a rollouts >> version that can do nullmove and LMR etc... This is a huge improvment and >> none of the MCTS >> versons can match it. More on alpha-beta rollouts here ( >> https://www.microsoft.com/en-us/research/wp-content/upload >> s/2014/11/huang_rollout.pdf ) >> >> So AlphaZero used none of the above improvements and yet it seems to be >> tactically strong. Leela-Zero suffered from tactical falls left and right >> too as I expected. >> >> So the only explanation left is the policy network able to avoid traps >> which I find hard to believe it can identify more than a qsearch level >> tactics. >> >> All I am saying is that my experience (as well as many others) with MCTS >> for tactical dominated games is bad, and there must be some breakthrough in >> that regard in AlphaZero >> for it to be able to compete with Stockfish on a tactical level. >> >> I am curious how Remi's attempt at Shogi using AlphaZero's method will >> turnout. >> >> regards, >> Daniel >> >> >> >> >> >> >> >> >> On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go < >> computer-go@computer-go.org> wrote: >> >>> Training on Stockfish games is guaranteed to produce a blunder-fest, >>> because there are no blunders in the training set and therefore the policy >>> network never learns how to refute blunders. >>> >>> >>> >>> This is not a flaw in MCTS, but rather in the policy network. MCTS will >>> eventually search every move infinitely often, producing asymptotically >>> optimal play. But if the policy network does not provide the guidance >>> necessary to rapidly refute the blunders that occur in the search, then >>> convergence of MCTS to optimal play will be very slow. >>> >>> >>> >>> It is necessary for the network to train on self-play games using MCTS. >>> For instance, the AGZ approach samples next states during training games by >>> sampling from the distribution of visits in the search. Specifically: not >>> by choosing the most-visited play! >>> >>> >>> >>> You see how this policy trains both search and evaluation to be >>> internally consistent? The policy head is trained to refute the bad moves >>> that will come up in search, and the value head is trained to the value >>> observed by the full tree. >>> >>> >>> >>> *From:* Computer-go [mailto:computer-go-boun...@computer-go.org] *On >>> Behalf Of *Dan >>> *Sent:* Monday, March 5, 2018 4:55 AM >>> *To:* computer-go@computer-go.org >>> *Subject:* Re: [Computer-go] 9x9 is last frontier? >>> >>> >>> >>> Actually prior to this it was trained with hundreds of thousands of >>> stockfish games and didn’t do well on tactics (the games were actually a >>> blunder fest). I believe this is a problem of the MCTS used and not due to >>> for lack of training. >>> >>> >>> >>> Go is a strategic game so that is different from chess that is full of >>> traps. >>> >>> I m not surprised Lela zero did well in go. >>> >>> >>> >>> On Mon, Mar 5, 2018 at 2:16 AM Gian-Carlo Pascutto <g...@sjeng.org> >>> wrote: >>> >>> On 02-03-18 17:07, Dan wrote: >>> > Leela-chess is not performing well enough >>> >>> I don't understand how one can say that given that they started with the >>> random network last week only and a few clients. Of course it's bad! >>> That doesn't say anything about the approach. >>> >>> Leela Zero has gotten strong but it has been learning for *months* with >>> ~400 people. It also took a while to get to 30 kyu. >>> >>> -- >>> GCP >>> _______________________________________________ >>> Computer-go mailing list >>> Computer-go@computer-go.org >>> http://computer-go.org/mailman/listinfo/computer-go >>> >>> >>> _______________________________________________ >>> Computer-go mailing list >>> Computer-go@computer-go.org >>> http://computer-go.org/mailman/listinfo/computer-go >>> >> >> > > _______________________________________________ > Computer-go mailing list > Computer-go@computer-go.org > http://computer-go.org/mailman/listinfo/computer-go >
_______________________________________________ Computer-go mailing list Computer-go@computer-go.org http://computer-go.org/mailman/listinfo/computer-go