The state of the art in computer chess is alpha-beta search, but note that the search is very selective because of "late move reductions."
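The core idea can be sketched in a few lines. Below is a minimal, hypothetical negamax search with a one-ply late move reduction and the usual full-depth re-search when a reduced move surprises us; it is a toy illustration, not Stockfish's implementation (the toy "position" is just a `(static_value, children)` tuple):

```python
INF = 10 ** 9

def negamax_lmr(node, depth, alpha, beta):
    """Negamax alpha-beta with a crude late move reduction:
    the first child is searched to full depth, later children one
    ply shallower, with a full-depth re-search if the reduced
    search still beats alpha (the standard verification step)."""
    value, children = node
    if depth == 0 or not children:
        return value  # static evaluation, from the side to move's view
    best = -INF
    for i, child in enumerate(children):  # assumed probable-best-first order
        if i == 0:
            # First move: full-depth search.
            score = -negamax_lmr(child, depth - 1, -beta, -alpha)
        else:
            # Late move: search one ply shallower first.
            score = -negamax_lmr(child, max(depth - 2, 0), -beta, -alpha)
            if score > alpha:
                # Reduced search looks promising: verify at full depth.
                score = -negamax_lmr(child, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff
    return best

# Toy 2-ply game tree; leaf values are from the side to move's view.
tree = (0, [(0, [(3, []), (5, [])]),
            (0, [(2, []), (9, [])])])
print(negamax_lmr(tree, 2, -INF, INF))  # minimax value of the root: 3
```

Real engines make the reduction depend on depth, move number, and move properties (captures and checks are typically not reduced), but the skeleton is the same.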
A late move reduction reduces the search depth for moves generated after the first in a node. For example, a simple implementation would be: search the first move generated to depth N, and all subsequent moves to depth N-1. That simple idea is a big improvement. LMR works (in large part) because move generators produce moves in probable-best-first order. Therefore, if the first few moves do not refute the opponent's late move, then there probably isn't a refutation. By searching unlikely moves less, you have the time to search likely variations more. Stockfish's actual implementation massively increases effective search depth over the simple policy outlined above. Because of LMR, the effective branching factor of chess search trees is less than 2.

IMO, late move reductions should work really well in Go, given that the move predictors are above 60% accuracy. But I have not tried. Using MCTS to train the evaluator is probably necessary, even if the ultimate goal is to use the evaluator in alpha-beta search.

Best,
Brian

-----Original Message-----
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Petr Baudis
Sent: Thursday, November 16, 2017 10:43 AM
To: computer-go@computer-go.org
Subject: [Computer-go] Is MCTS needed?

Hi,

  when explaining AlphaGo Zero to a machine learning audience yesterday (https://docs.google.com/presentation/d/1VIueYgFciGr9pxiGmoQyUQ088Ca4ouvEFDPoWpRO4oQ/view) it occurred to me that using MCTS in this setup is actually such a kludge!

  Originally, we used MCTS because with the repeated simulations, we would be improving the accuracy of the arm reward estimates. MCTS policies assume stationary distributions, which is violated every time we expand the tree, but it's an okay tradeoff if all you feed into the tree are rewards in the form of just Bernoulli trials. Moreover, you could argue evaluations are somewhat monotonic with increasing node depths, as you are basically just fixing a growing prefix of the MC simulation.
  But now, we expand the nodes literally all the time, breaking the stationarity, possibly in drastic ways. There are no re-evaluations that would improve your estimate; the input isn't binary but an estimate in a continuous space. Suddenly, the multi-armed bandit analogy loses a lot of ground.

  Therefore, can't we take the next step and do away with MCTS? Is there a theoretical viewpoint from which it still makes sense as the best policy improvement operator?

  What would you say is the current state-of-the-art game tree search for chess? That's a very unfamiliar world for me; to be honest, all I really know is MCTS...

--
				Petr Baudis, Rossum
	Run before you walk! Fly before you crawl! Keep moving forward!
	If we fail, I'd rather fail really hugely.  -- Moist von Lipwig

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go