The state of the art in computer chess is alpha-beta search, but note that the 
search is very selective because of "late move reductions."

A late move reduction reduces the search depth for moves generated after the 
first move in a node. For example, a simple implementation would be "search the 
first move generated to depth N, and all subsequent moves to depth N-1". Even 
that simple idea is a big improvement.
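A rough sketch of that simple policy, as Python-style pseudocode (the helpers 
generate_moves, make_move, undo_move, and evaluate are hypothetical, not any 
particular engine's API):

    # Negamax with the simple late-move-reduction policy described above:
    # the first move is searched to full depth, later moves one ply shallower.
    INF = float("inf")

    def search(pos, depth, alpha, beta):
        if depth <= 0:
            return evaluate(pos)             # static evaluation at the leaves
        best = -INF
        for i, move in enumerate(generate_moves(pos)):  # probable-best-first
            make_move(pos, move)
            # First move gets full depth; every later move is reduced one ply.
            child_depth = depth - 1 if i == 0 else depth - 2
            score = -search(pos, child_depth, -beta, -alpha)
            undo_move(pos, move)
            best = max(best, score)
            alpha = max(alpha, score)
            if alpha >= beta:                # beta cutoff
                break
        return best

Real engines re-search a reduced move at full depth if it unexpectedly beats 
alpha; that is left out here to keep the sketch close to the "N vs. N-1" 
description above.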

LMR works (in large part) because move generators produce probable-best-first 
order. Therefore, if the first few moves do not work (i.e., do not refute the 
opponent's late move), then there probably isn't a refutation. By searching 
unlikely moves less deeply, you gain the time to search likely variations more 
deeply.

Stockfish's actual implementation massively increases effective search depth 
over the simple policy outlined above. Because of LMR, the effective branching 
factor of chess search trees is less than 2.
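To make "effective branching factor" concrete: if a search visits roughly N 
nodes to reach nominal depth D, the effective branching factor is about 
N^(1/D). The numbers below are purely illustrative, not Stockfish 
measurements:

    # Effective branching factor: the b such that b ** depth ~= nodes visited.
    nodes_visited = 100_000_000     # hypothetical node count for one search
    depth_reached = 30              # hypothetical nominal depth in plies
    b_eff = nodes_visited ** (1.0 / depth_reached)
    print(f"effective branching factor ~ {b_eff:.2f}")   # about 1.85

Compare that with the roughly 35 legal moves available in a typical chess 
position.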

IMO, late move reductions should work really well in Go, given that the move 
predictors exceed 60% accuracy. But I have not tried it.

Using MCTS for training the evaluator is probably necessary, even if the 
ultimate goal is to use the evaluator in alpha-beta.

Best,
Brian

-----Original Message-----
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Petr Baudis
Sent: Thursday, November 16, 2017 10:43 AM
To: computer-go@computer-go.org
Subject: [Computer-go] Is MCTS needed?

  Hi,

  when explaining AlphaGo Zero to a machine learning audience yesterday
(https://docs.google.com/presentation/d/1VIueYgFciGr9pxiGmoQyUQ088Ca4ouvEFDPoWpRO4oQ/view)
it occurred to me that using MCTS in this setup is actually such a kludge!

  Originally, we used MCTS because with the repeated simulations,
we would be improving the accuracy of the arm reward estimates.  MCTS
policies assume stationary distributions, which is violated every time
we expand the tree, but it's an okay tradeoff if all you feed into the
tree are rewards in the form of just Bernoulli trials.  Moreover, you
could argue evaluations are somewhat monotonic with increasing node
depths as you are basically just fixing a growing prefix of the MC
simulation.

  But now, we expand the nodes literally all the time, breaking the
stationarity possibly in drastic ways.  There are no reevaluations that
would improve your estimate.  The input isn't binary but an estimate in
a continuous space.  Suddenly the Multi-armed Bandit analogy loses a lot
of ground.

  Therefore, can't we take the next step, and do away with MCTS?  Is
there a theoretical viewpoint from which it still makes sense as the best
policy improvement operator?

  What would you say is the current state-of-art game tree search for
chess?  That's a very unfamiliar world for me, to be honest all I really
know is MCTS...

-- 
                                        Petr Baudis, Rossum
        Run before you walk! Fly before you crawl! Keep moving forward!
        If we fail, I'd rather fail really hugely.  -- Moist von Lipwig
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
