Neural networks always have a lot of local optima. Simply because they have a 
high degree of internal symmetry. That is, you can “permute” sets of 
coefficients and get the same function.

 

Don’t think of starting with expert training as a way to avoid local optima. It 
is a way to start training with some good examples so that learning can start 
at a higher level. But then you should continue with self-play.

 

Backgammon was trained to expert level based on self-play games that were 
initially random. Google “Tesauro backgammon” and you should be able to find a 
paper.

 

I don’t know NEAT and HyperNEAT; I will look them up. Thank you for the 
reference.

 

Best,

Brian

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Minjae Kim
Sent: Friday, February 24, 2017 3:39 AM
To: computer-go@computer-go.org
Subject: [Computer-go] dealing with multiple local optima

 

I've recently viewed the paper of AlphaGo, which has done gradient-based 
reinforcement learning to get stronger. The learning was successful enough to 
beat a human master, but in this case, supervised learning with a large 
database of master level human games was preceded the reinforcement learning. 
For a complex enough game as go, one can expect that the search space for the 
policy function would not be smooth at all. So supposedly supervised learning 
was necessary to guide the policy function to a good starting point before 
reinforcement. Without such, applying reinforcement learning directly to a 
random policy can easily make the policy stuck at a bad local optimum. I could 
have a miunderstanding at this point; correct me if so, but to continue on: if 
it is hard to have "the good starting point" such as a trained policy from 
human expert game records, what is a way to devise one. I've had a look on NEAT 
and HyperNEAT, which are evolutionary methods. Do these evolutionary algorithms 
scale well on complex strategic decision processes and not just on simple 
linear decisions such as food gathering and danger avoidance? In case not, what 
alternatives are known? Is there any success case of a chess, go, or any kind 
of complex strategic game playing algorithm, where it gained expert strength 
without domain knowledge such as expert game examples?

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Reply via email to