I recently read the AlphaGo paper, in which gradient-based reinforcement learning was used to make the program stronger. The learning was successful enough to beat a human master, but in that case reinforcement learning was preceded by supervised learning on a large database of master-level human games. For a game as complex as Go, one can expect the search space for the policy function to be far from smooth. So presumably supervised learning was needed to guide the policy function to a good starting point before reinforcement learning. Without it, applying reinforcement learning directly to a random policy could easily leave the policy stuck in a bad local optimum.

I may be misunderstanding something here; please correct me if so. But to continue: if such a good starting point, for example a policy trained on human expert game records, is hard to obtain, how can one be devised?

I have looked at NEAT and HyperNEAT, which are evolutionary methods. Do these evolutionary algorithms scale well to complex strategic decision processes, and not just to simple linear decisions such as food gathering and danger avoidance? If not, what alternatives are known? Is there any success story of a chess, Go, or other complex strategy game program that reached expert strength without domain knowledge such as expert game examples?
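
To make the two-stage setup I am asking about concrete, here is a toy sketch of what I mean by "supervised learning first, then reinforcement learning". It is not AlphaGo's training code. It assumes a single-state problem with four actions, a softmax policy, a made-up "expert" that plays action 2 about 90% of the time, and a made-up reward table. All names (`train`, `REWARDS`, `EXPERT_ACTION`) are hypothetical.

```python
import math
import random

# Hypothetical toy problem: one state, four actions, fixed rewards.
REWARDS = [0.2, 0.0, 1.0, 0.0]   # action 2 is the best move (assumed)
EXPERT_ACTION = 2                # the "expert" mostly plays action 2 (assumed)


def softmax(logits):
    """Turn logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_prob_step(logits, action, scale, lr):
    """Ascend scale * grad log p(action); grad wrt logit i is 1[i == action] - p_i."""
    probs = softmax(logits)
    return [
        logit + lr * scale * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


def train(seed=0, pretrain_steps=200, rl_steps=500, lr=0.1):
    rng = random.Random(seed)
    n = len(REWARDS)
    logits = [0.0] * n           # uniform (random) starting policy

    # Stage 1: supervised learning, imitating sampled expert moves.
    for _ in range(pretrain_steps):
        if rng.random() < 0.9:
            move = EXPERT_ACTION
        else:
            move = rng.choice([a for a in range(n) if a != EXPERT_ACTION])
        logits = log_prob_step(logits, move, 1.0, lr)
    pretrained = softmax(logits)

    # Stage 2: REINFORCE, sampling from the current policy and
    # weighting the same log-probability gradient by the reward.
    for _ in range(rl_steps):
        probs = softmax(logits)
        action = rng.choices(range(n), weights=probs)[0]
        logits = log_prob_step(logits, action, REWARDS[action], lr)

    return pretrained, softmax(logits)


pretrained, final = train()
print("after supervised stage:", [round(p, 3) for p in pretrained])
print("after reinforcement stage:", [round(p, 3) for p in final])
```

In this sketch both stages use the same log-probability gradient; the only difference is whether the move comes from expert records or from the policy's own play weighted by reward. My question is about what can replace stage 1 when no expert records exist.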
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go