[Computer-go] dealing with multiple local optima

2017-02-24 Thread Minjae Kim
I've recently viewed the AlphaGo paper, which used gradient-based reinforcement learning to get stronger. The learning was successful enough to beat a human master, but in this case supervised learning on a large database of master-level human games preceded the reinforcement

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Brian Sheppard via Computer-go
Neural networks always have a lot of local optima, simply because they have a high degree of internal symmetry: you can “permute” sets of coefficients and get the same function. Don’t think of starting with expert training as a way to avoid local optima. It is a way to start
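The permutation symmetry Sheppard describes is easy to demonstrate: in a one-hidden-layer network, reordering the hidden units (rows of the first weight matrix and the matching columns of the second) leaves the computed function unchanged. A minimal sketch, with arbitrary made-up weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 1-hidden-layer network: y = W2 @ tanh(W1 @ x)
W1 = rng.normal(size=(4, 3))   # 4 hidden units, 3 inputs
W2 = rng.normal(size=(2, 4))   # 2 outputs

def forward(W1, W2, x):
    return W2 @ np.tanh(W1 @ x)

# Permute the hidden units: reorder rows of W1 and the matching columns of W2.
perm = [2, 0, 3, 1]
W1p = W1[perm, :]
W2p = W2[:, perm]

x = rng.normal(size=3)
# The permuted network computes exactly the same function...
assert np.allclose(forward(W1, W2, x), forward(W1p, W2p, x))
# ...even though the parameter vectors differ, so distinct points in
# weight space (including distinct optima) can represent one function.
```

With 4 hidden units there are already 4! = 24 equivalent weight settings, which is one reason counting "local optima" in weight space overstates the diversity of functions found.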

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Minjae Kim
But those video games have a very simple optimal policy. Consider Super Mario: if you see an enemy, step on it; if you see a hole, jump over it; if you see a pipe sticking up, also jump over it; etc. On Sat, Feb 25, 2017 at 12:36 AM, Darren Cook wrote: > > ...if it is hard to

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Brian Sheppard via Computer-go
OK, so let’s talk theory vs practice. In theory, TD learning approaches asymptotic optimality when used with a full state space. That is, if your RL model has one parameter for each state, then TD will converge those parameters to the game theoretic values. There are some pre-conditions,
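The tabular setting Sheppard refers to (one parameter per state) can be sketched concretely. The following toy example, a made-up three-state chain rather than anything from the thread, shows TD(0) driving the table toward the true game-theoretic values:

```python
# Tabular TD(0) on a toy 3-state chain: s0 -> s1 -> s2 (terminal, reward 1).
# With one parameter per state (a full table), the estimates converge
# to the true values, illustrating the asymptotic guarantee mentioned above.
ALPHA, GAMMA = 0.1, 1.0
V = {0: 0.0, 1: 0.0, 2: 0.0}   # state-value table; state 2 is terminal

for _ in range(1000):          # repeated episodes
    s = 0
    while s != 2:
        s_next = s + 1
        r = 1.0 if s_next == 2 else 0.0
        # TD(0) update: move V(s) toward the bootstrapped target r + gamma*V(s')
        V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])
        s = s_next

print(V)   # both non-terminal states converge toward the true value 1.0
```

The pre-conditions alluded to (e.g. sufficiently visiting every state, appropriately decaying step sizes) are exactly what breaks down once the table is replaced by a function approximator.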

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Darren Cook
> ...if it is hard to have "the good starting point" such as a trained > policy from human expert game records, what is a way to devise one. My first thought was to look at the DeepMind research on learning to play video games (which I think either pre-dates the AlphaGo research, or was done in

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread terry mcintyre via Computer-go
"seeing" is complex when the input is just a bunch of pixels.  Terry McIntyre Unix/Linux Systems Administration Taking time to do it right saves having to do it twice. On Friday, February 24, 2017 12:32 PM, Minjae Kim wrote: But those video games have a very simple

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Álvaro Begué
I should point out that Reinforcement Learning is a relatively unimportant part of AlphaGo, according to the paper. They only used it to turn the move-prediction network into a stronger player (presumably increasing the weights of the layer before SoftMax would do most of the job, by making the
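Begué's parenthetical — that scaling up the weights feeding the final SoftMax would do most of the job — can be illustrated in isolation. A hedged sketch with invented logits (not AlphaGo's actual numbers): multiplying the logits by a constant keeps the ranking of moves but concentrates probability on the top move, i.e. it turns a move-*prediction* distribution into a greedier move-*selection* policy:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])   # hypothetical pre-SoftMax activations
p_pred = softmax(logits)             # move-prediction distribution
p_sharp = softmax(3.0 * logits)      # scaled-up final-layer weights

# Ranking is unchanged, but mass concentrates on the top move.
assert p_pred.argmax() == p_sharp.argmax()
assert p_sharp.max() > p_pred.max()
print(p_pred.round(3), p_sharp.round(3))
```

This is the same mechanism as lowering a softmax temperature; it sharpens the policy without the network having to learn new move preferences.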

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Jim O'Flaherty
NEAT and hyperNEAT are awesome when "evolving" fairly simple networks with a very limited number of input and output dimensions. However, without access to some serious computational power, scaling the NEAT method up to the kind of level you would need for the current encoding methods for the

Re: [Computer-go] UEC wild cards?

2017-02-24 Thread Gian-Carlo Pascutto
On 21/02/2017 16:11, "Ingo Althöfer" wrote: > Dear UEC organizers, > > GCP wrote (on behalf of Leela): >> I did not register for the UEC Cup. I seem to be in good company there, >> sadly. > > do you have a few wild cards for strong late entries? Posting on behalf of the UEC organizers: Yes,