(3) A CNN cannot learn the exclusive-or function due to the ReLU
activation function, as opposed to the traditional sigmoid (hyperbolic
tangent). CNNs are good at approximating continuous (analog)
functions, but not Boolean (digital) ones.

Are you sure about that? I can imagine using two ReLU units to
construct a sigmoid-like step function, so I'd think a multi-layer net
should be fine (just like with ordinary perceptrons).

No, this is incorrect. A perceptron (a single-layer neural network) cannot do XOR. The whole point of networks with two or more layers was to overcome this basic weakness. A two-layer network with a sufficiently large (in the limit, infinite) number of neurons in the hidden layer can approximate any continuous function.
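To make the point concrete: XOR is representable exactly with ReLU units in a two-layer network. Here is a minimal sketch with hand-picked (not learned) weights; the function name and weights are just for illustration:

```python
def relu(z):
    # Rectified linear unit: max(0, z)
    return max(0.0, z)

def xor_net(x1, x2):
    # Hidden layer: two ReLU units sharing the input sum,
    # with biases 0 and -1.
    h1 = relu(x1 + x2)        # counts how many inputs are on
    h2 = relu(x1 + x2 - 1)    # fires only when both inputs are on
    # Output layer: linear combination h1 - 2*h2
    return h1 - 2 * h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

Whether gradient descent *finds* such weights from data is a separate question from whether the network can represent the function, which is the distinction the thread is circling around.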

But early on it turned out that learning was unstable and/or extremely slow for multilayer networks, so the theoretical capacity was not practical.

Now, with deep learning, we know that with correct training, a lot of data, and hardware (or patience), neural networks can learn almost anything.

It is probably correct that smooth functions are easier to approximate with a neural network than high-dimensional, discontinuous ones.

I am training my networks on a single CPU thread, so I have the benefit of following the learning process of NNOdin slowly. I have seen a lot of problems with the network, but after some weeks of training they go away. It is interesting to see how its playing style changes. For a while it would rigidly play very local shapes, but now it seems to be starting to take life and death of large groups into account. Or maybe it lets the MC playouts have more impact on the decisions by searching more effectively. Some weeks ago it would barely win against gnugo, and it won just by playing standard shapes until it got lucky. In the last couple of days it has seemed to surround and cut off gnugo's groups and kill them big, as a strong player would.

So what do I want to say? So far I have learned that the policy network will blindly play whatever shapes it finds good and ignore most alternative moves. So there is indeed a huge problem of "holes" in the policy function. But for Odin, at least, I do not know which holes will remain a problem as the network matures with more learning. My plan is then to fix the holes by making the MC evaluation strong.

Best
Magnus
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go