Re: [Computer-go] AlphaGo Zero self-play temperature

2017-11-07 Thread uurtamo .
It's interesting to leave unused parameters or unnecessary parameterizations in the paper. It telegraphs what was being tried as opposed to simply writing something more concise and leaving the reader to wonder why and how those decisions were made. s. On Nov 7, 2017 10:54 PM, "Imran Hendley"

Re: [Computer-go] AlphaGo Zero self-play temperature

2017-11-07 Thread Imran Hendley
Great, thanks guys! On Tue, Nov 7, 2017 at 1:51 PM, Gian-Carlo Pascutto wrote: > On 7/11/2017 19:07, Imran Hendley wrote: > > Am I understanding this correctly? > > Yes. > > It's possible they had in-betweens or experimented with variations at > some point, then settled on the

Re: [Computer-go] AlphaGo Zero Loss

2017-11-07 Thread Wesley Turner
I can only speculate, but I see two advantages to using MSE: * MSE accommodates games that have more than just win/loss. One of AlphaGo Zero's goals (I'm extrapolating from the paper) was to develop a system that was easy to apply to domains other than Go. * It can be used with

[Computer-go] Did some zero project already show improvement over random moves

2017-11-07 Thread Xavier Combelle
I wonder whether any of the Zero projects (projects based on the AlphaGo Zero paper) that, if I understood correctly, have been launched have already achieved some kind of measurable success, even if only on the order of hundreds of points. If I understand the previous mails correctly, the computation power you have is 1700

Re: [Computer-go] AlphaGo Zero Loss

2017-11-07 Thread Gian-Carlo Pascutto
On 7/11/2017 19:08, Petr Baudis wrote: > Hi! > > Does anyone know why the AlphaGo team uses MSE on [-1,1] as the > value output loss rather than binary crossentropy on [0,1]? I'd say > the latter is way more usual when training networks as typically > binary crossentropy yields better results,
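
For readers comparing the two losses in the question above, here is a minimal sketch in plain Python. It is not taken from the paper or from any project on this list; the function names (`mse_loss`, `bce_loss`, `value_to_probability`) are hypothetical, and the linear map from [-1,1] to [0,1] is an assumption made only so the two losses can be evaluated on the same prediction.

```python
import math


def mse_loss(value, outcome):
    """Squared error between a value in [-1, 1] and a game outcome.

    outcome is -1 for a loss and +1 for a win; 0 could encode a draw
    in games that have one.
    """
    return (value - outcome) ** 2


def bce_loss(win_probability, won, eps=1e-12):
    """Binary cross-entropy between a win probability in [0, 1] and a win/loss label."""
    # Clamp to avoid log(0) for fully confident predictions.
    p = min(max(win_probability, eps), 1.0 - eps)
    return -math.log(p) if won else -math.log(1.0 - p)


def value_to_probability(value):
    """Assumed linear map from a value in [-1, 1] to a probability in [0, 1]."""
    return (value + 1.0) / 2.0


# A confidently wrong prediction: the game was won, the network said "lost".
wrong_value = -0.999
print(mse_loss(wrong_value, 1.0))                          # bounded above by 4
print(bce_loss(value_to_probability(wrong_value), True))   # grows without bound
```

One visible difference is that the squared error on [-1,1] can never exceed 4, while the cross-entropy grows without bound as a wrong prediction becomes more confident. Whether that matters in practice is exactly the open question in this thread.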

Re: [Computer-go] AlphaGo Zero self-play temperature

2017-11-07 Thread Gian-Carlo Pascutto
On 7/11/2017 19:07, Imran Hendley wrote: > Am I understanding this correctly? Yes. It's possible they had in-betweens or experimented with variations at some point, then settled on the simplest case. You can vary the randomness if you define it as a softmax with varying temperature, that's
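
As an illustration of varying the randomness with a temperature, here is a minimal sketch that turns MCTS visit counts into move probabilities proportional to N(a)^(1/temperature) and samples from them. The function names and the near-zero cutoff of `1e-3` are assumptions for this example, not anything stated in the thread; it uses only the Python standard library.

```python
import random


def move_probabilities(visit_counts, temperature):
    """Move probabilities proportional to visit_count ** (1 / temperature).

    temperature = 1 samples in proportion to visit counts; as temperature
    approaches 0 the distribution concentrates on the most-visited move.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    best = max(visit_counts)
    if best <= 0:
        raise ValueError("at least one move needs a positive visit count")
    if temperature < 1e-3:
        # Limit case (cutoff is an arbitrary choice): pick the most-visited
        # move, splitting ties evenly.
        winners = [1.0 if n == best else 0.0 for n in visit_counts]
        return [w / sum(winners) for w in winners]
    # Scale by the largest count first so the power cannot overflow.
    weights = [(n / best) ** (1.0 / temperature) for n in visit_counts]
    total = sum(weights)
    return [w / total for w in weights]


def sample_move(visit_counts, temperature, rng=random):
    """Sample a move index according to move_probabilities."""
    probs = move_probabilities(visit_counts, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


counts = [10, 30, 60]
print(move_probabilities(counts, 1.0))   # proportional to the counts
print(move_probabilities(counts, 0.25))  # sharper, favours the top move
print(sample_move(counts, 0.0))          # always the most-visited move
```

Intermediate temperatures give the "in-between" behaviour mentioned above: the lower the temperature, the more the sample favours the most-visited move.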

Re: [Computer-go] AlphaGo Zero self-play temperature

2017-11-07 Thread uurtamo .
If I understand your question correctly, "goes to 1" can happen as quickly or slowly as you'd like. Yes? On Nov 7, 2017 7:26 PM, "Imran Hendley" wrote: Hi, I might be having trouble understanding the self-play policy for AlphaGo Zero. Can someone let me know if I'm on

Re: [Computer-go] AlphaGo Zero self-play temperature

2017-11-07 Thread Álvaro Begué
Your understanding matches mine. My guess is that they had a temperature parameter in the code that would allow for things like slowly transitioning from random sampling to deterministically picking the maximum, but they ended up using only those particular values. Álvaro. On Tue, Nov 7, 2017
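
The kind of temperature parameter guessed at here could be sketched as a schedule over the move number. The two functions below are hypothetical: `step_temperature` switches abruptly between two values, while `linear_temperature` shows the slow transition that such a parameter would also allow. The default of 30 moves is the switch point as I recall it from the paper; treat it, and the names, as assumptions.

```python
def step_temperature(move_number, switch_move=30, high=1.0, low=0.0):
    """Use `high` for the first `switch_move` moves, then `low`."""
    return high if move_number < switch_move else low


def linear_temperature(move_number, anneal_moves=30, high=1.0, low=0.0):
    """Interpolate linearly from `high` at move 0 to `low` at `anneal_moves`."""
    if move_number >= anneal_moves:
        return low
    fraction = move_number / anneal_moves
    return high + (low - high) * fraction


for move in (0, 15, 29, 30, 60):
    print(move, step_temperature(move), linear_temperature(move))
```

With the step schedule only two temperature values are ever used, which would explain why a more general parameter shows up in the description even though nothing in between is exercised.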

[Computer-go] AlphaGo Zero self-play temperature

2017-11-07 Thread Imran Hendley
Hi, I might be having trouble understanding the self-play policy for AlphaGo Zero. Can someone let me know if I'm on the right track here? The paper states: In each position s, an MCTS search is executed, guided by the neural network f_θ. The MCTS search outputs probabilities π of playing each

[Computer-go] Yet another Zero project.

2017-11-07 Thread valkyria
Hi, I have also had some Zero stuff brewing for almost two days, although I depend on heavy-playout MC evaluation for self-play. I am using my Odin MC engine as a base, as it is. It can use a small AG-style policy network running on CPU, implemented with Eigen (C++). It does not have any value