On 11-01-17 18:09, Xavier Combelle wrote:
> Of course it means distribute at least the binary so, or the source,
> so proprietary software could be reluctant to share it. But for free
> software there should not any problem. If someone is interested by my
> proposition, I would be pleased to realize it.

It is obvious that having a dataset of 30M games between strong players
(i.e. replicating the AlphaGo training set) would be beneficial to the
community. It is clear that most of us are now trying to do the same,
namely learning a value function from the roughly 1.5M KGS+Tygem+GoGoD
games while trying to control overfitting via various measures. (Aya
used a small network + dropout. Rn trained multiple outputs on a network
of unknown size. I wonder why no-one tried normal L1/L2 regularization,
but then again, I didn't get that working either!)
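For concreteness, here is a minimal numpy sketch of what L1/L2 regularization adds to a training loss; the function name, the penalty strengths, and the toy weights are all illustrative, not anyone's actual training code:

```python
import numpy as np

def regularized_loss(data_loss, weights, l1=0.0, l2=1e-4):
    """Add L1/L2 penalty terms to a base training loss.

    data_loss: scalar loss from the network's predictions.
    weights:   list of weight arrays (biases are usually excluded).
    l1, l2:    penalty strengths; the defaults are illustrative.
    """
    penalty = 0.0
    for w in weights:
        penalty += l1 * np.abs(w).sum() + l2 * np.square(w).sum()
    return data_loss + penalty

# Toy example: two small weight matrices.
ws = [np.array([[1.0, -2.0]]), np.array([[0.5]])]
loss = regularized_loss(1.0, ws, l1=0.0, l2=0.1)
# L2 penalty = 0.1 * (1 + 4 + 0.25) = 0.525, so total loss is 1.525.
```

The L2 term pushes weights toward zero (shrinking effective capacity), which is the same overfitting control that dropout or a smaller network provides by other means.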

Software should also not really be a problem: Leela is free, and Ray and
Darkforest are open source. If a pure DCNN player suffices, there are
several more options; for example, I've seen several programs written in
Python. Score disagreements can be resolved by invoking GNU Go with
--score aftermath.
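A small sketch of that scoring step, assuming a gnugo binary on the PATH; the exact wording of GNU Go's result line is an assumption here, so the parser is shown (and tested) separately from the subprocess call:

```python
import re
import subprocess

def gnugo_score(sgf_path):
    """Score a finished game with GNU Go (sketch; assumes gnugo is installed).

    --score aftermath has GNU Go play the game out to settle dead
    stones before counting, which is what makes it usable as a neutral
    referee when two engines disagree.
    """
    out = subprocess.run(
        ["gnugo", "--score", "aftermath", "--quiet", "-l", sgf_path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_score(out)

def parse_score(text):
    """Parse a result line like 'White wins by 2.5 points' into ('W', 2.5).

    Returns None if no such line is found. The line format is an
    assumption about GNU Go's output, not taken from this thread.
    """
    m = re.search(r"(Black|White) wins by ([\d.]+) points", text)
    if not m:
        return None
    return (m.group(1)[0], float(m.group(2)))
```

Running this over every generated game gives a uniform winner label regardless of which engines produced the moves.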

I think it's an open question though, *how* the games should be
generated, i.e.:

* Follow the AlphaGo procedure, but with the SL player instead of the RL
player (bigger or smaller networks are possible too; many tradeoffs).
* Play games with a full MCTS search and a small number of playouts.
(More bias, but much higher quality games.)
* The author of Aya has also described his procedure.
* Some mix of the above :-)
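Whichever generation procedure is used, the published AlphaGo recipe kept only one position per game to decorrelate the training pairs. A sketch, assuming each game is represented as (sequence of positions, game result); the data layout is mine, not from this thread:

```python
import random

def sample_value_net_data(games, seed=0):
    """Take exactly one (position, result) training pair per game.

    games: list of (positions, result), where positions is the sequence
    of board states in one game and result is +1/-1 for a Black/White
    win. Successive positions of the same game are heavily correlated,
    so sampling a single position per game keeps the pairs nearly
    independent, which is how the AlphaGo value network avoided
    overfitting despite its large training set.
    """
    rng = random.Random(seed)
    data = []
    for positions, result in games:
        if not positions:
            continue  # skip empty games defensively
        data.append((rng.choice(positions), result))
    return data

# Toy games: positions are just move numbers here.
games = [([0, 1, 2, 3], +1), ([0, 1], -1)]
pairs = sample_value_net_data(games)
```

This is also why the 30M figure matters: one pair per game means 30M *games*, not 30M positions cut from a few hundred thousand games.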

-- 
GCP
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go