[Computer-go] Accelerating Self-Play Learning in Go

David Wu Sun, 03 Mar 2019 18:21:58 -0800

For any interested people on this list who don't follow Leela Zero
discussion or reddit threads:

I recently released a paper on ways to improve the efficiency of
AlphaZero-like learning in Go. Several of the ideas tried deviate a
little from "pure zero" (e.g. ladder detection, predicting board
ownership), but the approach still uses only self-play, starting from
random play and with no outside human data.

Longer training runs have NOT yet been tested. So far, for reaching up to
about LZ130 strength (strong human pro or just beyond it, depending on
hardware), you can speed up the learning by at least roughly a factor of 5
compared to Leela Zero, and by closer to a factor of 30 for merely reaching
the earlier level of very strong amateur play rather than pro or
superhuman strength.

I found some other interesting results, too. For example, contrary to
intuition built up from earlier-generation MCTS programs in Go, putting
significant weight on score maximization rather than only on win/loss seems
to help.

Blog post:
https://blog.janestreet.com/accelerating-self-play-learning-in-go/
Paper: https://arxiv.org/abs/1902.10565
Code: https://github.com/lightvector/KataGo

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
