Very nice. :)
And thanks for the note about batch sizing. Specifically tuning parameters
for this level of strength on 9x9 seems like it could be quite valuable;
Kata definitely hasn't done that either.

But it feels like bots are very, very close to optimal on 9x9. With some
dedicated work and more months or years of training, it might be possible
to become unbeatable for all practical purposes, and as you mentioned in
the other thread, adaptively building out an opening book could be a part
of getting there. I'd love to see an "unbeatable 9x9 crazystone" a year
down the line.

One fundamental issue that I've been noticing in a variety of domains is
precisely that self-play under AlphaZero and generally reinforcement
learning in environments like these doesn't explore enough, and it's very,
very difficult to get it to do so in a way that's still robust and
efficient enough. And unless you plan to do something like AlphaStar's
internal self-play training league, which would seem to nontrivially
multiply the cost, it seems like playing other opponents instead of just
self-play can't entirely be the solution, because once you are sufficiently
stronger than the best other opponent, it's hard to usefully continue doing
that. And the league *still* didn't entirely fix the problem for AlphaStar,
in that humans were still able to sometimes find exploitative strategies
that it had never learned to handle via self-play and that it reacted
very poorly to. It feels like there's something unsolved and "missing" from
current algorithms.


On Sat, May 9, 2020 at 7:17 PM Rémi Coulom <remi.cou...@gmail.com> wrote:

> Yeaaaah! first win against Kata!
> http://www.yss-aya.com/cgos/viewer.cgi?9x9/SGF/2020/05/09/999849.sgf
>
> In addition to the optimized batch size, I did two other things:
>  - I use two batches of 63 instead of one, with double buffering, so that
> the GPU is kept 100% busy. About 14k nodes per second now.
>  - I make the search less selective, by using a bigger exploration
> constant in the MCTS formula.
> I should download Katago and CLOP my search parameters against it.
>
> So far I have tried to keep the "Zero" philosophy of using self-play only,
> but playing against other opponents is very likely to be a better approach
> at making progress.
>
> Rémi
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
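
To make the "bigger exploration constant" point in the quoted message
concrete, here is a rough sketch. It assumes an AlphaZero-style PUCT
selection rule; the thread does not say which formula Crazy Stone actually
uses, so the function names, the dict layout, and the constant values below
are all hypothetical and only illustrate how a larger constant makes the
search less selective.

```python
import math

def puct_score(q, prior, parent_visits, child_visits, c_puct):
    """AlphaZero-style PUCT score (assumed form, not Crazy Stone's actual formula)."""
    # Exploitation term q plus an exploration bonus that shrinks as the
    # child accumulates visits and grows with the exploration constant.
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

def select_child(children, c_puct):
    """Return the index of the child with the highest PUCT score.

    `children` is a list of dicts with hypothetical keys:
    "q" (mean value), "prior" (policy prior), "visits" (visit count).
    """
    parent_visits = sum(c["visits"] for c in children)
    return max(
        range(len(children)),
        key=lambda i: puct_score(
            children[i]["q"],
            children[i]["prior"],
            parent_visits,
            children[i]["visits"],
            c_puct,
        ),
    )

# Two moves: the first looks better and has most of the visits so far.
children = [
    {"q": 0.6, "prior": 0.5, "visits": 90},
    {"q": 0.5, "prior": 0.5, "visits": 10},
]
selective = select_child(children, c_puct=0.1)   # small constant
exploratory = select_child(children, c_puct=2.0)  # bigger constant
```

With the small constant the search keeps visiting the move that already
looks best; with the bigger one the exploration bonus of the less-visited
move dominates and it gets searched instead.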