Thanks for sharing the link. Taking a brief look at this paper, I'm quite
confused about their methodology and their interpretation of their data.

For example, in Figure 2(b), if I understand correctly, they plot Elo
ratings for three independent runs where they run the entire AlphaZero
process for 50, 100, and 150 iterations, where they seem to be using the
word "iteration" to mean an entire block of playing a fixed number of games
("episodes"), training the neural net, and then testing the new net to see
if it should replace the previous one.

Their plot of Elo ratings, however, shows the 50-iteration run starting
much higher and ending much lower than the 100-iteration run, which in
turn starts much higher and ends much lower than the 150-iteration run.
What stands out is that each of the three runs independently appears to
have mean 0. Does this mean that for every run, they computed Elos only
from games between nets within that run itself, with no games comparing
nets across separate runs? If so, every Elo graph in the paper is tricky
to interpret, since none of the values in any of them are directly
comparable between lines. The runs that span a wider range are likely the
better ones (more Elo improvement within that run), but since
nontransitivity effects can sometimes dilate or contract the apparent Elo
gain relative to the "true" gain against more general opponents, without
cross-run games it's hard to be entirely confident about the comparisons.
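To illustrate why the mean-0 anchoring matters: under the standard logistic Elo model, the likelihood of any set of game results depends only on rating *differences*, so a within-run fit is identifiable only up to an additive constant, and zero-mean is just one arbitrary anchoring convention. A minimal sketch (with made-up strengths for four hypothetical nets in one run; this is not the paper's actual fitting code):

```python
import math
import random

def elo_win_prob(ra, rb):
    # Standard logistic Elo model: P(A beats B)
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def log_likelihood(ratings, games):
    # games: list of (winner_index, loser_index)
    return sum(math.log(elo_win_prob(ratings[w], ratings[l]))
               for w, l in games)

# Hypothetical "true" strengths of 4 nets within a single run
true = [0, 100, 200, 300]

# Simulate 200 games per ordered pair of nets within the run
random.seed(0)
games = []
for i in range(4):
    for j in range(4):
        if i != j:
            for _ in range(200):
                if random.random() < elo_win_prob(true[i], true[j]):
                    games.append((i, j))
                else:
                    games.append((j, i))

# Shifting every rating by the same constant leaves the likelihood
# unchanged: within-run games cannot pin down an absolute scale.
base = log_likelihood(true, games)
shifted = log_likelihood([r + 500 for r in true], games)
assert abs(base - shifted) < 1e-9
```

Because of this invariance, two zero-mean curves from separate runs share no common scale, which is exactly why cross-run games would be needed to compare the lines.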

They also seem to imply in the text that the bump in training loss near the
end of the 150-iteration run in Figure 2(a) indicates that the neural net
worsened, and that more iterations may make the bot worse. This seems to me
a strange conclusion: their own graph shows that the relative Elo strength
within that run increased almost monotonically through that whole period.
Since the AlphaZero process trains toward a moving target, the loss can
rise simply because the data gets harder, even while the neural net keeps
improving. For example, the most common opening might shift from a simple
one to one that leads to complex, harder-to-predict games, even if the
net's strength and prediction accuracy in both openings improve the whole
time.
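To make that concrete, here is a toy numerical sketch (all numbers invented for illustration): the net's per-position loss improves in both a "simple" and a "complex" opening, yet the average training loss still rises because self-play shifts toward the complex opening.

```python
# Hypothetical cross-entropy losses in two openings, before and after
# further training, plus the fraction of self-play positions coming
# from the complex opening. All values are made up.
loss_simple_before, loss_simple_after = 0.30, 0.20    # net improves here
loss_complex_before, loss_complex_after = 1.20, 1.00  # and here
frac_complex_before, frac_complex_after = 0.10, 0.60  # data mix shifts

avg_before = ((1 - frac_complex_before) * loss_simple_before
              + frac_complex_before * loss_complex_before)
avg_after = ((1 - frac_complex_after) * loss_simple_after
             + frac_complex_after * loss_complex_after)

# The net got strictly better on both kinds of positions...
assert loss_simple_after < loss_simple_before
assert loss_complex_after < loss_complex_before
# ...yet the average training loss went up, purely from the mix shift.
assert avg_after > avg_before  # 0.68 vs 0.39
```

This is just the usual mixture-shift effect, so a loss bump on self-generated data says little by itself about whether the net weakened.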

On Sun, Mar 24, 2019 at 10:05 AM Rémi Coulom <> wrote:

> Hi,
> Here is a paper you might be interested in:
> Abstract:
> Since AlphaGo and AlphaGo Zero have achieved groundbreaking successes in the
> game of Go, the programs have been generalized to solve other tasks.
> Subsequently, AlphaZero was developed to play Go, Chess and Shogi. In the
> literature, the algorithms are explained well. However, AlphaZero contains
> many parameters, and for neither AlphaGo, AlphaGo Zero, nor AlphaZero is
> there sufficient discussion of how to set parameter values in these
> algorithms. Therefore, in this paper, we choose 12 parameters in AlphaZero
> and evaluate how these parameters contribute to training. We focus on three
> objectives~(training loss, time cost and playing strength). For each
> parameter, we train 3 models using 3 different values~(minimum value,
> default value, maximum value). We use the game of play 6×6 Othello, on the
> AlphaZeroGeneral open-source re-implementation of AlphaZero. Overall,
> experimental results show that different values can lead to different
> training results, proving the importance of such a parameter sweep. We
> categorize these 12 parameters into time-sensitive parameters and
> time-friendly parameters. Moreover, through multi-objective analysis, this
> paper provides an insightful basis for further hyper-parameter optimization.
> Rémi
> _______________________________________________
> Computer-go mailing list