Thanks for sharing the link. Taking a brief look at this paper, I'm quite confused about their methodology and their interpretation of their data.

For example, in Figure 2(b), if I understand correctly, they plot Elo ratings for three independent runs in which they run the entire AlphaZero process for 50, 100, and 150 iterations, where they seem to be using the word "iteration" to mean one full block of playing a fixed number of self-play games ("episodes"), training the neural net, and then testing the new net to see whether it should replace the previous one.

Their plot of Elo ratings, however, shows the 50-iteration run starting much higher and ending much lower than the 100-iteration run, which in turn starts much higher and ends much lower than the 150-iteration run. What stands out is that each of the three curves independently appears to have mean zero. Does this mean that for every run they computed Elos only from games between nets within that run itself, with no games comparing nets across separate runs? If so, every Elo graph in the paper is tricky to interpret, since none of the values in any of them is directly comparable between lines. The runs that span a wider range are likely the better ones (more Elo improvement within that run), but nontransitivity effects can sometimes dilate or contract the apparent Elo gain relative to the "true" gain against more general opponents, so without cross-run games it's hard to be entirely confident about the comparisons.

They also seem to imply in the text that the bump in training loss near the end of the 150-iteration run in Figure 2(a) indicates that the neural net worsened, and that more iterations may make the bot worse. This seems to me a strange conclusion: their own graph shows that the relative Elo strength within that run increased almost monotonically through that whole period.
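If that reading is right, the mean-zero pattern is just the usual gauge freedom in Elo: ratings estimated from a closed pool of games are only identified up to an additive constant, since the logistic model depends only on rating differences, so it's natural for each run to be normalized to mean zero independently. A tiny sketch of this, with made-up ratings:

```python
import math

def win_prob(r_a, r_b):
    # Standard Elo logistic model: P(A beats B) depends only on the
    # rating DIFFERENCE r_a - r_b, never on the absolute values.
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

ratings = [-120.0, -40.0, 30.0, 130.0]      # hypothetical within-run ratings, mean 0
shifted = [r + 500.0 for r in ratings]      # same run, re-anchored 500 points higher

# Every pairwise prediction is identical under the shift, so the game
# results alone cannot distinguish the two anchorings; cross-run Elo
# comparisons need cross-run games to pin down the constant.
for i in range(len(ratings)):
    for j in range(len(ratings)):
        assert abs(win_prob(ratings[i], ratings[j])
                   - win_prob(shifted[i], shifted[j])) < 1e-12
```

So a fitter is free to anchor each run anywhere, and mean zero is the common convention.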
Since the AlphaZero process trains towards a moving target, the loss can easily increase simply because the data gets harder, even if the neural net is always improving. For example, the most common opening might shift from a simple one to one that leads to complex, harder-to-predict games, even while the neural net improves its strength and accuracy in both openings the whole time.

On Sun, Mar 24, 2019 at 10:05 AM Rémi Coulom <remi.cou...@free.fr> wrote:
> Hi,
>
> Here is a paper you might be interested in:
>
> Abstract:
>
> Since AlphaGo and AlphaGo Zero have achieved breakground successes in the
> game of Go, the programs have been generalized to solve other tasks.
> Subsequently, AlphaZero was developed to play Go, Chess and Shogi. In the
> literature, the algorithms are explained well. However, AlphaZero contains
> many parameters, and for neither AlphaGo, AlphaGo Zero nor AlphaZero, there
> is sufficient discussion about how to set parameter values in these
> algorithms. Therefore, in this paper, we choose 12 parameters in AlphaZero
> and evaluate how these parameters contribute to training. We focus on three
> objectives~(training loss, time cost and playing strength). For each
> parameter, we train 3 models using 3 different values~(minimum value,
> default value, maximum value). We use the game of play 6×6 Othello, on the
> AlphaZeroGeneral open source re-implementation of AlphaZero. Overall,
> experimental results show that different values can lead to different
> training results, proving the importance of such a parameter sweep. We
> categorize these 12 parameters into time-sensitive parameters and
> time-friendly parameters. Moreover, through multi-objective analysis, this
> paper provides an insightful basis for further hyper-parameter optimization.
>
> https://arxiv.org/abs/1903.08129
>
> Rémi
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
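Coming back to the moving-target point above, here is a toy calculation (all numbers invented) showing how average training loss can rise even though the net improves on every opening, purely because self-play shifts toward the harder opening:

```python
# Two opening types the self-play data might contain: a "simple" opening
# the net predicts well, and a "complex" one with intrinsically less
# predictable continuations. Each entry is (share of data, per-position loss).
early = {"simple": (0.95, 1.0), "complex": (0.05, 3.0)}
late  = {"simple": (0.40, 0.8), "complex": (0.60, 2.5)}  # net improved on BOTH

def avg_loss(mix):
    # Average training loss is the data-share-weighted mix of per-opening losses.
    return sum(share * loss for share, loss in mix.values())

# Per-opening loss dropped (1.0 -> 0.8 and 3.0 -> 2.5), yet the overall
# loss went UP because the self-play distribution moved to the harder opening.
print(avg_loss(early))  # 0.95*1.0 + 0.05*3.0 = 1.1
print(avg_loss(late))   # 0.40*0.8 + 0.60*2.5 = 1.82
```

So a late-training loss bump is consistent with the net getting strictly stronger, which is exactly what their own Elo curve suggests.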

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go