Hi Ichikawa-san,

Thank you for the nice explanation. I think your guess may well be right,
and the 2018 paper might have no mistake after all.

I carefully checked Figure 1 in both papers.

1. The 2017 version reaches AlphaGo Lee strength at 170,000 steps; the 2018 version reaches it at 80,000 steps.
2. Both 2017 and 2018 reach "AlphaGo Zero (20 block)" strength in a similar number of steps.
3. The final strength is similar.

So I had thought: "If you use 7 times more game records, the initial learning
speed is faster, but the final strength is similar."
So maybe they wanted to say, "21 million training games is enough."

But that interpretation is wrong.
In Go, if you use all positions from a game, does it cause overfitting, so
that learning fails?
Without symmetry augmentation, Go can use only 20 positions from a game.
Chess and Shogi are fine. It looks domain-dependent...
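For reference, the symmetry augmentation in question uses the 8 dihedral transforms of the square board (4 rotations, each with an optional mirror). A minimal pure-Python sketch (the board representation here is my own illustration, not from the papers):

```python
def rot90(board):
    """Rotate a square board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]

def fliplr(board):
    """Mirror a board left-right."""
    return [row[::-1] for row in board]

def board_symmetries(board):
    """Return the 8 dihedral symmetries of a square board position:
    4 rotations, each with and without a left-right mirror."""
    syms = []
    b = board
    for _ in range(4):
        syms.append(b)
        syms.append(fliplr(b))
        b = rot90(b)
    return syms

# A fully asymmetric 3x3 "position" makes all 8 variants distinct
board = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
syms = board_symmetries(board)
unique = {tuple(map(tuple, s)) for s in syms}
print(len(syms), len(unique))  # 8 8
```

For a real 19x19 position with no accidental symmetry, each training position likewise yields 8 distinct variants, which is where the factor of 8 below comes from.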

Hiroshi Yamashita

The Go version in AlphaZero 2017 finished training in 34 hours, according to
Table S3.
And it looks like AlphaZero Symmetries in AlphaZero 2018 finished training in
the same time, according to Figure S1.
So I think the authors had adopted AlphaZero Symmetries in the 2017 paper by
mistake and redid the experiment in the 2018 paper.
To compensate for the symmetries with real self-play games, they generated
8 times more games and reduced the positions per game to 1/8.
It is just my guess^^
Computer-go mailing list
