Requiring a margin > 55% is a defense against a random result. A 55% score in a 
400-game match is 2 sigma above an even result.
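
For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes the games are independent, there are no draws, and the two networks are truly equal in strength (win probability 0.5). The function name and the normal approximation are my own choices for illustration, not anything taken from the papers.

```python
import math

def sigma_of_score(score_fraction, games, p_null=0.5):
    """Standard deviations by which a match score exceeds the null expectation.

    Assumes independent games with no draws (a binomial model).
    """
    expected_wins = games * p_null
    std_wins = math.sqrt(games * p_null * (1.0 - p_null))
    return (score_fraction * games - expected_wins) / std_wins

# 55% of 400 games is 220 wins; the null expectation is 200 with std 10.
z = sigma_of_score(0.55, 400)

# One-sided chance that an equal-strength network scores at least this well,
# using the normal approximation to the binomial.
p_chance = 0.5 * math.erfc(z / math.sqrt(2.0))

print(f"z = {z:.2f}, one-sided p = {p_chance:.4f}")
```

Under those assumptions, an equal-strength candidate clears the 55% bar by luck only a few percent of the time.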

But I like the AZ policy better, because it does not require arbitrary 
parameters. It also improves more fluidly by always drawing training examples 
from the current probability distribution, and when the program is close to 
perfect you would be able to capture the last 5% of skill.
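
To make the contrast concrete, here is a rough sketch of the two update rules as I understand them. The function names, the string stand-ins for networks, and the default threshold argument are hypothetical and for illustration only; this is not DeepMind's code.

```python
def next_selfplay_net_gated(best, candidate, candidate_wins, games, threshold=0.55):
    """AGZ-style gating: the candidate replaces the current best network
    only if its score in the evaluation match exceeds the threshold."""
    return candidate if candidate_wins / games > threshold else best

def next_selfplay_net_continuous(best, candidate):
    """AZ-style update: the latest network always generates the next
    self-play games, so no threshold or evaluation match is needed."""
    return candidate

# Strings stand in for networks. A candidate scoring 210/400 (52.5%) is
# rejected by the gate, while the continuous rule adopts it anyway.
print(next_selfplay_net_gated("old", "new", 210, 400))
print(next_selfplay_net_continuous("old", "new"))
```

The gated rule carries two arbitrary parameters (threshold and match length); the continuous rule has none.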

I am not sure what to make of the AZ vs AGZ result. Mathematically, there 
should be a degree of training sufficient for AZ to exceed any fixed level of 
skill, such as AGZ's 40/40 level. So there must be a reason why DeepMind did 
not report such a result, but it is unclear what that is.

-----Original Message-----
From: Computer-go [mailto:[email protected]] On Behalf Of 
Darren Cook
Sent: Wednesday, December 6, 2017 12:58 PM
To: [email protected]
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a 
General Reinforcement Learning Algorithm

> Mastering Chess and Shogi by Self-Play with a General Reinforcement 
> Learning Algorithm https://arxiv.org/pdf/1712.01815.pdf

One of the changes they made (bottom of p.3) was to continuously update the 
neural net, rather than require a new network to beat it 55% of the time to be 
used. (That struck me as strange at the time, when reading the AlphaGoZero 
paper - why not just >50%?)

The AlphaZero paper shows it out-performs AlphaGoZero, but they are comparing 
to the 20-block, 3-day version. Not the 40-block, 40-day version that was even 
stronger.

As papers rarely show failures, can we take it to mean they couldn't 
out-perform their best go bot, do you think? If so, I wonder how hard they 
tried?

In other words, do you think the changes they made from AlphaGo Zero to 
AlphaZero have made it weaker (when just viewed from the point of view of 
making the strongest possible go program)?

Darren
_______________________________________________
Computer-go mailing list
[email protected]
http://computer-go.org/mailman/listinfo/computer-go
