Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Brian Sheppard via Computer-go Wed, 06 Dec 2017 15:19:59 -0800

The chess result is 64-36: a 100 rating point edge! I think the Stockfish open 
source project improved Stockfish by ~20 rating points in the last year. Given 
the number of people/computers involved, Stockfish's annual effort level seems 
comparable to the AlphaZero (AZ) effort.
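
The 100-point figure can be checked with the standard logistic Elo model, in which a rating difference D corresponds to an expected score E = 1 / (1 + 10^(-D/400)), so D = 400 * log10(E / (1 - E)). Below is a minimal Python sketch that assumes that model and counts each draw as half a point. The function names are illustrative and do not come from any rating tool.

```python
import math


def elo_diff_from_score(score: float) -> float:
    """Rating difference implied by an expected score, under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def expected_score(diff: float) -> float:
    """Inverse mapping: expected score for a given rating difference."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


# 64 points out of 100 games (a win counts 1, a draw counts 1/2).
diff = elo_diff_from_score(64 / 100)
print(f"{diff:.1f} Elo")
print(f"{expected_score(diff):.2f}")
```

For 64 points out of 100, this gives just under 100 Elo (about 99.95), which matches the estimate above. A 100-game match is a small sample, so the figure is only rough.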


Stockfish is really, really tweaked out to do exactly what it does. It is very 
hard to improve anything about Stockfish. To be clear: I am not disparaging the 
code or people or project in any way. The code is great, people are great, 
project is great. It is really easy to work on Stockfish, but very hard to make 
progress given the extraordinarily fine balance of resources that already 
exists. I tried hard for about 6 months last year without success. I 
ran dozens (maybe 100?) of experiments, including several that were motivated by 
automated tuning or automated searching for opportunities. No luck.

AZ would dominate the current TCEC. Stockfish didn’t lose a game in the 
semi-final, failing to make the final because of too many draws against the 
weaker players.

The Stockfish team will have some self-examination going forward for sure. I 
wonder what they will decide to do.

I hope this isn’t the last we see of these DeepMind programs.

From: Computer-go [mailto:[email protected]] On Behalf Of Richard Lorentz
Sent: Wednesday, December 6, 2017 12:50 PM
To: [email protected]
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

One chess result stood out for me, namely, just how much easier it was for 
AlphaZero to win with White (25 wins, 25 draws, 0 losses) than with 
Black (3 wins, 47 draws, 0 losses).

Maybe we should not give up on the idea of White to play and win in chess!
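
If a draw is scored as half a point, these per-color records add up to the 64-36 total: 28 wins and 72 draws give 28 + 36 = 64 points. Below is a rough Python sketch of the per-color arithmetic. It assumes 50 games with each color, which is what the counts sum to, and uses the standard logistic Elo conversion. The names are illustrative.

```python
import math


def score(wins: int, draws: int, losses: int) -> float:
    """Fraction of available points, with a draw worth half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_diff(s: float) -> float:
    """Logistic Elo difference implied by a score strictly between 0 and 1."""
    return 400.0 * math.log10(s / (1.0 - s))


# AlphaZero's (wins, draws, losses) against Stockfish, as quoted above.
as_white = (25, 25, 0)
as_black = (3, 47, 0)
combined = tuple(w + b for w, b in zip(as_white, as_black))  # (28, 72, 0)

for label, record in (("white", as_white), ("black", as_black), ("combined", combined)):
    s = score(*record)
    print(f"{label:<8} score {s:.2f}  ~{elo_diff(s):+.0f} Elo")
```

This prints the following scores:

- White: 0.75 (about +191 Elo)
- Black: 0.53 (about +21)
- Overall: 0.64 (about +100)

With only 50 games per color, these per-color ratings are very noisy. They also mix AlphaZero's strength with the first-move advantage, so they illustrate the asymmetry rather than measure it.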

On 12/06/2017 01:24 AM, Hiroshi Yamashita wrote:

Hi, 

DeepMind has made the strongest chess and shogi programs using the AlphaGo Zero method. 

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm 
https://arxiv.org/pdf/1712.01815.pdf

AlphaZero(Chess) outperformed Stockfish after 4 hours of training, and 
AlphaZero(Shogi) outperformed elmo after 2 hours. 

Search is MCTS (Monte Carlo tree search). 
AlphaZero(Chess) searches     80,000 positions/sec. 
Stockfish        searches 70,000,000 positions/sec. 
AlphaZero(Shogi) searches     40,000 positions/sec. 
elmo             searches 35,000,000 positions/sec. 
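
Dividing the table's figures shows how large the speed gap is. Below is a trivial Python sketch with the numbers copied from the table. The `NPS` table and the `speed_ratio` helper are illustrative names.

```python
# Positions searched per second, as quoted in the table above.
NPS = {
    "chess": {"AlphaZero": 80_000, "Stockfish": 70_000_000},
    "shogi": {"AlphaZero": 40_000, "elmo": 35_000_000},
}


def speed_ratio(game: str, engine: str) -> float:
    """Positions the conventional engine searches for each one AlphaZero searches."""
    return NPS[game][engine] / NPS[game]["AlphaZero"]


for game, engine in (("chess", "Stockfish"), ("shogi", "elmo")):
    print(f"{game}: {engine} searches {speed_ratio(game, engine):.0f}x more positions/sec")
```

In both games, the conventional engine searches 875 times as many positions per second. The paper explains AlphaZero's strength despite this gap by its much more selective search. Its neural network guides the search toward the most promising variations.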

Thanks, 
Hiroshi Yamashita 

_______________________________________________ 
Computer-go mailing list 
[email protected] 
http://computer-go.org/mailman/listinfo/computer-go
