Re: [Computer-go] MuZero - new paper from DeepMind

2019-11-25 Thread Brian Sheppard via Computer-go
I read through that paper, but I admit that I didn't really get where the extra power comes from. (Replying to valkyria, Mon, Nov 25, 2019: "Hi, if anyone still gets email from this list:")

Re: [Computer-go] Indexing and Searching Go Positions -- Literature Wanted

2019-09-17 Thread Brian Sheppard via Computer-go
I remember a scheme (from Dave Dyer, IIRC) that indexed positions based on the points on which the 20th, 40th, 60th, ... moves were made. IIRC it was nearly a unique key for pro positions. Best, Brian
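
A minimal sketch of how such a key could be computed (my reconstruction, not Dyer's actual code; assumes moves arrive as (x, y) pairs in play order on a 19x19 board):

    # Index a game by the points on which the 20th, 40th, 60th, ... moves
    # were played. Two games collide only if those sampled moves coincide.
    def dyer_key(moves, stride=20, samples=3):
        key = []
        for i in range(stride - 1, stride * samples, stride):  # 0-indexed 19, 39, 59
            if i < len(moves):
                x, y = moves[i]
                key.append(x * 19 + y)   # flatten the point to one integer
        return tuple(key)                # usable directly as a dict key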

Re: [Computer-go] Accelerating Self-Play Learning in Go

2019-03-08 Thread Brian Sheppard via Computer-go
>> contrary to intuition built up from earlier-generation MCTS programs in Go, putting significant weight on score maximization rather than only win/loss seems to help. This narrative glosses over important nuances. Collectively we are trying to find the golden mean of cost efficiency...
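
For concreteness, the kind of blended search utility under discussion looks roughly like this (the weights and scaling here are illustrative, not the paper's actual values):

    import math

    # Mostly win/loss, plus a small saturating bonus for score margin.
    def utility(p_win, score_lead, w_score=0.1):
        return (2 * p_win - 1) + w_score * math.tanh(score_lead / 10.0)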

Re: [Computer-go] PUCT formula

2018-03-09 Thread Brian Sheppard via Computer-go
Thanks for the explanation. I agree that there is no actual consistency in exploration terms across historical papers. I confirmed that the PUCT formulas across the AG, AGZ, and AZ papers are all consistent. That is unlikely to be an error. So now I am wondering whether the faster decay is

Re: [Computer-go] PUCT formula

2018-03-09 Thread Brian Sheppard via Computer-go
On 08-03-18 18:47, Brian Sheppard via Computer-go wrote: > I recall that someone investigated this question, but I don’t recall the result. What is the formula that AGZ actually uses? The one mentioned in their

[Computer-go] PUCT formula

2018-03-08 Thread Brian Sheppard via Computer-go
In the AGZ paper, there is a formula for what they call “a variant of the PUCT algorithm”, and they cite a paper from Christopher Rosin: http://gauss.ececs.uc.edu/Workshops/isaim2010/papers/rosin.pdf But that paper has a formula that he calls the PUCB formula, which incorporates the priors
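
For reference, the AGZ variant selects the child maximizing Q(s,a) + U(s,a), where U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)). A direct transcription:

    import math

    # children: per-move records with prior P, visit count N, mean value Q.
    def select_child(children, c_puct=1.0):
        total_n = sum(ch["N"] for ch in children)
        def score(ch):
            u = c_puct * ch["P"] * math.sqrt(total_n) / (1 + ch["N"])
            return ch["Q"] + u
        return max(children, key=score)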

Re: [Computer-go] On proper naming

2018-03-08 Thread Brian Sheppard via Computer-go
The technique originated with backgammon players in the late 1970's, who would roll out positions manually. Ron Tiekert (Scrabble champion) also applied the technique to Scrabble, and I took that idea for Maven. It seemed like people were using the terms interchangeably.

Re: [Computer-go] 9x9 is last frontier?

2018-03-07 Thread Brian Sheppard via Computer-go
...s attempt at Shogi using AlphaZero's method will turn out. regards, Daniel On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go <computer-go@computer-go.org> wrote: Training on Stockfish games is guarant

Re: [Computer-go] 9x9 is last frontier?

2018-03-06 Thread Brian Sheppard via Computer-go
Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go <computer-go@computer-go.org> wrote: Training on Stockfish games is guaranteed to produce a blunder-fest, because there are no blunders in the training set and therefore the policy networ

Re: [Computer-go] 9x9 is last frontier?

2018-03-06 Thread Brian Sheppard via Computer-go
Training on Stockfish games is guaranteed to produce a blunder-fest, because there are no blunders in the training set and therefore the policy network never learns how to refute blunders. This is not a flaw in MCTS, but rather in the policy network. MCTS will eventually search every move
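
A toy illustration of the mechanism (my numbers, not from any paper): under the PUCT selection rule, a refuting move whose prior is near zero receives almost no exploration bonus, so in practice the search never reaches it.

    import math

    def u_bonus(prior, parent_visits, child_visits, c_puct=1.0):
        return c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

    print(u_bonus(0.2, 10000, 0))     # 20.0  -> tried almost immediately
    print(u_bonus(0.0001, 10000, 0))  # 0.01  -> effectively never tried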

Re: [Computer-go] Project Leela Zero

2017-12-29 Thread Brian Sheppard via Computer-go
Seems like extraordinarily fast progress. Great to hear that.

Re: [Computer-go] AGZ Policy Head

2017-12-29 Thread Brian Sheppard via Computer-go
I agree that having special knowledge for "pass" is not a big compromise, but it would not meet the "zero knowledge" goal, no?

Re: [Computer-go] mcts and tactics

2017-12-19 Thread Brian Sheppard via Computer-go
>I wouldn't find it so surprising if eventually the 20 or 40 block networks develop a set of convolutional channels that traces possible ladders diagonally across the board. Learning the deep tactics is more-or-less guaranteed because of the interaction between search and evaluation
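
A toy demonstration of why that is plausible (mine, not from any paper): a single 3x3 kernel can shift activation one step diagonally per layer, so a stack of n such layers can trace a ladder roughly n points across the board.

    import numpy as np
    from scipy.signal import convolve2d

    k = np.zeros((3, 3))
    k[2, 2] = 1.0                       # kernel that shifts signal down-right

    board = np.zeros((19, 19))
    board[0, 0] = 1.0                   # "ladder" starting in the corner
    for _ in range(5):                  # five convolutional "layers"
        board = convolve2d(board, k, mode="same")
    print(np.argwhere(board))           # [[5 5]]: moved 5 diagonal steps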

Re: [Computer-go] AlphaZero & Co. still far from perfect play ?

2017-12-08 Thread Brian Sheppard via Computer-go
Agreed. You can push this farther. If we define an “error” as a move that flips the W/L state of a Go game, then only the side that is currently winning can make an error. Let’s suppose that 6.5 komi is winning for Black. Then Black can make an error, and after he does then White can make

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Brian Sheppard via Computer-go
AZ scalability looks good in that diagram, and it is certainly a good start, but it only goes out through 10 sec/move. Also, if the hardware is 7x better for AZ than SF, then should we elongate the curve for AZ by 7x? Or compress the curve for SF by 7x? Or some combination? Or take the data at

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Brian Sheppard via Computer-go
On 7/12/2017 13:20, Brian Sheppard via Computer

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-07 Thread Brian Sheppard via Computer-go
On 06-12-17 22:29, Brian Sheppard via Computer-go wrote: > The chess result is 64-36: a 100 rating point edge! I th

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Brian Sheppard via Computer-go
Requiring a margin > 55% is a defense against a random result. A 55% score in a 400-game match is 2 sigma. But I like the AZ policy better, because it does not require arbitrary parameters. It also improves more fluidly by always drawing training examples from the current probability
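
Unpacking that arithmetic: under a 50% null hypothesis, the standard deviation of the score fraction over 400 games is sqrt(0.5 * 0.5 / 400) = 2.5%, so 55% sits exactly two sigma above even.

    import math
    sigma = math.sqrt(0.5 * 0.5 / 400)   # 0.025
    print((0.55 - 0.50) / sigma)         # 2.0 -> 55% is a 2-sigma result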

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Brian Sheppard via Computer-go
The chess result is 64-36: a 100 rating point edge! I think the Stockfish open source project improved Stockfish by ~20 rating points in the last year. Given the number of people/computers involved, Stockfish’s annual effort level seems comparable to the AZ effort. Stockfish is really,
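
As a sanity check on the 100-point figure, the standard logistic Elo model gives:

    import math
    p = 0.64                               # AZ's score against Stockfish
    elo = -400 * math.log10(1 / p - 1)     # expected score -> rating gap
    print(round(elo))                      # ~100 rating points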

Re: [Computer-go] Significance of resignation in AGZ

2017-12-03 Thread Brian Sheppard via Computer-go
...than zero, or is imitating the best known algorithm inconvenient for your purposes? Best, -Chaz On Sat, Dec 2, 2017 at 7:31 PM, Brian Sheppard via Computer-go <computer-go@computer-go.org> wrote: I implemented the ad hoc rule of not training on pos

Re: [Computer-go] Significance of resignation in AGZ

2017-12-02 Thread Brian Sheppard via Computer-go
...be not to resign too early (even before not passing) On 02/12/2017 at 18:17, Brian Sheppard via Computer-go wrote: I have some hard data now. My network’s initial training reached the same performance in half the iterations. That is, the steepness of skill gain in the first day of training

Re: [Computer-go] Significance of resignation in AGZ

2017-12-02 Thread Brian Sheppard via Computer-go
Brian, do you have any experiments showing what kind of impact it has? It sounds like you have tried both with and without your ad hoc first pass approach? 2017-12-01 15:29 GMT-06:00 Brian Sheppard via Computer-go <computer-go@computer-go

Re: [Computer-go] Significance of resignation in AGZ

2017-12-01 Thread Brian Sheppard via Computer-go
15:29 GMT-06:00 Brian Sheppard via Computer-go <computer-go@computer-go.org>: I have concluded that AGZ's policy of resigning "lost" games early is somewhat significant. Not as significant as using residual networks, for sure,

[Computer-go] Significance of resignation in AGZ

2017-12-01 Thread Brian Sheppard via Computer-go
I have concluded that AGZ's policy of resigning "lost" games early is somewhat significant. Not as significant as using residual networks, for sure, but you wouldn't want to go without these advantages. The benefit cited in the paper is speed. Certainly a factor. I see two other advantages.

Re: [Computer-go] Is MCTS needed?

2017-11-16 Thread Brian Sheppard via Computer-go
State of the art in computer chess is alpha-beta search, but note that the search is very selective because of "late move reductions." A late move reduction is to reduce depth for moves after the first move generated in a node. For example, a simple implementation would be "search the first
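
A sketch of the idea in code (illustrative only, not any engine's implementation; evaluate, moves, and play are assumed game-specific callbacks):

    # Negamax with a simple late move reduction: full depth for the first
    # move, one ply less for later moves, re-searching on an alpha raise.
    def search(pos, depth, alpha, beta, evaluate, moves, play):
        if depth <= 0:
            return evaluate(pos)
        for i, move in enumerate(moves(pos)):        # best-first order
            d = depth - 1 if i == 0 else depth - 2   # reduce late moves
            score = -search(play(pos, move), d, -beta, -alpha,
                            evaluate, moves, play)
            if i > 0 and score > alpha:              # reduced search failed high:
                score = -search(play(pos, move), depth - 1, -beta, -alpha,
                                evaluate, moves, play)   # re-search full depth
            if score >= beta:
                return beta                          # cutoff
            alpha = max(alpha, score)
        return alpha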

Re: [Computer-go] Zero is weaker than Master!?

2017-10-26 Thread Brian Sheppard via Computer-go
I would add that "wild guesses based on not enough info" is an indispensable skill. -Original Message- From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Hideki Kato Sent: Thursday, October 26, 2017 10:17 AM To: computer-go@computer-go.org Subject: Re:

Re: [Computer-go] AlphaGo Zero SGF - Free Use or Copyright?

2017-10-26 Thread Brian Sheppard via Computer-go
On 26.10.2017 13:52, Brian Sheppard via Computer-go wrote: > MCTS is the glue that binds incompatible rules. This is, howeve

Re: [Computer-go] AlphaGo Zero SGF - Free Use or Copyright?

2017-10-26 Thread Brian Sheppard via Computer-go
Robert is right, but Robert seems to think this hasn't been done. Actually every prominent non-neural MCTS program since Mogo has been based on the exact design that Robert describes. The best of them achieve somewhat greater strength than Robert expects. MCTS is the glue that binds

Re: [Computer-go] Source code (Was: Reducing network size? (Was: AlphaGo Zero))

2017-10-25 Thread Brian Sheppard via Computer-go
I think it uses the champion network. That is, the training periodically generates a candidate, and there is a playoff against the current champion. If the candidate wins by more than 55% then a new champion is declared. Keeping a champion is an important mechanism, I believe. That creates
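
A sketch of that loop (helper callbacks self_play, train, and playoff_score are assumed here, not real APIs):

    # AGZ-style gating: the candidate must beat the champion by a margin
    # before it is allowed to generate the next round of training data.
    def gate(champion, self_play, train, playoff_score,
             generations=100, threshold=0.55):
        for _ in range(generations):
            games = self_play(champion)           # champion makes the data
            candidate = train(champion, games)    # optimize a candidate
            if playoff_score(candidate, champion, n_games=400) > threshold:
                champion = candidate              # new champion declared
        return champion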

Re: [Computer-go] AlphaGo Zero

2017-10-19 Thread Brian Sheppard via Computer-go
So I am reading that residual networks are simply better than normal convolutional networks. There is a detailed write-up here: https://blog.waya.ai/deep-residual-learning-9610bb62c355 Summary: the residual network has a fixed connection that adds (with no scaling) the output of the previous
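
The core of a residual block in a few lines (a sketch of the write-up's idea, not AGZ's exact architecture; conv1 and conv2 stand in for convolution plus batch norm):

    import numpy as np

    # F(x) + x: the skip connection adds the input back, unscaled.
    def residual_block(x, conv1, conv2):
        y = np.maximum(conv1(x), 0.0)   # conv -> ReLU
        y = conv2(y)                    # second conv
        return np.maximum(y + x, 0.0)   # add identity shortcut, then ReLU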

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Brian Sheppard via Computer-go
On 18/10/2017 22:00, Brian Sheppard via Computer-go wrote: > This paper is required reading. When I read this team’s papers, I think to myself “Wow, this is br

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Brian Sheppard via Computer-go
Some thoughts toward the idea of general game-playing... One aspect of Go is ideally suited for visual NN: strong locality of reference. That is, stones affect stones that are nearby. I wonder whether the late emergence of ladder understanding within AlphaGo Zero is an artifact of the board

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Brian Sheppard via Computer-go
On 18/10/2017 22:00, Brian Sheppard via Computer-go wrote: > A stunning result. The NN uses a standard vision architecture (no Go adaptation beyond what is necessary to represent the game state).

Re: [Computer-go] AlphaGo Zero

2017-10-18 Thread Brian Sheppard via Computer-go
This paper is required reading. When I read this team’s papers, I think to myself “Wow, this is brilliant! And I think I see the next step.” When I read their next paper, they show me the next *three* steps. I can’t say enough good things about the quality of the work. A stunning result.

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Brian Sheppard via Computer-go
...checking whether each candidate satisfies the problem's statement." The whole point of the policy network is to avoid brute-force search, by reducing the branching factor... On Sun, Aug 6, 2017 at 10:42 AM, Brian Sheppard via Computer-go <computer-go@computer-go.org>

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Brian Sheppard via Computer-go
it would still depend on the actual details of its implementation). On Sun, Aug 6, 2017 at 2:20 PM, Brian Sheppard via Computer-go <computer-go@computer-go.org> wrote: I understand why most people are saying that AlphaGo is not brute force

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Brian Sheppard via Computer-go
2017 at 2:20 PM, Brian Sheppard via Computer-go <computer-go@computer-go.org> wrote: I understand why most people are saying that AlphaGo is not brute force, because it appears to be highly selective. But MCTS is a full width search. Read th

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Brian Sheppard via Computer-go
, Aug 6, 2017 at 10:42 AM, Brian Sheppard via Computer-go <computer-go@computer-go.org> wrote: Yes, AlphaGo is brute force. No, it is impossible to solve Go. Perfect play looks a lot like AlphaGo in that you would not be able to tell the differen

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Brian Sheppard via Computer-go
Yes, AlphaGo is brute force. No, it is impossible to solve Go. Perfect play looks a lot like AlphaGo in that you would not be able to tell the difference. But I think that AlphaGo still has 0% win rate against perfect play. My own best guess is that top humans make about 12 errors per game.

Re: [Computer-go] Possible idea - decay old simulations?

2017-07-24 Thread Brian Sheppard via Computer-go
>I haven't tried it, but (with the computer chess hat on) these kinds of proposals behave pretty badly when you get into situations where your evaluation is off and there are horizon effects. In computer Go, this issue focuses on cases where the initial move ordering is bad. It isn't so much

Re: [Computer-go] Possible idea - decay old simulations?

2017-07-23 Thread Brian Sheppard via Computer-go
Yes. This is a long-known phenomenon. I was able to get improvements in Pebbles based on the idea of forgetting unsuccessful results. It has to be done somewhat carefully, because results propagate up the tree. But you can definitely make it work. I recall a paper published on this
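
One common way to realize the forgetting (a sketch of the general technique, not Pebbles' actual code) is to discount a node's statistics on every backup:

    # gamma = 1.0 recovers plain MCTS averaging; gamma < 1 lets stale
    # simulations fade so bad early results stop dominating the mean.
    class Node:
        def __init__(self):
            self.visits = 0.0
            self.wins = 0.0

    def backup(node, result, gamma=0.999):
        node.visits = gamma * node.visits + 1.0
        node.wins = gamma * node.wins + result
        return node.wins / node.visits     # decayed mean value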

Re: [Computer-go] Value network that doesn't want to learn.

2017-06-23 Thread Brian Sheppard via Computer-go
>... my value network was trained to tell me the game is balanced at the beginning... :-) The best training policy is to select positions that correct errors. I used the policies below to train a backgammon NN. Together, they reduced the expected loss of the network by 50% (cut the error
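
One way to implement "select positions that correct errors" (a sketch of the general idea, not the exact backgammon pipeline): sample training positions with probability proportional to the network's current error on them.

    import random

    # net(p) is the current value estimate; targets are deeper-search or
    # final-outcome values. Positions the net gets most wrong are favored.
    def sample_batch(positions, targets, net, batch_size=32):
        errors = [1e-6 + abs(net(p) - t) for p, t in zip(positions, targets)]
        return random.choices(positions, weights=errors, k=batch_size)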

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Brian Sheppard via Computer-go
On 20/05/2017 22:26, Brian Sheppard via Computer-go wrote: > Could use late-move reductions to eliminate the hard pruning. Giv

Re: [Computer-go] mini-max with Policy and Value network

2017-05-20 Thread Brian Sheppard via Computer-go
Could use late-move reductions to eliminate the hard pruning. Given the accuracy rate of the policy network, I would guess that even move 2 should be reduced. -Original Message- From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Hiroshi Yamashita Sent:

Re: [Computer-go] Patterns and bad shape

2017-04-18 Thread Brian Sheppard via Computer-go
a "pay additional attention here and here and...". On Apr 18, 2017 6:31 AM, "Brian Sheppard via Computer-go" <computer-go@computer-go.org <mailto:computer-go@computer-go.org> > wrote: Adding patterns is very cheap: encode the patterns as an if/else tree, and

Re: [Computer-go] Patterns and bad shape

2017-04-18 Thread Brian Sheppard via Computer-go
Adding patterns is very cheap: encode the patterns as an if/else tree, and it is O(log n) to match. Pattern matching as such did not show up as a significant component of Pebbles. But that is mostly because all of the machinery that makes pattern-matching cheap (incremental updating of 3x3
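
A sketch of that machinery (illustrative, not Pebbles' code): each point keeps a small integer code for its 3x3 neighborhood, and placing a stone touches only the eight neighboring codes, so matching is a single array lookup.

    # Codes are base-4 digits per neighbor: 0 empty, 1 black, 2 white.
    NEIGHBORS = [(-1,-1), (-1,0), (-1,1), (0,-1), (0,1), (1,-1), (1,0), (1,1)]

    def place_stone(codes, x, y, color, size=19):
        for i, (dx, dy) in enumerate(NEIGHBORS):
            nx, ny = x + dx, y + dy
            if 0 <= nx < size and 0 <= ny < size:
                slot = len(NEIGHBORS) - 1 - i        # mirrored offset as
                codes[nx][ny] += color * (4 ** slot) # seen from the neighbor
                # (assumes the slot was previously empty)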

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Brian Sheppard via Computer-go
Brian Sheppard via Computer-go <computer-go@computer-go.org> wrote: Neural networks always have a lot of local optima. Simply because they have a high degree of internal symmetry. That is, you can “permute” sets of coefficients and get

Re: [Computer-go] dealing with multiple local optima

2017-02-24 Thread Brian Sheppard via Computer-go
Neural networks always have a lot of local optima, simply because they have a high degree of internal symmetry. That is, you can “permute” sets of coefficients and get the same function. Don’t think of starting with expert training as a way to avoid local optima. It is a way to start
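
The symmetry is easy to demonstrate (a toy example): swap two hidden units together with their incoming and outgoing weights, and the network computes the identical function.

    import numpy as np

    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
    x = rng.normal(size=3)

    f = lambda A, B: B @ np.maximum(A @ x, 0)    # tiny two-layer ReLU net

    perm = [1, 0, 2, 3]                          # swap hidden units 0 and 1
    print(np.allclose(f(W1, W2), f(W1[perm], W2[:, perm])))   # True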

Re: [Computer-go] Playout policy optimization

2017-02-12 Thread Brian Sheppard via Computer-go
If your database is composed of self-play games, then the likelihood maximization policy should gain strength rapidly, and there should be a way to have asymptotic optimality. (That is, the patterns alone will play a perfect game in the limit.) Specifically: play self-play games using an
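
A sketch of the likelihood-maximization step (a softmax policy over pattern features; the exact features and learning rate are illustrative):

    import numpy as np

    # feats[i] is the feature vector of candidate move i; played is the
    # index of the move chosen in the self-play game. One gradient step
    # on log p(played) under a softmax over feats @ w.
    def likelihood_step(w, feats, played, lr=0.01):
        logits = feats @ w
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return w + lr * (feats[played] - p @ feats)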

Re: [Computer-go] AlphaGo rollout nakade patterns?

2017-01-31 Thread Brian Sheppard via Computer-go
...understand how others improve their rollouts. I hope that I will be able to improve the rollouts at some point. Roel On 31 January 2017 at 17:21, Brian Sheppard via Computer-go <computer-go@computer-go.org> wrote: If a "diamond" pattern is centered on a 5x5 square, then you

Re: [Computer-go] AlphaGo rollout nakade patterns?

2017-01-31 Thread Brian Sheppard via Computer-go
...tions are you referring to? Kind regards, Roel On 24 January 2017 at 12:57, Brian Sheppard via Computer-go <computer-go@computer-go.org> wrote: There are two issues: one is the shape and the other is the policy that the search should follow. Of course the vital point is a killing mo

Re: [Computer-go] AlphaGo rollout nakade patterns?

2017-01-24 Thread Brian Sheppard via Computer-go
On 23-01-17 20:10, Brian Sheppard via Computer-go wrote: > only captures of up to 9 stones can be nakade. I don't really understand this. http://senseis.xmp.net/?StraightThree Both constructing this shape and playing

Re: [Computer-go] AlphaGo rollout nakade patterns?

2017-01-23 Thread Brian Sheppard via Computer-go
A capturing move has a potential nakade if the string that was removed is among a limited set of possibilities. Probably AlphaGo has a 13-point bounding region (e.g., the 13-point star) that it uses as a positional index, and therefore an 8192-sized pattern set will identify all potential
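
The 8192 figure is consistent with a 13-point region: each point of the region is either in or out of the removed string, and 2**13 = 8192. A sketch of such an index (my construction, guessing at the kind of scheme meant):

    # The 13-point "diamond": all points within Manhattan distance 2.
    DIAMOND = [(0,0), (0,1), (0,-1), (1,0), (-1,0), (1,1), (1,-1),
               (-1,1), (-1,-1), (0,2), (0,-2), (2,0), (-2,0)]

    def nakade_index(removed_points, anchor):
        ax, ay = anchor
        idx = 0
        for bit, (dx, dy) in enumerate(DIAMOND):
            if (ax + dx, ay + dy) in removed_points:
                idx |= 1 << bit
        return idx          # 0 .. 8191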