[Computer-go] Ladder and zeroness (was: Re: CGOS source on github)

2021-01-23 Thread David Wu
On Sat, Jan 23, 2021 at 5:34 AM Darren Cook  wrote:

> Each convolutional layer should spread the information across the board.
> I think alpha zero used 20 layers? So even 3x3 filters would tell you
> about the whole board - though the signal from the opposite corner of
> the board might end up a bit weak.
>
> I think we can assume it is doing that successfully, because otherwise
> we'd hear about it losing lots of games in ladders.
>

Unfortunately, we can't assume that based on that observation.

If you observe what is going on with Leela Zero, ELF, MiniGo, and SAI - all
of which are reproductions of AlphaZero with different hyperparameters and
infrastructure, none of which include a ladder feature - I think you will
find that *all* of them have at least some trouble with ladders. So there is
empirical evidence that the vanilla AlphaZero algorithm, applied to Go with a
convolutional resnet, often has ladder problems.

And by seeing how these reproductions behave, it also becomes clear how
your observation can still be true at the same time.

Which is: with enough playouts, MCTS lets all these bots solve ladders well
enough at the root position and the upper levels of the tree to avoid losing
outright - usually a few tens of thousands of playouts are plenty. So the
problem mostly just reduces strength by harming evaluation quality deeper in
the tree, in ways that are harder to see. The kind of thing that might cost
you more like 20-50 Elo (pure guess, just my intuition for the *very* rough
order of magnitude with this much search on top), rather than losing you
every game.

The bigger problem happens when you run any of these bots with only a few
playouts - low-end GPUs, mobile hardware, etc., *or the numbers of playouts
that people often run CGOS bots with*, namely 200 playouts, or 800 playouts,
etc. You will find that they are still clearly top-pro-level or superhuman at
almost all aspects of the game... except for ladders! And at these low
numbers of playouts, that does include outright losing games due to ladders,
or making major misjudgments about a sequence that will depend on a ladder
1-3 moves in the future.

Sometimes this even happens in the low thousands of playouts. For example,
the attached SGF shows such a case, where Leela Zero using almost the latest
40-block network (LZ285) with 2k playouts per move (plus tree reuse)
attempted to break a ladder, failed, and then played out the ladder anyway
and lost on the spot.

It is also true that neural nets *are* capable of learning judgments
related to ladders given the right data. Some time back, I found with some
visualizations of KataGo's net that it actually is tracing a width-6
diagonal band across the board from ladders! But the inductive bias is weak
enough, and the structure of the game tree for ladders is hard enough (it's
like the classic "cliff walking" problem in RL turned up to the max), that
it becomes a chicken-and-egg problem. Starting from a net that doesn't
understand ladders yet, the "MCTS policy/value-improvement operator" is
empirically very poor at bootstrapping the net into understanding them.


> > something the first version of AlphaGo did (before they tried to make it
> > "zero") and something that many other bots do as well. But Leela Zero and
> > ELF do not do this, because of attempting to remain "zero", ...
>
> I know that zero-ness was very important to DeepMind, but I thought the
> open source dedicated go bots that have copied it did so because AlphaGo
> Zero was stronger than AlphaGo Master after 21-40 days of training.
> I.e. in the rarefied atmosphere of super-human play that starter package
> of human expert knowledge was considered a weight around its neck.
>

The PR and public press around AlphaZero may give one this impression
generally - it certainly sounds like a more impressive discovery if not
only can you learn from Zero, but doing so is actually better! But I'm
confident that this is not true in general, and that it also depends on
what "expert knowledge" you add, and how you add it.

You may note that the AlphaGo Zero paper makes no mention of how long or
with how many TPUs AlphaGo Master was trained (or if it does, I can't find
it) - so it's hard to say what Master vs Zero shows. Also, it claims that
AlphaGo Master still made use of handcrafted Monte-Carlo rollouts, and I can
easily believe that jettisoning those could lead to a big improvement. And
it's at least plausible to me that not pretraining on human pro games might
give better final results (*but* this is unclear - at least I don't know of
any paper that actually runs this as a controlled test).

But there are other bits of "expert knowledge" that do provide an
improvement over being pure-zero if done correctly, including:
* Predicting the final ownership of the board, not just the win/loss.
* Adding a small/mild term for caring about score, rather than just
win/loss.
* Seeding a percentage of 
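
To make the first two items above concrete, here's a rough sketch (PyTorch,
purely illustrative - not KataGo's actual architecture, and the channel
counts and loss weights are placeholders) of what such auxiliary outputs on
top of a shared convolutional trunk could look like:

import torch
import torch.nn as nn

class MultiHeadNet(nn.Module):
    def __init__(self, in_channels=18, trunk_channels=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_channels, trunk_channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(trunk_channels, trunk_channels, 3, padding=1),
            nn.ReLU(),
        )
        # Win/loss head: pooled trunk features -> single logit.
        self.value_head = nn.Linear(trunk_channels, 1)
        # Ownership head: one output per board point, tanh in [-1, 1].
        self.ownership_head = nn.Conv2d(trunk_channels, 1, 1)
        # Score head: pooled features -> predicted final score difference.
        self.score_head = nn.Linear(trunk_channels, 1)

    def forward(self, x):
        h = self.trunk(x)                    # (N, C, H, W)
        pooled = h.mean(dim=(2, 3))          # global average pool -> (N, C)
        win_logit = self.value_head(pooled)
        ownership = torch.tanh(self.ownership_head(h))
        score = self.score_head(pooled)
        return win_logit, ownership, score

# Training would combine the losses, with small weights on the auxiliary
# targets so that win/loss remains dominant, e.g. (placeholder weights):
#   loss = bce(win_logit, game_result)
#          + w_own * mse(ownership, final_ownership)
#          + w_score * huber(score, final_score)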

Re: [Computer-go] CGOS source on github

2021-01-22 Thread David Wu
@Claude - Oh, sorry, I misread your message, you were also asking about
ladders, not just liberties. In that case, yes! If you outright tell the
neural net as an input whether each ladder works or not (doing a short
tactical search to determine this), or something equivalent to it, then the
net will definitely make use of that information. There are some bad side
effects even to doing this, but it helps the most common case. This is
something the first version of AlphaGo did (before they tried to make it
"zero") and something that many other bots do as well. But Leela Zero and
ELF do not do this, because of attempting to remain "zero", i.e. free as
much as possible from expert human knowledge or specialized feature
crafting.
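
For concreteness, the "short tactical search" could be as simple as the
following toy sketch (Python). This is not AlphaGo's or any particular bot's
actual code - the board interface (copy/play/chain_at/liberties/owner) is
assumed, and it ignores complications like the chasing stones themselves
getting captured (ladder breakers):

def opponent(color):
    return 'W' if color == 'B' else 'B'

def ladder_captures(board, prey, depth=0):
    """Toy ladder read: `prey` is a chain that is currently in atari.
    Returns True if running away appears to fail (the ladder works).
    Assumed board methods: copy(), play(color, point), chain_at(point),
    liberties(chain), owner(chain)."""
    if depth > 80:                          # give up on absurdly long reads
        return False
    (escape,) = board.liberties(prey)       # in atari: exactly one liberty
    color = board.owner(prey)
    b = board.copy()
    b.play(color, escape)                   # the prey runs
    running = b.chain_at(escape)
    libs = b.liberties(running)
    if len(libs) >= 3:
        return False                        # escaped, ladder does not work
    if len(libs) <= 1:
        return True                         # still in atari: captured next move
    # Exactly two liberties: the chaser tries each one; if either keeps the
    # prey in atari and the recursion says it is caught, the ladder works.
    for lib in libs:
        b2 = b.copy()
        b2.play(opponent(color), lib)
        chased = b2.chain_at(escape)
        if len(b2.liberties(chased)) == 1 and ladder_captures(b2, chased, depth + 1):
            return True
    return False

# The result for each ladder on the board would then be encoded as an extra
# input plane for the net, alongside the usual stone/history planes.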


On Fri, Jan 22, 2021 at 9:26 AM David Wu  wrote:

> Hi Claude - no, generally feeding liberty counts to neural networks
> doesn't help as much as one would hope with ladders and sekis and large
> capturing races.
>
> The thing that is hard about ladders has nothing to do with liberties - a
> trained net is perfectly capable of recognizing the atari, this is
> extremely easy. The hard part is predicting if the ladder will work without
> playing it out, because whether it works depends extremely sensitively on
> the exact position of stones all the way on the other side of the board. A
> net that fails to predict this well might prematurely reject a working
> ladder (which is very hard for the search to correct), or be highly
> overoptimistic about a nonworking ladder (which takes the search thousands
> of playouts to correct in every single branch of the tree that it happens
> in).
>
> For large sekis and capturing races, liberties usually don't help as much
> as you would think. This is because approach liberties, ko liberties, big
> eye liberties, shared liberties versus unshared liberties, and throw-in
> possibilities all affect the "effective" liberty count significantly. Also
> very commonly you have bamboo joints, simple diagonal or hanging
> connections and other shapes where the whole group is not physically
> connected, also making the raw liberty count not so useful. The neural net
> still ultimately has to scan over the entire group anyways, computing these
> things.
>
> On Fri, Jan 22, 2021 at 8:31 AM Claude Brisson via Computer-go <
> computer-go@computer-go.org> wrote:
>
>> Hi. Maybe it's a newbie question, but since the ladders are part of the
>> well defined topology of the goban (as well as the number of current
>> liberties of each chain of stone), can't feeding those values to the
>> networks (from the very start of the self teaching course) help with large
>> shichos and sekis?
>>
>> Regards,
>>
>>   Claude
>> On 21-01-22 13 h 59, Rémi Coulom wrote:
>>
>> Hi David,
>>
>> You are right that non-determinism and bot blind spots are a source of
>> problems with Elo ratings. I add randomness to the openings, but it is
>> still difficult to avoid repeating some patterns. I have just noticed that
>> the two wins of CrazyStone-81-15po against LZ_286_e6e2_p400 were caused by
>> very similar ladders in the opening:
>> http://www.yss-aya.com/cgos/viewer.cgi?19x19/SGF/2021/01/21/73.sgf
>> http://www.yss-aya.com/cgos/viewer.cgi?19x19/SGF/2021/01/21/733301.sgf
>> Such a huge blind spot in such a strong engine is likely to cause rating
>> compression.
>>
>> Rémi
>>
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>>
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] CGOS source on github

2021-01-22 Thread David Wu
Hi Claude - no, generally feeding liberty counts to neural networks doesn't
help as much as one would hope with ladders and sekis and large capturing
races.

The thing that is hard about ladders has nothing to do with liberties - a
trained net is perfectly capable of recognizing the atari, this is
extremely easy. The hard part is predicting if the ladder will work without
playing it out, because whether it works depends extremely sensitively on
the exact position of stones all the way on the other side of the board. A
net that fails to predict this well might prematurely reject a working
ladder (which is very hard for the search to correct), or be highly
overoptimistic about a nonworking ladder (which takes the search thousands
of playouts to correct in every single branch of the tree that it happens
in).

For large sekis and capturing races, liberties usually don't help as much
as you would think. This is because approach liberties, ko liberties, big
eye liberties, shared liberties versus unshared liberties, and throw-in
possibilities all affect the "effective" liberty count significantly. Also
very commonly you have bamboo joints, simple diagonal or hanging
connections and other shapes where the whole group is not physically
connected, also making the raw liberty count not so useful. The neural net
still ultimately has to scan over the entire group anyways, computing these
things.
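
Just to be concrete about what "feeding liberty counts" would mean as an
input feature (and again, this is the feature I'm saying doesn't buy much) -
a minimal sketch with numpy, assuming a plain 2D array board encoding; the
plane scheme here is only an illustration:

import numpy as np

def chain_and_liberties(board, start):
    """board: 2D array, 0 empty, 1 black, 2 white. Flood-fills the chain
    containing `start` and returns (chain points, liberty points)."""
    color = board[start]
    chain, libs, stack, seen = set(), set(), [start], {start}
    while stack:
        p = stack.pop()
        chain.add(p)
        r, c = p
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < board.shape[0] and 0 <= nc < board.shape[1]:
                if board[nr, nc] == 0:
                    libs.add((nr, nc))
                elif board[nr, nc] == color and (nr, nc) not in seen:
                    seen.add((nr, nc))
                    stack.append((nr, nc))
    return chain, libs

def liberty_planes(board, max_libs=4):
    """One-hot planes: plane k marks stones whose chain has k+1 liberties,
    with the last plane meaning `max_libs` or more."""
    planes = np.zeros((max_libs,) + board.shape, dtype=np.float32)
    visited = set()
    for r in range(board.shape[0]):
        for c in range(board.shape[1]):
            if board[r, c] != 0 and (r, c) not in visited:
                chain, libs = chain_and_liberties(board, (r, c))
                visited |= chain
                k = min(len(libs), max_libs) - 1
                for (cr, cc) in chain:
                    planes[k, cr, cc] = 1.0
    return planes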

On Fri, Jan 22, 2021 at 8:31 AM Claude Brisson via Computer-go <
computer-go@computer-go.org> wrote:

> Hi. Maybe it's a newbie question, but since the ladders are part of the
> well defined topology of the goban (as well as the number of current
> liberties of each chain of stone), can't feeding those values to the
> networks (from the very start of the self teaching course) help with large
> shichos and sekis?
>
> Regards,
>
>   Claude
> On 21-01-22 13 h 59, Rémi Coulom wrote:
>
> Hi David,
>
> You are right that non-determinism and bot blind spots are a source of
> problems with Elo ratings. I add randomness to the openings, but it is
> still difficult to avoid repeating some patterns. I have just noticed that
> the two wins of CrazyStone-81-15po against LZ_286_e6e2_p400 were caused by
> very similar ladders in the opening:
> http://www.yss-aya.com/cgos/viewer.cgi?19x19/SGF/2021/01/21/73.sgf
> http://www.yss-aya.com/cgos/viewer.cgi?19x19/SGF/2021/01/21/733301.sgf
> Such a huge blind spot in such a strong engine is likely to cause rating
> compression.
>
> Rémi
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] CGOS source on github

2021-01-22 Thread David Wu
On Fri, Jan 22, 2021 at 3:45 AM Hiroshi Yamashita  wrote:

> This kind of joseki is not good for Zero type. Ladder and capturing
>   race are intricately combined. In AlphaGo(both version of AlphaGoZero
>   and Master) published self-matches, this joseki is rare.
> -
>
> I found this joseki in kata1_b40s575v100 (black) vs LZ_286_e6e2_p400
> (white).
> http://www.yss-aya.com/cgos/viewer.cgi?19x19/SGF/2021/01/22/733340.sgf
>

Hi Hiroshi - yep. This is indeed a joseki that was partly popularized by AI
and jointly explored with humans. It is probably fair to say that it is by
far the most complicated common joseki known right now - more complicated
than either the avalanche or the taisha.

Some zero-trained bots will find and enter into this joseki, some won't.
The ones that don't play this joseki in self-play will have a significant
chance of being vulnerable to it if an opponent plays it against them,
because there are a large number of traps and blind spots that cannot be
solved if the net doesn't have experience with the position. And even having
some experience is not always enough. For example, ELF and Leela Zero have
learned some lines, but are far from perfect. There is a good chance that
AlphaGoZero or Master would have been vulnerable to it as well. KataGo at
the time of 1.3.5 was also vulnerable to it - it only rarely came up in
self-play, and therefore was never learned and correctly evaluated, so from
the 3-3 invader's side the joseki could be forced and KataGo would likely
mess it up and be losing the game right at the start. (The most recent
KataGo nets are much less vulnerable now, though.)

The example you found is one where this has happened to Leela Zero. In the
game you linked, move 34 is a big mistake. Leela Zero underweights the
possibility of move 35, and is then blind to the seeming-bad-shape move of
37, and as a result is now in a bad position. The current Leela Zero nets
consistently make this mistake, *and* consistently prefer playing down
this line, so against an opponent happy to play it with them, Leela Zero
will lose many games right in the opening in the same way.

Anyways, the reason this joseki is responsible for more such distortions
than other joseki seems to be that it is so sharp, and unlike most other
common joseki, contains at least 5-6 enormous blind spots in different
variations that zero-trained nets variously have trouble learning on their
own.

> > a very large sampling of positions from a wide range
> > of human professional games, from say, move 20, and have bots play starting
> > from these sampled positions, in pairs once with each color.
>
> This sounds interesting.
> I will think about another CGOS that handle this.


I'm glad you're interested. I don't know if move 20 is a good number (I
just threw it out there) - maybe it should be varied; it might take
some experimentation. And I'm not sure it's worth doing, since it's still
probably only the smaller part of the problem in general - as Remi pointed
out, ladder handling will likely always continue to introduce
Elo nontransitivity, and probably all of this is less important than
generally having a variety of long-running bots to help stabilize the
system over time.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] CGOS source on github

2021-01-22 Thread David Wu
On Fri, Jan 22, 2021 at 8:08 AM Rémi Coulom  wrote:

> You are right that non-determinism and bot blind spots are a source of
> problems with Elo ratings. I add randomness to the openings, but it is
> still difficult to avoid repeating some patterns. I have just noticed that
> the two wins of CrazyStone-81-15po against LZ_286_e6e2_p400 were caused by
> very similar ladders in the opening:
> http://www.yss-aya.com/cgos/viewer.cgi?19x19/SGF/2021/01/21/73.sgf
> http://www.yss-aya.com/cgos/viewer.cgi?19x19/SGF/2021/01/21/733301.sgf
> Such a huge blind spot in such a strong engine is likely to cause rating
> compression.
> Rémi
>

I agree, ladders are definitely the other most noticeable way that Elo
model assumptions may be broken, since pure-zero bots have a hard time with
them, and can easily cause difference(A,B) + difference(B,C) to be very
inconsistent with difference(A,C). If some of A,B,C always handle ladders
very well and some are blind to them, then you are right that probably no
amount of opening randomization can smooth it out.
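
As a toy illustration of the kind of inconsistency I mean (the numbers below
are made up purely for the example): suppose A and C both handle ladders fine
but B is blind to them, so A crushes B via ladder games while A vs C is
close. No single rating per bot fits all three results:

import math

def elo_expected(diff):
    """Expected score under the Elo model for a player `diff` points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def elo_from_winrate(p):
    """Rating difference implied by an observed winrate p."""
    return -400.0 * math.log10(1.0 / p - 1.0)

# Made-up observations: A beats B 75% (ladder blowouts), B beats C 75%
# (plain strength gap), but A beats C only 55% (no ladders involved).
d_ab = elo_from_winrate(0.75)        # ~191 Elo
d_bc = elo_from_winrate(0.75)        # ~191 Elo
print(elo_expected(d_ab + d_bc))     # model predicts ~0.91 for A vs C
print(elo_from_winrate(0.55))        # but the observed 55% implies only ~35 Elo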
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] CGOS source on github

2021-01-21 Thread David Wu
One tricky thing is that there are some major nonlinearities between
different bots early in the opening that break Elo model assumptions quite
blatantly at these higher levels.

The most noticeable case of this is with Mi Yuting's flying dagger joseki.
I've noticed, for example, that in particular matchups between different
pairs of bots (e.g. one particular KataGo net as white versus ELF as black,
or one version of LZ as black versus some other version as white), maybe as
many as 30% of games will enter this joseki, and the two bots' preferences
may happen by chance to line up so that they consistently play down a path
where one side hits a blind spot and begins the game with an early
disadvantage. Each bot may have different preferences, such that each
possible pairing arbitrarily does or does not run into such a trap.

And having significant early-game temperature in the bot itself doesn't
always help as much as you would think, because this particular joseki is so
sharp that a bot can easily have a strong enough preference for one path or
another (even when it is ultimately wrong) to override any reasonable
temperature. Sometimes adding temperature or extra randomness only mildly
changes the frequency of the sequence, or just varies the time before the
joseki and the trap/blunder happens anyways.

If games are to begin from the empty board, I'm not sure there's an easy
way around this except having a very large variety of opponents.

One thing that I'm pretty sure would mostly "fix" the problem (in the sense
of producing a smoother metric of general strength in a variety of
positions not heavily affected by just a few key lines) would be to
semi-arbitrarily take a very large sampling of positions from a wide range
of human professional games, from say, move 20, and have bots play starting
from these sampled positions, in pairs once with each color. This would
still include many AI openings, because of the way human pros in the last
3-4 years have quickly integrated and experimented with them, but would
also introduce a lot more variety in general than would occur in any
head-to-head matchup.
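
A sketch of what the scheduling part could look like (Python; how the
opening positions themselves get sampled from professional game records is
omitted, and the names here are only illustrative):

import random

def make_pairings(opening_positions, bots, games_per_pairing=2):
    """For each unordered pair of bots, sample openings and schedule each
    opening twice, once with each bot taking black, so that any imbalance
    in the opening itself cancels out."""
    schedule = []
    for i in range(len(bots)):
        for j in range(i + 1, len(bots)):
            for _ in range(games_per_pairing // 2):
                opening = random.choice(opening_positions)
                schedule.append((opening, bots[i], bots[j]))  # bots[i] is black
                schedule.append((opening, bots[j], bots[i]))  # colors swapped
    random.shuffle(schedule)
    return schedule

# `opening_positions` might be, say, the positions after move 20 from a
# large sample of professional games (SGF parsing omitted here).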

This is almost surely a *smaller* problem than simply having enough games
mixing between different long-running bots to anchor the Elo system. And it
is not the only way major nontransitivities can show up (e.g. ladders).
But to take a leaf from computer chess, playing from sampled forced
openings seems to be a common practice there, and maybe it's worth
considering in computer Go as well, even if it only fixes what is currently
the smaller of the issues.


On Thu, Jan 21, 2021 at 12:01 PM Rémi Coulom  wrote:

> Thanks for computing the new rating list.
>
> I feel it did not fix anything. The old Zen, cronus, etc.have almost no
> change at all.
>
> So it is not a good fix, in my opinion. No need to change anything to the
> official ratings.
>
> The fundamental problem seems that the Elo rating model is too wrong for
> this data, and there is no easy fix for that.
>
> Long ago, I had thought about using a more complex multi-dimensional Elo
> model. The CGOS data may be a good opportunity to try it. I will try when I
> have some free time.
>
> Rémi
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization

2020-07-19 Thread David Wu
I imagine that at low visits at least, "ACT" behaves similarly to Leela
Zero's "LCB" move selection, which also has the effect of sometimes
selecting a move that is not the max-visits move, if its value estimate has
recently been found to be sufficiently higher to outweigh the fact that it
has a lower prior and fewer visits (at least, typically, this is why the
move wouldn't have been the max-visits move in the first place). It also
scales in an interesting way with the empirically observed
playout-by-playout variance of moves, but I think by far the most important
part is that it can use a sufficiently confident high value to override
max-visits.
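
For anyone not familiar with it, LCB-style final move selection is roughly
the following (a simplified sketch, not Leela Zero's exact code or constants
- the per-move statistics are assumed to be tracked by the search):

import math

def select_move_lcb(moves, min_visit_fraction=0.1, z=1.96):
    """moves: list of dicts with 'visits', 'mean_value' (in [0, 1]), and
    'value_stderr'. Pick the move with the best lower confidence bound on
    value, among moves with enough visits for the bound to mean anything."""
    max_visits = max(m['visits'] for m in moves)
    best, best_lcb = None, -float('inf')
    for m in moves:
        if m['visits'] < min_visit_fraction * max_visits:
            continue  # too few visits to trust the bound
        lcb = m['mean_value'] - z * m['value_stderr']
        if lcb > best_lcb:
            best, best_lcb = m, lcb
    # The max-visit move always qualifies, so `best` is normally not None;
    # fall back to max visits just in case.
    return best if best is not None else max(moves, key=lambda m: m['visits'])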

The gain from "LCB" in match play I recall is on the very very rough order
of 100 Elo, although it could be less or more depending on match conditions
and what neural net is used and other things. So for LZ at least,
"ACT"-like behavior at low visits is not new.


On Sun, Jul 19, 2020 at 5:39 AM Kensuke Matsuzaki 
wrote:

> Hi,
>
> I couldn't improve leela zero's strength by implementing SEARCH and ACT.
> https://github.com/zakki/leela-zero/commits/regularized_policy
>
> 2020年7月17日(金) 2:47 Rémi Coulom :
> >
> > This looks very interesting.
> >
> > From a quick glance, it seems the improvement is mainly when the number
> of playouts is small. Also they don't test on the game of Go. Has anybody
> tried it?
> >
> > I will take a deeper look later.
> >
> > On Thu, Jul 16, 2020 at 9:49 AM Ray Tayek  wrote:
> >>
> >>
> https://old.reddit.com/r/MachineLearning/comments/hrzooh/r_montecarlo_tree_search_as_regularized_policy/
> >>
> >>
> >> --
> >> Honesty is a very expensive gift. So, don't expect it from cheap people
> - Warren Buffett
> >> http://tayek.com/
> >>
> >> ___
> >> Computer-go mailing list
> >> Computer-go@computer-go.org
> >> http://computer-go.org/mailman/listinfo/computer-go
> >
> > ___
> > Computer-go mailing list
> > Computer-go@computer-go.org
> > http://computer-go.org/mailman/listinfo/computer-go
>
>
>
> --
> Kensuke Matsuzaki
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


[Computer-go] katago 1.4.2 (top open-source 9x9 and 19x19 bot?)

2020-05-16 Thread David Wu
@Hiroshi Yamashita - about rn - cool! Really neat to see people picking up
some of KataGo's methods and running them. Hopefully people will find ways
to improve them further.

--

Also, in case people weren't aware since I hadn't advertised it on this
list, KataGo has been available for a while too! And it just had its 1.4.2
release!
https://github.com/lightvector/KataGo/releases

At this moment, KataGo may be the overall-strongest open source bot on all
of 9x9, 13x13, and 19x19. Or I hope if not literally the strongest, then
very closely in the running.

* For 9x9, the version that topped
http://www.yss-aya.com/cgos/9x9/standings.html not too long ago is simply
KataGo's v1.3.5 release (now superseded by v1.4.2) using the 40 block
neural net from here:
https://github.com/lightvector/KataGo/releases/tag/v1.4.0.

* For 19x19, the same neural net plays at a very high level too, generally
stronger than Leela Zero (except for Mi Yuting's flying dagger). It varies
by hardware and by time settings and configurations, but somewhere from
100-250 Elo stronger would probably be typical to observe in a given test[1].

* KataGo also plays all intermediate sizes - 13, 15, even stuff like 12 -
at just as high a level. Or it should, since it trains on them all the
same way. But there doesn't seem to be notable competition going on at
those sizes, even 13x13.

[1] Funnily enough, the best 19x19 tests have been with KataGo's 20 block
network, *not* the 40 block network - the 40 block network is stronger
per-playout, but the 20 block (which has been learning from the 40 block's
games) is faster by enough to more than make up for it. Although the 40
block network has recently made large gains and might finally have caught
up at time parity on 19x19. There hasn't been enough testing to know
for sure yet.

On Sat, May 16, 2020 at 9:43 AM Hiroshi Yamashita  wrote:

> Hi,
>
> Kensuke Matsuzaki released rn-6.3.0.
> It is one of the strongest 9x9 engine.
> It is "LeelaZero + 9x9 + heuristic features + adjustable komi + KataGo
> like learning".
>
> rn-6.3.0
> https://github.com/zakki/leela-zero/releases/tag/rn-6.3.0
> Rn.6.3
> https://twitter.com/k_matsuzaki/status/1260908554359173120
>
> Author says v995 is a latest model, but v945 is stronger.
> And v995 tends to play (4,4) on initial position.
>
> Thanks,
> Hiroshi Yamashita
>
>
> This is quote from README.md in zip file.
> ---
> # '9x9-endstate' branch
>
> * For 9x9 game.
> * Ladder detection (by https://github.com/yssaya/leela-zero-ladder)
> * Various komi (by https://github.com/ihavnoid/leela-zero)
> * Additional input features.
>
> ---
>
> # 'Endstate' branch
>
> This is a fork of Leela Zero with the 'endstate' head.  The 'stock' Leela
> Zero uses the value and policy nets, while this also
> predicts how the game ends.  To do so, there are some changes:
>
> * Additional 'endstate' head : The 'endstate' is how the game ended - that
> information is also stored on the training data.
> * Acceleration mode : To predict the endstate, we can't just resign when
> we find the game hopeless - we have to play it to the end.  Hence,
>once we hit the resignation threshold, we reduce the playouts to 1
> instead of resigning.
> * Using the 'endstate' information as the auxillary policy - see
> Network.cpp for details on how it uses the auxilary policy
>
> The main goal of this branch is to play reasonable handicap games (and to
> some extent, play games with komi)
> ---
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] 30% faster with a batch size of 63 instead of 64!

2020-05-09 Thread David Wu
Very nice. :)
And thanks for the note about batch sizing. Specifically tuning parameters
for this level of strength on 9x9 seems like it could be quite valuable,
Kata definitely hasn't done that either.

But it feels like bots are very, very close to optimal on 9x9. With some
dedicated work and more months or years of training, it might be possible
to become unbeatable for all practical purposes, and as you mentioned in
the other thread, adaptively building out an opening book could be a part
of getting there - I'd love to see an "unbeatable 9x9 Crazy Stone" a year
down the line.

One fundamental issue that I've been noticing in a variety of domains is
precisely that self-play under AlphaZero, and reinforcement learning in
environments like these generally, doesn't explore enough, and it's very,
very difficult to get it to do so in a way that's still robust and
efficient. And unless you plan to do something like AlphaStar's internal
self-play training league, which would seem to nontrivially multiply the
cost, it seems like playing other opponents instead of just self-play can't
entirely be the solution... because once you are enough better than the
best other opponent, it's hard to usefully continue doing that. And the
league *still* didn't entirely fix the problem for AlphaStar: humans were
sometimes still able to find exploitative strategies that it had never
learned how to handle via self-play, and that it reacted very poorly to.
It feels like there's something unsolved and "missing" from current
algorithms.


On Sat, May 9, 2020 at 7:17 PM Rémi Coulom  wrote:

> Yeh! first win against Kata!
> http://www.yss-aya.com/cgos/viewer.cgi?9x9/SGF/2020/05/09/999849.sgf
>
> In addition to the optimized batch size, I did two other things:
>  - I use two batches of 63 instead of one, with double buffering, so that
> the GPU is kept 100% busy. About 14k nodes per second now.
>  - I make the search less selective, by using a bigger exploration
> constant in the MCTS formula.
> I should download Katago and CLOP my search parameters against it.
>
> So far I have tried to keep the "Zero" philosophy of using self-play only,
> but playing against other opponents is very likely to be a better approach
> at making progress.
>
> Rémi
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Crazy Stone is playing on CGOS 9x9

2020-05-08 Thread David Wu
On Fri, May 8, 2020 at 8:46 PM uurtamo .  wrote:

> And this has no book, right? So it should be badly abused by a very good
> book?
>
>
Maybe!

But the version that was running before, which went something like 48-52-1
(last time I counted it up) against the other top 3 bots rated 3300+, also
had no book. Granted, the newer one is configured to play a little more
excitingly on average, so its opening lines probably have more flaws. At
least, that's what the net should have been trained to do.
(katab40s37-pda1)
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Crazy Stone is playing on CGOS 9x9

2020-05-08 Thread David Wu
kata-bot on OGS is intended for human players on OGS and is never
guaranteed to be any particular version (certainly not an up-to-date
version) nor have any specific fixed settings. You should generally not use
it for testing - just download KataGo and run it yourself (in Lizzie, or
Sabaki, or whatever) if you want reliable settings.

The "aggressive" version that I mentioned in my last email is running on
cgos, not on OGS.

On Fri, May 8, 2020 at 8:23 PM Kyle Biedermann 
wrote:

> Sounds like a fun and interesting experiment. I have noticed the increase in
> preference for the 4-4 lately across the majority of AIs. I still
> prefer the 5-5 opening; it seems to hold against KataGo at the moment. Maybe
> I'll test out some things to see if it can find some interesting new moves.
> Is it still the Kata-bot account on OGS?
>
> Kyle Biedermann
> Creator of Deep Scholar
>
> On Fri, May 8, 2020 at 3:28 PM David Wu  wrote:
>
>> I'm running a new account of KataGo that is set to bias towards
>> aggressive or difficult moves now (the same way it does in 19x19 handicap
>> games), to see what the effect is. Although it seems like some people have
>> stopped running their bots, maybe it will still be interesting for the
>> remaining players, or any others who decide to re-turn-on their bot for a
>> little while. :)
>>
>> It seems like some fraction of the time, it now opens on 5-5 as black,
>> which is judged as worse than 4-4 in an even game, but presumably is more
>> difficult to play. I suspect it will now start to lose a noticeable number
>> of games due to overplaying, and there's a good chance it does much
>> worse overall. Even so, I'm curious what will happen, and what the draw
>> rate will be. Suddenly having some 5-5 openings should certainly add some
>> variety to the games.
>>
>> On Thu, May 7, 2020 at 12:41 PM David Wu  wrote:
>>
>>> Having it matter which of the stones you capture there is fascinating.
>>> Thanks for the analysis - and thanks for "organizing" this 9x9 testing
>>> party. :)
>>>
>>> On Thu, May 7, 2020 at 12:06 PM Rémi Coulom 
>>> wrote:
>>>
>>>> If White recaptures the Ko, then Black can play at White's 56, capture
>>>> the stone, and win by 2 points.
>>>>
>>>> On Thu, May 7, 2020 at 5:02 PM Shawn Ligocki 
>>>> wrote:
>>>>
>>>>> Thanks for sharing the games, Rémi!
>>>>>
>>>>> On Thu, May 7, 2020 at 6:27 AM Rémi Coulom 
>>>>> wrote:
>>>>>
>>>>>> In this game, Crazy Stone won using a typical Monte Carlo trick:
>>>>>> http://www.yss-aya.com/cgos/viewer.cgi?9x9/SGF/2020/05/07/997390.sgf
>>>>>> On move 27, it sacrificed a stone. According to Crazy Stone, the game
>>>>>> would have been a draw had Aya just re-captured it. But Aya took the bait
>>>>>> and captured the other stone. Crazy Stone's evaluation became instantly
>>>>>> winning after this, the sacrificed stone serving as a threat for the
>>>>>> winning ko fight, 18 moves later.
>>>>>>
>>>>>
>>>>> Wow, I did not imagine how that move would be useful later! But the
>>>>> very end is confusing to my human brain, couldn't White move 56 retake the
>>>>> ko and win it? It seems like Black only has one real ko threat left (J4
>>>>> maybe). But White also has one huge threat left (D3), so it seems like
>>>>> White should win this ko and then be about 4 ahead with komi. Am I
>>>>> missing something?
>>>>>
>>>>> -Shawn
>>>>> ___
>>>>> Computer-go mailing list
>>>>> Computer-go@computer-go.org
>>>>> http://computer-go.org/mailman/listinfo/computer-go
>>>>>
>>>> ___
>>>> Computer-go mailing list
>>>> Computer-go@computer-go.org
>>>> http://computer-go.org/mailman/listinfo/computer-go
>>>>
>>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Crazy Stone is playing on CGOS 9x9

2020-05-08 Thread David Wu
I'm running a new account of KataGo that is set to bias towards aggressive
or difficult moves now (the same way it does in 19x19 handicap games), to
see what the effect is. Although it seems like some people have stopped
running their bots, maybe it will still be interesting for the
remaining players, or any others who decide to re-turn-on their bot for a
little while. :)

It seems like some fraction of the time, it now opens on 5-5 as black,
which is judged as worse than 4-4 in an even game, but presumably is more
difficult to play. I suspect it will now start to lose a noticeable number
of games due to overplaying, and there's a good chance it does much
worse overall. Even so, I'm curious what will happen, and what the draw
rate will be. Suddenly having some 5-5 openings should certainly add some
variety to the games.

On Thu, May 7, 2020 at 12:41 PM David Wu  wrote:

> Having it matter which of the stones you capture there is fascinating.
> Thanks for the analysis - and thanks for "organizing" this 9x9 testing
> party. :)
>
> On Thu, May 7, 2020 at 12:06 PM Rémi Coulom  wrote:
>
>> If White recaptures the Ko, then Black can play at White's 56, capture
>> the stone, and win by 2 points.
>>
>> On Thu, May 7, 2020 at 5:02 PM Shawn Ligocki  wrote:
>>
>>> Thanks for sharing the games, Rémi!
>>>
>>> On Thu, May 7, 2020 at 6:27 AM Rémi Coulom 
>>> wrote:
>>>
>>>> In this game, Crazy Stone won using a typical Monte Carlo trick:
>>>> http://www.yss-aya.com/cgos/viewer.cgi?9x9/SGF/2020/05/07/997390.sgf
>>>> On move 27, it sacrificed a stone. According to Crazy Stone, the game
>>>> would have been a draw had Aya just re-captured it. But Aya took the bait
>>>> and captured the other stone. Crazy Stone's evaluation became instantly
>>>> winning after this, the sacrificed stone serving as a threat for the
>>>> winning ko fight, 18 moves later.
>>>>
>>>
>>> Wow, I did not imagine how that move would be useful later! But the very
>>> end is confusing to my human brain, couldn't White move 56 retake the ko
>>> and win it? It seems like Black only has one real ko threat left (J4
>>> maybe). But White also has one huge threat left (D3), so it seems like
>>> White should win this ko and then be about 4 ahead with komi. Am I
>>> missing something?
>>>
>>> -Shawn
>>> ___
>>> Computer-go mailing list
>>> Computer-go@computer-go.org
>>> http://computer-go.org/mailman/listinfo/computer-go
>>>
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>>
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Crazy Stone is playing on CGOS 9x9

2020-05-07 Thread David Wu
Having it matter which of the stones you capture there is fascinating.
Thanks for the analysis - and thanks for "organizing" this 9x9 testing
party. :)

On Thu, May 7, 2020 at 12:06 PM Rémi Coulom  wrote:

> If White recaptures the Ko, then Black can play at White's 56, capture the
> stone, and win by 2 points.
>
> On Thu, May 7, 2020 at 5:02 PM Shawn Ligocki  wrote:
>
>> Thanks for sharing the games, Rémi!
>>
>> On Thu, May 7, 2020 at 6:27 AM Rémi Coulom  wrote:
>>
>>> In this game, Crazy Stone won using a typical Monte Carlo trick:
>>> http://www.yss-aya.com/cgos/viewer.cgi?9x9/SGF/2020/05/07/997390.sgf
>>> On move 27, it sacrificed a stone. According to Crazy Stone, the game
>>> would have been a draw had Aya just re-captured it. But Aya took the bait
>>> and captured the other stone. Crazy Stone's evaluation became instantly
>>> winning after this, the sacrificed stone serving as a threat for the
>>> winning ko fight, 18 moves later.
>>>
>>
>> Wow, I did not imagine how that move would be useful later! But the very
>> end is confusing to my human brain, couldn't White move 56 retake the ko
>> and win it? It seems like Black only has one real ko threat left (J4
>> maybe). But White also has one huge threat left (D3), so it seems like
>> White should win this ko and then be about 4 ahead with komi. Am I
>> missing something?
>>
>> -Shawn
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Crazy Stone is playing on CGOS 9x9

2020-05-07 Thread David Wu
Yes, it's fun to see suddenly a little cluster of people running strong 9x9
bots. :)

katab40s37-awsp3 is running on one AWS p3 2xlarge instance. So it's a
single V100, with some settings tuned appropriately for good performance on
that hardware and board size and time control (mainly, 96 threads), and
also some of the score maximization config settings toned down so as to
focus more on win/loss. It's running one of the most recent 40 block KataGo
nets from the current ongoing training run, a net that isn't released yet
but which I'll release soon. But it's only more recent by a couple of
weeks, the latest released ones should probably be about the same strength
too.

I think it gets somewhere from 5000 to 7000 playouts per second; I forget
what the exact numbers were. No opening book, just searching from scratch.
A book seems like it would enable a large increase in the effective number
of playouts early on (and save time for deeper search out-of-book too), but
I haven't worked on book code.
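
The simplest possible version would be something like the following sketch
(purely illustrative) - a table from a position hash to a book move,
consulted before search; handling symmetries, komi, and actually building
the book is the real work:

class OpeningBook:
    """Minimal book: maps a position hash to a (move, winrate, games) record.
    In practice one would key on a Zobrist hash and handle symmetries."""
    def __init__(self):
        self.table = {}

    def add(self, pos_hash, move, winrate, games):
        old = self.table.get(pos_hash)
        if old is None or games > old[2]:
            self.table[pos_hash] = (move, winrate, games)

    def lookup(self, pos_hash, min_games=50, min_winrate=0.45):
        entry = self.table.get(pos_hash)
        if entry and entry[2] >= min_games and entry[1] >= min_winrate:
            return entry[0]   # play the book move, skip the search
        return None           # out of book: fall back to a normal search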

The current run, which has been ongoing for a few months on anywhere from
37 to 47 V100s, spends about 4% of its games on 9x9, which probably amounts
to 1%-2% of the total compute since 9x9 games are much shorter than games
on larger boards. So presumably a lot of the strength on 9x9 is due to the
neural net generalizing its knowledge from other board sizes, since the
same net trains on all sizes at the same time.

There's a setting in KataGo ("PDA") that causes it to play a bit more
aggressively, in whatever way it has learned from self-play that increases
the likelihood of weaker players making a mistake. In 19x19 handicap games,
it gives a strong boost in how many handicap stones it's capable of
offering to weaker opponents, even helping against players near human pro
level. I haven't turned it on here, but maybe I could make a second
username with this set to a nonzero value. I'd be curious if it would make
KataGo avoid some of the more easily drawn lines of play in favor of ones
that would be more challenging for most opponents.


On Thu, May 7, 2020 at 6:27 AM Rémi Coulom  wrote:

> Hi,
>
> Thanks to all the strong bots who joined. Kata is impressive. Does anyone
> know more about its configuration? Is it a single V100 or many?
>
> I watched some games, and some where spectacular.
>
> A firework of ko fights:
> http://www.yss-aya.com/cgos/viewer.cgi?9x9/SGF/2020/05/07/997314.sgf
>
> In this game, Crazy Stone won using a typical Monte Carlo trick:
> http://www.yss-aya.com/cgos/viewer.cgi?9x9/SGF/2020/05/07/997390.sgf
> On move 27, it sacrificed a stone. According to Crazy Stone, the game
> would have been a draw had Aya just re-captured it. But Aya took the bait
> and captured the other stone. Crazy Stone's evaluation became instantly
> winning after this, the sacrificed stone serving as a threat for the
> winning ko fight, 18 moves later.
>
> Rémi
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Polygames: Improved Zero Learning

2020-02-02 Thread David Wu
Yep, global pooling is also how KataGo already handles multiple board sizes
with a single model. Convolution weights don't care about board size at all
since the filter dimensions have no dependence on it, so the only issue is
if you're using things like fully-connected layers, and those are often
easily replaceable with global pooling and convolution, making the whole
net board-size-independent. (Although you still need to train/finetune
appropriately).
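
As a rough sketch of the kind of block I mean (PyTorch, illustrative only -
not KataGo's exact global pooling structure): pool some channels down to
board-wide statistics and feed them back as per-channel biases, so that no
layer ever depends on the board dimensions:

import torch
import torch.nn as nn

class GlobalPoolBias(nn.Module):
    """Pools a few channels over the whole board and uses the pooled values
    to produce per-channel biases for the other channels. No fully-connected
    layer over board positions, so nothing here depends on board size."""
    def __init__(self, channels, pooled_channels):
        super().__init__()
        self.pool_conv = nn.Conv2d(channels, pooled_channels, 1)
        # Mean and max pooling give 2 numbers per pooled channel.
        self.bias_from_pool = nn.Linear(2 * pooled_channels, channels)

    def forward(self, x):
        p = self.pool_conv(x)                        # (N, Cp, H, W)
        pooled = torch.cat([p.mean(dim=(2, 3)),      # board-size-independent stats
                            p.amax(dim=(2, 3))], dim=1)
        bias = self.bias_from_pool(pooled)           # (N, C)
        return x + bias[:, :, None, None]            # broadcast over the board

# The same weights work for any board size, e.g.:
#   layer = GlobalPoolBias(128, 32)
#   out9  = layer(torch.zeros(1, 128, 9, 9))
#   out19 = layer(torch.zeros(1, 128, 19, 19))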

Nice to see the same idea finally making its way around in papers too.
Along with all the experiments that projects like LC0 and such are trying
in different runs, it feels like the state of formal published research in
this area is often one step behind what major projects are already
successfully doing.

On Sun, Feb 2, 2020 at 8:06 AM Rémi Coulom  wrote:

> Hi,
>
> I have just noticed this has recently been released:
>
> github:
> https://github.com/facebookincubator/polygames
> paper:
> https://arxiv.org/abs/2001.09832
>
> Rémi
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Hyper-Parameter Sweep on AlphaZero General

2019-03-24 Thread David Wu
Thanks for sharing the link. Taking a brief look at this paper, I'm quite
confused about their methodology and their interpretation of their data.

For example in figure 2 (b), if I understand correctly, they plot Elo
ratings for three independent runs where they run the entire AlphaZero
process for 50,100,150 iterations, where they seem to be using the word
"iteration" to mean an entire block of playing a fixed number of games
("episodes"), training the neural net, and then testing the net to see if
it should replace the previous.

Their plot of Elo ratings, however, shows the 50-iteration run starting much
higher and ending much lower than the 100-iteration run, which in turn
starts much higher and ends much lower than the 150-iteration run. What
stands out is that each of the three independently seems to have mean 0.
Does this mean that for every run, they
only computed Elos using games between nets within that run itself, with no
games comparing nets across separate runs? If so, this makes every Elo
graph in the paper tricky to interpret, since none of the values in any of
them are directly comparable between lines. The ones that span a wider
range are likely to be better runs (more Elo improvement within that run),
but since nontransitivity effects can sometimes lead to some dilations or
contractions of the apparent Elo gain versus the "true" gain against more
general opponents, without cross-run games it's hard to be entirely
confident about the comparisons.

They also seem to imply in the text that the bump in training loss near the
end of the 150-iteration run in Figure 2 (a) indicates that the neural net
worsened, and that more iterations may make the bot worse. This seems to me
a strange conclusion. Their own graph shows that the relative Elo strength
within that run increased almost monotonically through that whole
period. Since the AlphaZero process trains towards a moving target, it's
easy for the loss to increase simply due to the data getting harder even if
the neural net always improves - for example maybe the most common opening
changes from a simple one to one that leads to games that are complex and
harder to predict, even if the neural net improves its strength and
accuracy in both openings the whole time.


On Sun, Mar 24, 2019 at 10:05 AM Rémi Coulom  wrote:

> Hi,
>
> Here is a paper you might be interested in:
>
> Abstract:
>
> Since AlphaGo and AlphaGo Zero have achieved breakground successes in the
> game of Go, the programs have been generalized to solve other tasks.
> Subsequently, AlphaZero was developed to play Go, Chess and Shogi. In the
> literature, the algorithms are explained well. However, AlphaZero contains
> many parameters, and for neither AlphaGo, AlphaGo Zero nor AlphaZero, there
> is sufficient discussion about how to set parameter values in these
> algorithms. Therefore, in this paper, we choose 12 parameters in AlphaZero
> and evaluate how these parameters contribute to training. We focus on three
> objectives~(training loss, time cost and playing strength). For each
> parameter, we train 3 models using 3 different values~(minimum value,
> default value, maximum value). We use the game of play 6×6 Othello, on the
> AlphaZeroGeneral open source re-implementation of AlphaZero. Overall,
> experimental results show that different values can lead to different
> training results, proving the importance of such a parameter sweep. We
> categorize these 12 parameters into time-sensitive parameters and
> time-friendly parameters. Moreover, through multi-objective analysis, this
> paper provides an insightful basis for further hyper-parameter optimization.
>
> https://arxiv.org/abs/1903.08129
>
> Rémi
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Accelerating Self-Play Learning in Go

2019-03-08 Thread David Wu
On Fri, Mar 8, 2019 at 8:19 AM Darren Cook  wrote:

> > Blog post:
> > https://blog.janestreet.com/accelerating-self-play-learning-in-go/
> > Paper: https://arxiv.org/abs/1902.10565
>
> I read the paper, and really enjoyed it: lots of different ideas being
> tried. I was especially satisfied to see figure 12 and the big
> difference giving some go features made.
>
>
Thanks, glad you enjoyed the paper.


> Though it would be good to see figure 8 shown in terms of wall clock
> time, on equivalent hardware. How much extra computation do all the
> extra ideas add? (Maybe it is in the paper, and I missed it?)
>
>
I suspect Leela Zero would come off far *less* favorably if one tried to
do such a comparison using their actual existing code rather than
abstracting down to counting neural net evals, because as far as I know
there is no cross-game batching of neural net evaluations in Leela Zero,
which makes a huge difference in the ability to use a strong GPU
efficiently. Only in the last couple of months or so, based on what I've
been seeing in chat and pull requests, has Leela Zero implemented
within-search batching of neural net evals, and clients still only play one
game at a time.

But maybe this is a distraction from your actual question, which is how
much these extra things slow the process down in computational time
given both equivalent hardware *and* an equivalently good architecture.
Mostly they have almost no cost, which is why the paper doesn't really
focus on that question. The thing to keep in mind is that almost all of the
cost is the GPU-bound evaluation of the convolutional layers in the neural
net during self-play.

* Ownership and score distribution are not used during selfplay (except
optionally ownership at the root node for a minor heuristic), so they don't
contribute to selfplay cost. Also even on the training side, they are only
a slight cost (at most a few percent), since this is just some computations
at the output head of the neural net needing vastly fewer floating point
ops than are in the convolutions in the main trunk of the net.

* Global pooling costs nothing, since it does not add new convolutions in
my implementation, instead only re-purposing existing channels. In my
implementation, it actually *reduces* the number of parameters in the model
and (I believe) the nominal number of floating point ops in the model,
since the re-purposing of some channels to be pooled reduces the number of
channels feeding into the convolution of the next layer. This is offset by
the cost of doing the pooling and the additional GPU calls, netting out to
about 0 cost.

* Multi-board-size masking cost is also very small if your GPU
implementation fuses the mask with adjacent batch-norm bias+scale
operations (see the sketch after this list).

* Go-specific input features add about a 10% cost on my hardware when using
the 10b128c net, due to a combination of ladder search being not cheap, and
the extra IO that you have to do to the GPU to communicate the features,
and presumably closer to 5% for 15b192c and continuing to decrease if you
move to larger nets, as the CPU and IO cost become more and more irrelevant
as the proportion of GPU work grows.

* Playout/visit cap oscillation is just a change of some root-level search
parameters. Target pruning is just some cheap CPU postprocessing. The cost
of writing down the various additional targets somewhat expands the size of
the training data on disk, but is pretty cheap with a good SSD. I think
nothing else adds any cost.
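
To illustrate the masking fusion mentioned in the list above (a simplified
sketch, not the exact kernel KataGo uses): the board mask can be applied in
the same elementwise pass that applies the folded batch-norm scale and bias,
so it costs essentially nothing extra:

import torch

def masked_bn_scale_bias(x, mask, scale, bias):
    """x: (N, C, H, W) activations; mask: (N, 1, H, W) with 1 on real board
    points and 0 off-board; scale, bias: (C,) batch-norm parameters already
    folded with the running mean/variance. One fused elementwise pass does
    both the batch-norm affine transform and the zeroing of off-board points."""
    return (x * scale[None, :, None, None] + bias[None, :, None, None]) * mask

# Example shapes (a 13x13 board would just occupy a corner of the 19x19
# tensor, with the mask zero everywhere outside that area):
#   x = torch.randn(8, 128, 19, 19); mask = torch.ones(8, 1, 19, 19)
#   scale = torch.ones(128); bias = torch.zeros(128)
#   y = masked_bn_scale_bias(x, mask, scale, bias)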

> I found some other interesting results, too - for example contrary to
> > intuition built up from earlier-generation MCTS programs in Go,
> > putting significant weight on score maximization rather than only
> > win/loss seems to help.
>
> Score maximization in self-play means it is encouraged to play more
> aggressively/dangerously, by creating life/death problems on the board.
> A player of similar strength doesn't know how to exploit the weaknesses
> left behind. (One of the asymmetries of go?)
>

Note that testing games were between random pairs of players proportional
to p(1-p) from estimated Elos. Even for a 200 Elo difference, p = 0.76,
then p(1-p) = 0.18, which is not that much smaller than when p = 0.5 giving
p(1-p) = 0.25. So quite many testing games were between players of fairly
different strengths.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] 0.5-point wins in Go vs Extremely slow LeelaChessZero wins

2019-03-05 Thread David Wu
Yes, partly. For Go, putting some partial weight on score maximization
causes KataGo to continue to play strong moves and try to kill things or
seek profitable trades when it's already winning, subject to still being
fairly safe. Since most of the utility is still on win/loss rather than
score, the bot will still spend some moves making "unnecessary" defenses
and biasing towards "safe" moves. But humans of course also play moves that
sacrifice or lose points when ahead to secure safety.
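
Concretely, the kind of mixing I mean is something like the following
(a rough sketch only - the exact transform and weights KataGo uses differ in
detail): the score term is bounded and mild, so it can nudge move choices
without ever overriding a real win/loss difference:

import math

def combined_utility(win_prob, expected_score_lead,
                     score_weight=0.3, score_scale=20.0):
    """Search utility mixing win/loss with a mild score term.
    win_prob in [0, 1]; expected_score_lead in points for the side to move.
    The tanh keeps the score term bounded, so it can never shift the total
    utility by more than +/- score_weight."""
    winloss_utility = 2.0 * win_prob - 1.0                      # in [-1, 1]
    score_utility = score_weight * math.tanh(expected_score_lead / score_scale)
    return winloss_utility + score_utility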

This is a game from an earlier net, before I uploaded stronger ones, but it
shows off some of this effect. Once it gets ahead in the opening, it plays
a solid and secure style, but is still very willing to keep fighting and
killing more groups to increase the score difference.
https://online-go.com/game/16009792



On Tue, Mar 5, 2019 at 6:31 PM Shawn Ligocki  wrote:

> I wonder if this behavior could be avoided by giving a small incentive to
> win by the most points (or most material in chess) similar to the
> technique mentioned by David Wu in KataGo a few days ago. The problem right
> now is that the AI has literally no reason to think that winning with more
> points is better than by 0.5 points, whereas human players prefer to win by
> more points slightly. David, have you noticed if KataGo avoids these sorts
> of losing point moves at the end of the game?
>
> (I feel the same reasoning applies to automatic cars, they could be (and
> probably are) trained to prefer smoother ride in addition to avoiding
> accidents.)
>
> On Tue, Mar 5, 2019 at 5:11 PM "Ingo Althöfer" <3-hirn-ver...@gmx.de>
> wrote:
>
>> Hi,
>>
>> recently, Leela-Chess-Zero has become very strong, playing
>> on the same level as Stockfish-10. Many of the test players
>> are puzzled, however, by the "phenomenon" that Lc0 tends to
>> need many many moves to transform an overwhelming advantage
>> into a mate.
>>
>> Just today a new German tester reported a case and described
>> it by the sentence "da wird der Hund in der Pfanne verrückt"
>> ("now the dog is going crazy in the pan", to translate it word
>> by word). He had seen an endgame: Stockfish with naked king,
>> and LeelaZero with king, queen and two rooks. Leela first
>> sacrificed the queen, then one of the rooks, and only then
>> started to go for a "normal" mate with the last remaining rook
>> (+ king). The guy (Florian Wieting) asked for an explanation.
>>
>> http://forum.computerschach.de/cgi-bin/mwf/topic_show.pl?tid=10262
>>
>> I think there is a very straightforward one: what Leela-Chess-Zero
>> (with its MCTS-based search) performs is comparable to the
>> path all MCTS Go bots took for many years when playing winning
>> positions against human opponents: the advantage was reduced
>> step by step, and in the end the bot gained a win by 0.5 points.
>> Later, in the tournament table, that was not a problem, because
>> a win is a win :-)
>>
>> Similarly in chess: overwhelming advantage is reduced by lazy play
>> to some small margin advantage (against a straightforward alpha-beta
>> opponent), and then the MCTS chess bot (= Leela Zero in this case)
>> starts playing concentratedly.
>>
>> Another guy asked how DeepMind had worked around this problem
>> with their AlphaZero. I am rather convinced: They also had this
>> problem. Likely, they kept the most serious examples undisclosed,
>> and furthermore set the margins for resignation rather narrow (for
>> instance something like evaluation +-6 by Stockfish for three move
>> pairs) to avoid nearly endless endgames.
>>
>> Ingo.
>>
>> PS: thinking of a future with automatic cars in public traffic. The
>> 0.5-point wins or the related behaviour in MCTS-based chess would mean
>> that an automatic car would brake only in the very last moment
>> knowing that it will be sufficient to stop 20 centimeters next to the
>> back-bumpers of the car ahead. Of course, a human passenger would
>> not like to experience such situations too often.
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Accelerating Self-Play Learning in Go

2019-03-03 Thread David Wu
For any interested people on this list who don't follow Leela Zero
discussion or reddit threads:

I recently released a paper on ways to improve the efficiency of
AlphaZero-like learning in Go. A variety of the ideas tried deviate a
little from "pure zero" (e.g. ladder detection, predicting board
ownership), but still only uses self-play starting from random and with no
outside human data.

Although longer training runs have NOT yet been tested, for reaching up to
about LZ130 strength so far (strong human pro or just beyond it, depending
on hardware), you can speed up the learning to that point by roughly a
factor of 5 at least compared to Leela Zero, and closer to a factor of 30
for merely reaching the earlier level of very strong amateur strength
rather than pro or superhuman.

I found some other interesting results, too - for example contrary to
intuition built up from earlier-generation MCTS programs in Go, putting
significant weight on score maximization rather than only win/loss seems to
help.

Blog post:
https://blog.janestreet.com/accelerating-self-play-learning-in-go/
Paper: https://arxiv.org/abs/1902.10565
Code: https://github.com/lightvector/KataGo
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] CGOS Real-time game viewer

2018-05-30 Thread David Wu
Awesome!

I was going to complain that the color scheme didn't have enough contrast
between the white stones and the background (at least on my desktop, it was
hard to see the white stones clearly), and then I discovered the settings
menu already lets you change the background color. Cool! :)


On Wed, May 30, 2018 at 8:38 AM, uurtamo  wrote:

> This is really well done.
>
> Thanks,
>
> Steve
>
>
> On Tue, May 29, 2018 at 4:10 PM, Hiroshi Yamashita 
> wrote:
>
>> Hi,
>>
>> CGOS 19x19 Real-time game viewer is available.
>> https://deepleela.com/cgos
>>
>> Thanks to the author of the DeepLeela site.
>> DeepLeela logs in as a Viewer, and dispatches games to their clients.
>> There were viewing clients on Linux and Windows, but none in the browser.
>>
>> Source code is also available.
>> https://github.com/deepleela
>>
>> Thanks,
>> Hiroshi Yamashita
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Some personal thoughts on 9x9 computer go

2018-03-02 Thread David Wu
That was very interesting, thanks for sharing!

On Fri, Mar 2, 2018 at 9:50 AM,  wrote:

> Hi,
> somebody asked me to post my views on 9x9 go on the list based on my
> experience with correspondence go on OGS and little Golem.
>
> I have been playing online correspondence tournaments since 2011 with
> Valkyria, which is an MCTS engine with heavy playouts using AMAF, heavily
> tuned for 9x9. Also with the kind support of Ingo, I used to generate a lot
> of 9x9
> data for opening book preparations many years ago.
>
> http://www.littlegolem.net/jsp/games/gamedetail.jsp?gtid=go9=ch
>
> During these years I collected an opening book based on running Valkyria
> with 2 threads for something like 2 to 24 hours. I can do this because of a
> hash table that works well. Valkyria is tuned to be very selective, so it
> follows an iterative deepening algorithm where it searches for some time
> and then discards the tree, storing the best move. In each iteration it will
> start a new search with an empty tree, but will use the hash table to
> re-search known positions more efficiently. This way it can overcome the
> problem that MCTS fills memory very quickly.
>
> Valkyria has no stopping rule, so in the end it is mostly a hybrid
> human/computer decision when to stop search. But most of the time I just
> wait until it seems to converge on a single move with a clear winrate
> advantage. For the openings I tend to choose moves myself many times, mostly
> to avoid lines where it has lost in the past, but I only choose moves it has
> investigated during the iterative search.
>
> If I run Valkyria on 9x9 CGOS (2295 Bayes Elo) it is not very strong, but
> against amateur humans on OGS and LG it has been very successful but not
> unbeatable.
>
> On LG there is a player, Gerhard Knop (who I think uses one or more
> programs as support, or at least used to - I just read this indirectly
> somewhere), who seems to be clearly strong right now. At least recently
> he seems to be very good with white against Valkyria.
>
> So what have I found out about 9x9?
>
> I used to think that with the opening book of Valkyria black is an easy
> win with a komi less than 7.0. Since Gerhard Knop has been beating Valkyria
> with white, I have changed to thinking black should be an easy win... but my
> opening book is not very close to optimal play, it is just the playing
> style of Valkyria.
>
> Other human players are playing very well too and it often happens that
> Valkyria wins games which were evaluated as losses despite the enormous
> (well for a single PC guy, I am not Deepmind) computations behind all
> moves. The strongest humans repeatedly play moves that Valkyria never read
> deeply, even after 12 hours of computations, which turn out to be as strong
> or better than the expected best move.
>
> So is 9x9 easier than 19x19? Yes of course... but it is not that easy. In
> go there is the complexity of the number of legal moves but this is no
> longer the big problem. Most moves can safely be searched very shallowly or
> not at all. In a well-played 9x9 game it is a simultaneous problem of:
> endgame, life and death, semeai with ko fights, subtle differences in move
> ordering for forced moves etc. This gives fighting lines that cannot be
> reliably evaluated by MCTS until read 40-70 ply deep because all stones are
> unstable.
>
> I have not yet trained a value network for 9x9 but I can imagine that it
> might still be very hard to get close to perfect evaluation, so any engine
> would still need to search very deep to play close to perfection.
>
> From the current surge of strong engines on CGOS 9x9 I just learned that
> my engine is even further from perfect play than I previously thought,
> since there is no sign of these engines being near perfect play given the
> win/loss/jigo statistics.
>
>
> TL;DR:
>  I did 9x9 computer go for many years and I think 9x9 go is much harder
> than I originally thought. I am not ruling out a super strong 9x9 go
> program appearing next week. I am just saying that close to perfect is much
> stronger than anything I have seen so far.
>
>
> Best
> Magnus Persson
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Crazy Stone is back

2018-02-28 Thread David Wu
It's not even just liberties and semeai, it's also eyes. Consider for
example a large dragon that has miai for 2 eyes in distant locations, and
the opponent then takes one of them - you'd like the policy net to now
suggest the other eye-making move far away. And you'd also like the value
net to distinguish the three situations where the whole group has 2 eyes
even when they are distant versus the ones where it doesn't.

I've been doing experiments with somewhat smaller neural nets (roughly 4-7
residual blocks = 8-14 layers), without sticking to an idealized "zero"
approach. I've only experimented with policy nets so far, but presumably
much of this should also transfer to a value net's understanding too.

1. One thing I tried was chain pooling, which was neat, but ultimately
didn't seem promising:
https://github.com/lightvector/GoNN#chain-pooling
It solves all of these problems when the strings are solidly connected. It
also helps when the strings are long but not quite solidly connected: the
information still propagates faster than without it. But if the group is made
of lots of little strings (diagonal connections, bamboo joints, etc.), then of
course it won't help. And also chain pooling is
computationally costly, at least in Tensorflow, and it might have negative
effects on the rest of the neural net that I don't understand.

2. A new thing I've been trying recently that actually does seem moderately
promising is dilated convolutions, although I'm still early in testing.
They also help increase the speed of information propagation, and don't
require solidly connected strings, and also are reasonably cheap.

In particular: my residual blocks have 192 channels, so I tried taking
several of the later residual blocks in the neural net and making 64 of the
channels of the first convolution in each block use dilated convolutions
(leaving 128 channels of regular convolutions), with dilation factors of 2
or 3. Intuitively, the idea is that earlier blocks could learn to compute
2x2 or 3x3 connectivity patterns, and then the dilated convolutions in
later residual blocks will be able to use that to propagate information
several spaces at a time across connected groups or dragons.
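
For concreteness, here is a rough PyTorch-style sketch of the kind of mixed
block described above; the channel split and dilation factor follow the
description, but the placement of activations (and the absence of batch norm)
is simplified and illustrative rather than the actual implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedDilationBlock(nn.Module):
    # Residual block whose first convolution splits its output channels
    # between ordinary 3x3 convolutions and dilated 3x3 convolutions.
    def __init__(self, channels=192, dilated_channels=64, dilation=2):
        super().__init__()
        regular_channels = channels - dilated_channels
        self.conv1_regular = nn.Conv2d(channels, regular_channels, 3, padding=1)
        self.conv1_dilated = nn.Conv2d(channels, dilated_channels, 3,
                                       padding=dilation, dilation=dilation)
        # Second convolution is an ordinary 3x3 over the full width.
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        y = torch.cat([self.conv1_regular(x), self.conv1_dilated(x)], dim=1)
        y = F.relu(y)
        y = self.conv2(y)
        return F.relu(x + y)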

So far, indications are that this works. When I looked at it in various
board positions, it helped in a variety of capturing race and
large-dragon-two-eye-miai situations, correctly suggesting moves that the
net without dilated convolutions would fail to find due to the move being
too far away. Also dilated convolutions seem pretty cheap - it only
slightly increases the computational cost of the net.

So far, I've found that it doesn't significantly improve the overall loss
function, presumably because now there are 128 channels instead of 192
channels of ordinary convolutions, so in return for being better at
long-distance interactions, the neural net has gotten worse at some local
tactics. But it also hasn't gotten worse the way it would if I simply
dropped the number of channels from 192 to 128 without adding any new
channels, so the dilated convolutions are being "used" for real work.

I'd be curious to hear if anyone else has tried dilated convolutions and
what results they got. If there's anything at all to do other than just add
more layers, I think they're the most promising thing I know of.


On Wed, Feb 28, 2018 at 12:34 PM, Rémi Coulom  wrote:

> 192 and 256 are the numbers of channels. They are fully connected, so the
> number of 3x3 filters is 192^2, and 256^2.
>
> Having liberty counts and string size as input helps, but it solves only a
> small part of the problem. You can't read a semeai from just the
> liberty-count information.
>
> I tried to be clever and find ways to propagate information along strings
> in the network. But all the techniques I tried make the network much
> slower. Adding more layers is simple and works.
>
> Rémi
>
> - Mail original -
> De: "Darren Cook" 
> À: computer-go@computer-go.org
> Envoyé: Mercredi 28 Février 2018 16:43:10
> Objet: Re: [Computer-go] Crazy Stone is back
>
> > Weights_31_3200 is 20 layers of 192, 3200 board evaluations per move
> > (no random playout). But it still has difficulties with very long
> > strings. My next network will be 40 layers of 256, like Master.
>
> "long strings" here means solidly connected stones?
>
> The 192 vs. 256 is the number of 3x3 convolution filters?
>
> Has anyone been doing experiments with, say, 5x5 filters (and fewer
> layers), and/or putting more raw information in (e.g. liberty counts -
> which makes the long string problem go away, if I've understood
> correctly what that is)?
>
> Darren
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> 

Re: [Computer-go] MCTS with win-draw-loss scores

2018-02-13 Thread David Wu
Ah, right, the cases where you and your opponent's interests are not
perfectly anti-aligned make things a bit trickier, possibly introducing
some game theory into the mix. Then I don't know. :)

My first instinct is to say that in principle you should provide the neural
net both "must-win" parameters, and have the neural net produce two value
outputs, namely the expected utilities for each side separately (which
might not sum to 0), which the MCTS would accumulate separately, and at
each node depending on who is to move it would use the appropriate side's
statistics to choose the child to simulate. That's quite a big difference
though, and I haven't thought about ways that this could go wrong, it seems
like there might easily be some big pitfalls here.
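
To make the bookkeeping concrete, here is a minimal sketch of what I mean by
accumulating separate statistics for each side; the UCB1-style selection rule
and all names here are placeholders rather than a worked-out design:

import math

class Node:
    def __init__(self, to_move):
        self.to_move = to_move            # 0 or 1, whoever moves at this node
        self.children = []
        self.visits = 0
        self.utility_sum = [0.0, 0.0]     # one running total per player

    def update(self, utilities):
        # utilities: (u_player0, u_player1) from the net or a rollout;
        # with different "must-win" parameters they need not sum to zero.
        self.visits += 1
        self.utility_sum[0] += utilities[0]
        self.utility_sum[1] += utilities[1]

    def select_child(self, c_explore=1.0):
        mover = self.to_move
        def score(child):
            if child.visits == 0:
                return float("inf")
            mean = child.utility_sum[mover] / child.visits
            explore = c_explore * math.sqrt(
                math.log(max(self.visits, 2)) / child.visits)
            return mean + explore
        # Each node picks the child by the utility of whoever moves there.
        return max(self.children, key=score)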

The case where you and your opponent's interests are exactly anti-aligned
should still be straightforward though. In that case the way I think of it
is that "Play chess where draws are worth half a win and half a loss" and
"Play chess where draws are losing for you and winning for your opponent"
are two entirely distinct zero-sum games that merely happen to share a lot
of rules and features. So of course you should train on both games and
distinguish the two so that the neural net always knows which one it's
playing, but you can still share the same neural net instead of having two
separate nets to take advantage of the fact that each one will regularize
the learning for the other.

Maybe you still do need to take a little care, for example in Chess if the
bot gets sufficiently strong then must-win as black might just always fail
to succeed and only produce uninformative samples of always failing,
harming the training. I'm optimistic, but ultimately, all this would still
need testing.

On Tue, Feb 13, 2018 at 12:11 PM, Dan Schmidt <d...@dfan.org> wrote:

> Do you intend to use the same draw values for both sides in the self-play
> games? They can be independent:
>  - in a 3/1/0 scenario, neither player is especially happy with a draw
> (and in fact would rather each throw a game to each other in a two-game
> match than make two draws, but that's a separate issue);
>  - in a match with one game left, both players agree that a draw and a
> Black win (say) are equivalent results;
>  - in a tournament, the must-win situations of both players could be
> independent.
>
> In real life you usually have a good sense of how your opponent's
> "must-win" parameter is set, but that doesn't really apply here.
>
>
> On Tue, Feb 13, 2018 at 10:58 AM, David Wu <lightvec...@gmail.com> wrote:
>
>> Actually this pretty much solves the whole issue right? Of course the
>> proof would be to actually test it out, but it seems to me a pretty
>> straightforward solution, not nontrivial at all.
>>
>>
>> On Feb 13, 2018 10:52 AM, "David Wu" <lightvec...@gmail.com> wrote:
>>
>> Seems to me like you could fix that in the policy too by providing an
>> input feature plane that indicates the value of a draw, whether 0 as
>> normal, or -1 for must-win, or -1/3 for 3/1/0, or 1 for only-need-not-lose,
>> etc.
>>
>> Then just play games with a variety of values for this parameter in your
>> self-play training pipeline so the policy net gets exposed to each kind of
>> game.
>>
>> On Feb 13, 2018 10:40 AM, "Dan Schmidt" <d...@dfan.org> wrote:
>>
>> The AlphaZero paper says that they just assign values 1, 0, and -1 to
>> wins, draws, and losses respectively. This is fine for maximizing your
>> expected value over an infinite number of games given the way that chess
>> tournaments (to pick the example that I'm familiar with) are typically
>> scored, where you get 1, 0.5, and 0 points respectively for wins, draws,
>> and losses.
>>
>> However 1) not all tournaments use this scoring system (3/1/0 is popular
>> these days, to discourage draws), and 2) this system doesn't account for
>> must-win situations where a draw is as bad as a loss (say you are 1 point
>> behind your opponent and it's the last game of a match). Ideally you'd keep
>> track of all three probabilities and use some linear meta-scoring function
>> on top of them. I don't think it's trivial to extend the AlphaZero
>> architecture to handle this, though. Maybe it is sufficient to train with
>> the standard meta-scoring (while keeping track of the separate W/D/L
>> probabilities) but then use the currently applicable meta-scoring while
>> playing. Your policy network won't quite match your current situation, but
>> at least your value network and search will.
>>
>> On Tue, Feb 13, 2018 at 10:05 AM, "Ingo Althöfer" <3-hirn-ver...@gmx.de>
>> wrote:
>>
>

Re: [Computer-go] MCTS with win-draw-loss scores

2018-02-13 Thread David Wu
Actually this pretty much solves the whole issue right? Of course the proof
would be to actually test it out, but it seems to me a pretty
straightforward solution, not nontrivial at all.

On Feb 13, 2018 10:52 AM, "David Wu" <lightvec...@gmail.com> wrote:

Seems to me like you could fix that in the policy too by providing an input
feature plane that indicates the value of a draw, whether 0 as normal, or
-1 for must-win, or -1/3 for 3/1/0, or 1 for only-need-not-lose, etc.

Then just play games with a variety of values for this parameter in your
self-play training pipeline so the policy net gets exposed to each kind of
game.

On Feb 13, 2018 10:40 AM, "Dan Schmidt" <d...@dfan.org> wrote:

The AlphaZero paper says that they just assign values 1, 0, and -1 to wins,
draws, and losses respectively. This is fine for maximizing your expected
value over an infinite number of games given the way that chess tournaments
(to pick the example that I'm familiar with) are typically scored, where
you get 1, 0.5, and 0 points respectively for wins, draws, and losses.

However 1) not all tournaments use this scoring system (3/1/0 is popular
these days, to discourage draws), and 2) this system doesn't account for
must-win situations where a draw is as bad as a loss (say you are 1 point
behind your opponent and it's the last game of a match). Ideally you'd keep
track of all three probabilities and use some linear meta-scoring function
on top of them. I don't think it's trivial to extend the AlphaZero
architecture to handle this, though. Maybe it is sufficient to train with
the standard meta-scoring (while keeping track of the separate W/D/L
probabilities) but then use the currently applicable meta-scoring while
playing. Your policy network won't quite match your current situation, but
at least your value network and search will.

On Tue, Feb 13, 2018 at 10:05 AM, "Ingo Althöfer" <3-hirn-ver...@gmx.de>
wrote:

> Hello,
>
> what is known about proper MCTS procedures for games
> which do not only have wins and losses, but also draws
> (like chess, Shogi or Go with integral komi)?
>
> Should neural nets provide (win, draw, loss)-probabilities
> for positions in such games?
>
> Ingo.
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go



___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] MCTS with win-draw-loss scores

2018-02-13 Thread David Wu
Seems to me like you could fix that in the policy too by providing an input
feature plane that indicates the value of a draw, whether 0 as normal, or
-1 for must-win, or -1/3 for 3/1/0, or 1 for only-need-not-lose, etc.

Then just play games with a variety of values for this parameter in your
self-play training pipeline so the policy net gets exposed to each kind of
game.
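
As a trivial sketch of what such an input plane could look like (numpy here
purely for illustration; the board size, scale, and how it is stacked with the
other planes are up to the particular engine):

import numpy as np

def draw_value_plane(draw_value, board_size=8):
    # Constant plane encoding how much a draw is worth to the side to move,
    # on the same -1..1 scale as wins and losses.
    return np.full((board_size, board_size), draw_value, dtype=np.float32)

# 0 as normal, -1 for must-win, -1/3 for 3/1/0, 1 for only-need-not-lose.
planes = [draw_value_plane(v) for v in (0.0, -1.0, -1.0 / 3.0, 1.0)]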

On Feb 13, 2018 10:40 AM, "Dan Schmidt"  wrote:

The AlphaZero paper says that they just assign values 1, 0, and -1 to wins,
draws, and losses respectively. This is fine for maximizing your expected
value over an infinite number of games given the way that chess tournaments
(to pick the example that I'm familiar with) are typically scored, where
you get 1, 0.5, and 0 points respectively for wins, draws, and losses.

However 1) not all tournaments use this scoring system (3/1/0 is popular
these days, to discourage draws), and 2) this system doesn't account for
must-win situations where a draw is as bad as a loss (say you are 1 point
behind your opponent and it's the last game of a match). Ideally you'd keep
track of all three probabilities and use some linear meta-scoring function
on top of them. I don't think it's trivial to extend the AlphaZero
architecture to handle this, though. Maybe it is sufficient to train with
the standard meta-scoring (while keeping track of the separate W/D/L
probabilities) but then use the currently applicable meta-scoring while
playing. Your policy network won't quite match your current situation, but
at least your value network and search will.

On Tue, Feb 13, 2018 at 10:05 AM, "Ingo Althöfer" <3-hirn-ver...@gmx.de>
wrote:

> Hello,
>
> what is known about proper MCTS procedures for games
> which do not only have wins and losses, but also draws
> (like chess, Shogi or Go with integral komi)?
>
> Should neural nets provide (win, draw, loss)-probabilities
> for positions in such games?
>
> Ingo.
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go



___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] AGZ Policy Head

2017-12-29 Thread David Wu
As far as a purely convolutional approach, I think you *can* do better by
adding some global connectivity.

Generally speaking, there should be some value in global connectivity for
things like upweighting the probability of playing ko threats anywhere on
the board when there is an active ko anywhere else on the board. If you
made the whole neural net purely convolutional, then of course with enough
convolutional layers the neural net could still learn to distribute the
"there is an important ko on the board" property everywhere, but it
would take many more layers.

I've actually experimented with this recently in training my own policy net
- for example one approach is to have a special residual block just before
the policy head:
* Compute a convolution (1x1 or 3x3) of the trunk with C channels for a
small C, result shape 19x19xC.
* Average-pool the results down to 1x1xC.
* Multiply by CxN matrix to turn that into 1x1xN where N is the number of
channels in the main trunk of the resnet, broadcast up to 19x19xN, and add
back into the main trunk (e.g. skip connection).
Apply your favorite activation function at appropriate points in the above.
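
For concreteness, a rough PyTorch-style sketch of that block; the channel
counts and where exactly the activations go are illustrative choices here, not
the exact ones I used:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalPoolingBias(nn.Module):
    def __init__(self, trunk_channels=192, pooled_channels=16):
        super().__init__()
        self.conv = nn.Conv2d(trunk_channels, pooled_channels, kernel_size=1)
        self.fc = nn.Linear(pooled_channels, trunk_channels)

    def forward(self, trunk):              # trunk: (batch, N, 19, 19)
        x = F.relu(self.conv(trunk))       # (batch, C, 19, 19)
        x = x.mean(dim=(2, 3))             # average-pool down to (batch, C)
        bias = self.fc(x)                  # C x N matrix -> (batch, N)
        # Broadcast back up to 19x19 and add into the trunk (skip connection).
        return trunk + bias[:, :, None, None]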

There are other possible architectures for this block too, I actually did
something a bit more complicated but still pretty similar. Anyways, it
turns out that when I visualize the activations on example game situations,
I find that the neural net actually does use one of the C channels for
"is there a ko fight" which makes it predict ko threats elsewhere on the
board! Some of the other average-pooled channels appear to be used for
things like detecting game phase (how full is the board?), and detecting
who is ahead (perhaps to decide when to play risky or safe - it's
interesting that the neural net has decided this is important given that
it's a pure policy net and is trained to predict only moves, not values).

Anyways, for AGZ's case, it seems weird to only have 2 filters feeding into
the fully connected, that seems like too few to encode much useful logic
like this. I'm also mystified at this architecture.


On Fri, Dec 29, 2017 at 7:50 AM, Rémi Coulom  wrote:

> I also wonder about this. A purely convolutional approach would save a lot
> of weights. The output for pass can be set to be a single bias parameter,
> connected to nothing. Setting pass to a constant might work, too. I don't
> understand the reason for such a complication.
>
> - Mail original -
> De: "Andy" 
> À: "computer-go" 
> Envoyé: Vendredi 29 Décembre 2017 06:47:06
> Objet: [Computer-go] AGZ Policy Head
>
>
>
> Is there some particular reason AGZ uses two 1x1 filters for the policy
> head instead of one?
>
>
> They could also have allowed more, but I guess that would be expensive? I
> calculate that the fully connected layer has 2*361*362 weights, where 2 is
> the number of filters.
>
>
> By comparison the value head has only a single 1x1 filter, but it goes to
> a hidden layer of 256. That gives 1*361*256 weights. Why not use two 1x1
> filters here? Maybe since the final output is only a single scalar it's not
> needed?
>
>
>
>
>
>
>
>
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mcts and tactics

2017-12-19 Thread David Wu
I wouldn't find it so surprising if eventually the 20 or 40 block networks
develop a set of convolutional channels that traces possible ladders
diagonally across the board. If it had enough examples of ladders of
different lengths, including selfplay games where game-critical ladders
"failed to be understood" by one side or the other and possibly even got
played out, it seems like the neural net would have a significant incentive
to learn them, step by step.

On Tue, Dec 19, 2017 at 7:57 PM, Andy  wrote:

> How do you interpret this quote from the AGZ paper?
> "Surprisingly, shicho (“ladder” capture sequences that may span the whole
> board) – one of the first elements of Go knowledge learned by humans – were
> only understood by AlphaGo Zero much later in training."
>
> To me "understood" means the neural network itself can read at least some
> simple whole board ladders, ladder breakers, and ladder makers. I would
> find it a large oversell if they just mean the MCTS search reads the ladder
> across the whole board.
>
>
>
> 2017-12-19 18:16 GMT-06:00 Stephan K :
>
>> 2017-12-20 0:26 UTC+01:00, Dan :
>> > Hello all,
>> >
>> > It is known that MCTS's week point is tactics. How is AlphaZero able to
>> > resolve Go tactics such as ladders efficiently? If I recall correctly
>> many
>> > people were asking the same question during the Lee Sedo match -- and it
>> > seemed it didn't have any problem with ladders and such.
>>
>> Note that the input to the neural networks in the version that played
>> against Lee Sedol had a lot of handcrafted features, including
>> information about ladders. See "extended data table 2", page 11 of the
>> Nature article. You can imagine that as watching the go board through
>> goggles that put a flag on each intersection that would result in a
>> successful ladder capture, and another flag on each intersection that
>> would result in a successful ladder escape.
>>
>> (It also means that you only need to read one move ahead to see
>> whether a move is a successful ladder breaker or not.)
>>
>> Of course, your question still stands for the Zero versions.
>>
>> Here is the table :
>>
>> Feature               # of planes  Description
>>
>> Stone colour          3            Player stone / opponent stone / empty
>> Ones                  1            A constant plane filled with 1
>> Turns since           8            How many turns since a move was played
>> Liberties             8            Number of liberties (empty adjacent points)
>> Capture size          8            How many opponent stones would be captured
>> Self-atari size       8            How many of own stones would be captured
>> Liberties after move  8            Number of liberties after this move is played
>> Ladder capture        1            Whether a move at this point is a successful ladder capture
>> Ladder escape         1            Whether a move at this point is a successful ladder escape
>> Sensibleness          1            Whether a move is legal and does not fill its own eyes
>> Zeros                 1            A constant plane filled with 0
>>
>> Player color          1            Whether current player is black
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread David Wu
Hex:
https://arxiv.org/pdf/1705.08439.pdf

This is not on a 19x19 board, and it was not tested against the current
state of the art (Mohex 1.0 was the state of the art at its time, but is at
least several years old now, I think), but they do get several hundred elo
points stronger than this old version of Mohex, have training curves that
suggest that they still haven't reached the limit of improvement, and are
doing it with orders of magnitude less computation than Google would have
available.

So, I think it is likely that hex is not going to be too difficult for
AlphaZero or similar architecture.


On Wed, Dec 6, 2017 at 9:28 AM, "Ingo Althöfer" <3-hirn-ver...@gmx.de>
wrote:

> It seems, we are living in extremely
> heavy times ...
>
> I want to go to bed now and meditate for three days.
>
> > DeepMind makes strongest Chess and Shogi programs with AlphaGo Zero
> method.
> > Mastering Chess and Shogi by Self-Play with a General Reinforcement
> Learning Algorithm
> > https://arxiv.org/pdf/1712.01815.pdf
> >
> > AlphaZero(Chess) outperformed Stockfish after 4 hours,
> > AlphaZero(Shogi) outperformed elmo after 2 hours.
>
> It may sound strange, but at the moment my only hopes for
> games too difficult for AlphaZero might be
>
> * a connection game like Hex (on 19x19 board)
>
> * a game like Clobber (based on CGT)
>
> Mastering Clobber would mean that also the concept of
> combinatorial game theory would be "easily" learnable.
>
>
> Side question: Would the classic Nim game be
> a trivial nut for AlphaZero ?
>
> Ingo (is now starting to hope for an AlphaZero type program
> that can do "general" mathematical research).
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Source code (Was: Reducing network size? (Was: AlphaGo Zero))

2017-10-27 Thread David Wu
I suspect the reason they were able to reasonably train a value net with
multiple komi at the same time is that the training games they used in
that paper were generated by a pure policy net, rather than by an MCTS
player, where the policy net was trained from human games.

Although humans give up points for safety when ahead, in practice it seems
like they do so less than MCTS players of the same strength, so the policy
net trained on human games would not be expected to feature that tendency as
strongly as it would if it were trained on MCTS games, leading to less of a bias
when adjusting the komi. Plus it might be somewhat hard for a pure policy
net to learn to evaluate the board to, say, within +/- 3 points during the
macro and micro endgame to determine when it should predict moves to become
more conservative, if the policy net was never directly trained to
simultaneously predict the value. Particularly if the data set included
many 0.5 komi games too and the policy net was not told the komi. So one
might guess that the pure policy net would less tend to give up points for
safety, even less than the human games it was trained on.

All of this might help make it so that the data set they used for training
the value net could reasonably be used without introducing too much bias
when rescoring the same games with different komi.



On Thu, Oct 26, 2017 at 6:33 PM, Shawn Ligocki  wrote:

> On Thu, Oct 26, 2017 at 2:02 PM, Gian-Carlo Pascutto 
> wrote:
>
>> On 26-10-17 15:55, Roel van Engelen wrote:
>> > @Gian-Carlo Pascutto
>> >
>> > Since training uses a ridiculous amount of computing power i wonder
>> > if it would be useful to make certain changes for future research,
>> > like training the value head with multiple komi values
>> > 
>>
>> Given that the game data will be available, it will be trivial for
>> anyone to train a different network architecture on the result and see
>> if they get better results, or a program that handles multiple komi
>> values, etc.
>>
>> The problem is getting the *data*, not the training.
>>
>
> But the data should be different for different komi values, right?
> Iteratively producing self-play games and training with the goal of
> optimizing for komi 7 should converge to a different optimal player than
> optimizing for komi 5. But maybe having high quality data for komi 7 will
> still save a lot of the work for training a komi 5 (or komi agnostic)
> network?
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Neural nets for Go - chain pooling?

2017-08-18 Thread David Wu
While browsing online, I found an interesting idea, "chain pooling",
presented here:
https://github.com/jmgilmer/GoCNN

The idea is to have some early layers that perform a max-pool across
solidly-connected stones. I could also imagine it being useful to perform a
sum. So the input would be a 19x19 layer, and the output would be a 19x19
layer where the output at a given position, if that position is occupied by
a stone, is equal to the maximum (or the sum of) all the values in the
input layer across all stones that are solidly connected to that group.
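
As a concrete (and very CPU-bound) sketch of the operation on a single feature
plane, using numpy and a simple flood fill over solidly-connected stones:

import numpy as np

def chain_max_pool(feature, stones, size=19):
    # feature: (size, size) float plane; stones: (size, size) bool mask of the
    # stones to pool over. Empty points pass through unchanged. Swap max() for
    # sum() to get the sum-pooling variant.
    out = feature.copy()
    visited = np.zeros_like(stones, dtype=bool)
    for i in range(size):
        for j in range(size):
            if stones[i, j] and not visited[i, j]:
                # Flood-fill one chain of solidly-connected stones.
                chain, stack = [], [(i, j)]
                visited[i, j] = True
                while stack:
                    r, c = stack.pop()
                    chain.append((r, c))
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < size and 0 <= nc < size \
                                and stones[nr, nc] and not visited[nr, nc]:
                            visited[nr, nc] = True
                            stack.append((nr, nc))
                pooled = max(feature[r, c] for r, c in chain)
                for r, c in chain:
                    out[r, c] = pooled
    return out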

One might imagine going further and allowing the neural net some early
convolutional layers that determine the connectivity strength for this
pooling between groups, so that it could choose to pool across definite
single-point eyes or bamboo joints, etc. It's possible that one would not
want to force all layers through this operation, so possibly only some
feature planes would be fed through this operation, or perhaps all of them
but the identity transformation would also be an output of the layer to
feed into the next.

Speculatively, in the best case one might imagine this has a chance to
improve the ability of the neural net to evaluate large semeai or to judge
the status of large dragons, by letting it propagate liberty count
information (including virtual liberties due to approach moves) and
information about eyes across the board more rapidly than a series of local
convolutions could do so. In fact, it seems that convolutional layers
followed by an early pooling of this sort would make it unnecessary to
provide liberties as an input feature because it would become easy for the
neural net to compute it on its own, although one would still probably want
to provide it to save the network the effort of having to learn it.

Of course, this idea could also easily turn out worthless. One thing I'm
really not sure about is how GPU-friendly this kind of operation could be
made to be, since I don't understand GPUs. Any thoughts?
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread David Wu
Saying in an unqualified way that AlphaGo is brute force is wrong in the
spirit of the question. Assuming AlphaGo uses a typical variant of MCTS, it
is technically correct. The reason it's technically correct but uninteresting
is that the bias introduced by the policy net is so extreme that it might
as well be a selective search.

Or put another way, imagine one were to set a threshold on the policy net
output past a certain point in the tree such that moves below the threshold
would be hard-pruned, and that threshold were set to a level that would
prune, say, 70% of the legal moves in an average position. In a technical
sense, the search would no longer be full-width, and therefore it would
suddenly become "not brute force" according to the definition earlier in
the thread. But this distinction is not very useful, because moves in the
tree that fall below such a threshold would receive zero simulations under
any reasonable time controls anyways, so there would be no practical
observable difference in the program's search or its play.
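
Just to spell out that thought experiment in code (the cutoff value here is
arbitrary and purely hypothetical):

def hard_prune(policy, cutoff=0.01):
    # policy: dict mapping legal moves to prior probabilities.
    # Moves below the cutoff would have received ~zero simulations anyway.
    return {move: p for move, p in policy.items() if p >= cutoff}

example = {"Q16": 0.55, "R16": 0.30, "C3": 0.10, "A1": 0.0004, "T19": 0.0001}
print(hard_prune(example))  # {'Q16': 0.55, 'R16': 0.3, 'C3': 0.1}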

So - spirit of the question - no, AlphaGo is not brute force; its search is
selective to an extreme due to the policy net, and the vast majority of
possibilities will never in practice be given any attention or time
whatsoever.

Technical answer - yes, AlphaGo is brute force, in that in the limit of
having enormously vastly many more orders of magnitude of search time than
we would ever devote to it and unbounded memory, it will theoretically
eventually search everything (maybe, it would still depend on the actual
details of its implementation).


On Sun, Aug 6, 2017 at 2:20 PM, Brian Sheppard via Computer-go <
computer-go@computer-go.org> wrote:

> I understand why most people are saying that AlphaGo is not brute force,
> because it appears to be highly selective. But MCTS is a full width search.
> Read the AlphaGo papers, as one of the other respondents (rather
> sarcastically) suggested: AlphaGo will eventually search every move at
> every node.
>
>
>
> MCTS has the appearance of a selective search because time control
> terminates search while the tree is still ragged. In fact, it will search
> every continuation an infinite number of times.
>
>
>
> In order to have high performance, an MCTS implementation needs to search
> best moves as early as possible in each node. It is in this respect that
> AlphaGo truly excels. (AlphaGo also excels at whole board evaluation, but
> that is a separate topic.)
>
>
>
>
>
> *From:* Steven Clark [mailto:steven.p.cl...@gmail.com]
> *Sent:* Sunday, August 6, 2017 1:14 PM
> *To:* Brian Sheppard ; computer-go <
> computer-go@computer-go.org>
> *Subject:* Re: [Computer-go] Alphago and solving Go
>
>
>
> Why do you say AlphaGo is brute-force? Brute force is defined as: "In
> computer science, brute-force search or exhaustive search, also known as
> generate and test, is a very general problem-solving technique that
> consists of *systematically enumerating all possible candidates* for the
> solution and checking whether each candidate satisfies the problem's
> statement."
>
>
>
> The whole point of the policy network is to avoid brute-force search, by
> reducing the branching factor...
>
>
>
> On Sun, Aug 6, 2017 at 10:42 AM, Brian Sheppard via Computer-go <
> computer-go@computer-go.org> wrote:
>
> Yes, AlphaGo is brute force.
>
> No it is impossible to solve Go.
>
> Perfect play looks a lot like AlphaGo in that you would not be able to
> tell the difference. But I think that AlphaGo still has 0% win rate against
> perfect play.
>
>
>
> My own best guess is that top humans make about 12 errors per game. This
> is estimated based on the win rate of top pros in head-to-head games. The
> calculation starts by assuming that Go is a win at 6.5 komi for either
> Black (more likely) or White, so a perfect player would win 100% for Black.
> Actual championship caliber players win 51% to 52% for Black. In 9-dan play
> overall, I think the rate is 53% to 54% for Black. Then you can estimate
> how many errors each player has to make to bring about such a result. E.g.,
> If players made only one error on average, then Black would win the vast
> majority of games, so they must make more errors. I came up with 12 errors
> per game, but you can reasonably get other numbers based on your model.
>
>
>
> Best,
>
> Brian
>
>
>
> *From:* Computer-go [mailto:computer-go-boun...@computer-go.org] *On
> Behalf Of *Cai Gengyang
> *Sent:* Sunday, August 6, 2017 9:49 AM
> *To:* computer-go@computer-go.org
> *Subject:* [Computer-go] Alphago and solving Go
>
>
>
> Is Alphago brute force search?
>
> Is it possible to solve Go for 19x19 ?
>
> And what does perfect play in Go look like?
>
> How far are current top pros from perfect play?
>
>
>
> Gengyang
>
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
>
>
> 

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread David Wu
Actually, a better Go-God for handicap games would probably be one that
ignores score margin as long as it's behind and simply maximizes the
entropy measure for the lowest-entropy proof tree that proves that Black is
winning. (And only counts the entropy for the black moves, not the white
moves in that tree). Once ahead, of course it can do whatever it wants that
preserves the win.

On Sun, Aug 6, 2017 at 10:31 AM, David Wu <lightvec...@gmail.com> wrote:

> * A little but not really.
> * No, and as far as we can tell, never. Even 7x7 is not rigorously solved.
> * Unknown.
> * Against Go-God (plays move that maximizes score margin, breaking ties by
> some measure of the entropy needed to build the proof tree relative to a
> human-pro-level policy net), I guess upper bound at 5 stones, likely less.
> Against Go-Devil (plays moves that maximizes win chance against *you*, has
> omniscient knowledge of all your weaknesses and perfect ability to forecast
> when you would have a brainfart and blunder), unknown, probably a bit more
> than vs Go-God.
>
> On Sun, Aug 6, 2017 at 9:49 AM, Cai Gengyang <gengyang...@gmail.com>
> wrote:
>
>> Is Alphago brute force search?
>> Is it possible to solve Go for 19x19 ?
>> And what does perfect play in Go look like?
>> How far are current top pros from perfect play?
>>
>> Gengyang
>>
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>>
>
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread David Wu
* A little but not really.
* No, and as far as we can tell, never. Even 7x7 is not rigorously solved.
* Unknown.
* Against Go-God (plays move that maximizes score margin, breaking ties by
some measure of the entropy needed to build the proof tree relative to a
human-pro-level policy net), I guess upper bound at 5 stones, likely less.
Against Go-Devil (plays moves that maximizes win chance against *you*, has
omniscient knowledge of all your weaknesses and perfect ability to forecast
when you would have a brainfart and blunder), unknown, probably a bit more
than vs Go-God.

On Sun, Aug 6, 2017 at 9:49 AM, Cai Gengyang  wrote:

> Is Alphago brute force search?
> Is it possible to solve Go for 19x19 ?
> And what does perfect play in Go look like?
> How far are current top pros from perfect play?
>
> Gengyang
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Possible idea - decay old simulations?

2017-07-24 Thread David Wu
Cool, thanks.

On Mon, Jul 24, 2017 at 10:30 AM, Gian-Carlo Pascutto <g...@sjeng.org> wrote:

> On 24-07-17 16:07, David Wu wrote:
> > Hmm. Why would discounting make things worse? Do you mean that you
> > want the top move to drop off slower (i.e. for the bot to take longer
> > to achieve the correct valuation of the top move) to give it "time"
> > to search the other moves enough to find that they're also bad?
>
> I don't want the top move to drop off slower, I just don't want to play
> other moves until they've been searched to comparable "depth".
>
> If there's a disaster lurking behind the main-variation that we only
> just started to understand, the odds are, the same disaster also lurks
> in a few of the alternative moves.
>
> > I would have thought that with typical exploration policies, whether
> > the top move drops off a little faster or a little slower, once its
> > winrate drops down close to the other moves, the other moves should
> > get a lot of simulations as well.
>
> Yes. But the goal of the discounting is, that a new move can make it
> above the old one, despite having had less total search effort.
>
> My point is that it is not always clear this is a positive effect.
>
> > I know that there are ways to handle this at the root, via time
> > control or otherwise.
>
> The situation isn't necessarily different here, if you consider that at
> the root the best published technique is still "think longer so the new
> move can overtake the old one", not "play the new move".
>
> Anyway, not saying this can't work. Just pointing out the problem areas.
>
> I would be a bit surprised if discounting worked for Go because it's
> been published for other areas (e.g. Amazons) but I don't remember any
> reports of success in Go. But the devil can be in the details (i.e. the
> discounting formula) for tricks like this.
>
> --
> GCP
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Possible idea - decay old simulations?

2017-07-24 Thread David Wu
Thanks for the replies!

On Mon, Jul 24, 2017 at 9:30 AM, Gian-Carlo Pascutto <g...@sjeng.org> wrote:

> On 23-07-17 18:24, David Wu wrote:
> > Has anyone tried this sort of idea before?
>
> I haven't tried it, but (with the computer chess hat on) these kind of
> proposals behave pretty badly when you get into situations where your
> evaluation is off and there are horizon effects. The top move drops off
> and now every alternative that has had less search looks better (because
> it hasn't seen the disaster yet). You do not want discounting in this
> situation.
>
>
Hmm. Why would discounting make things worse? Do you mean that you want the
top move to drop off slower (i.e. for the bot to take longer to achieve the
correct valuation of the top move) to give it "time" to search the other
moves enough to find that they're also bad? I would have thought that with
typical exploration policies, whether the top move drops off a little
faster or a little slower, once its winrate drops down close to the other
moves, the other moves should get a lot of simulations as well.

It's true that a move with a superior winrate than the move with the
> maximum amount of simulations is a good candidate to be better. Some
> engines will extend the time when this happens. Leela will play it, in
> certain conditions.
>
>
I know that there are ways to handle this at the root, via time control or
otherwise. The case I described here is when this happens not at the root,
but deeper in the tree. At the root, move B still looks much worse than A,
it's merely that within B's subtree there's a newly found tactic C that is
much better than A. From the root, B still looks worse than A, its winrate
has recently started to rise slowly but extremely steadily (with very high
statistical confidence, such that one might project it to eventually
overtake A).
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Possible idea - decay old simulations?

2017-07-23 Thread David Wu
I've been using Leela 0.10.0 for analysis quite often, and I've noticed
something that might lead to an improvement for the search, and maybe also
for other MCTS programs.

Sometimes, after putting hundreds of thousands of simulations into a few
possible moves, Leela will appear to favor one, disliking the others for
having clearly worse reported winrates. But then every once in a while, the
winrate for one of the other disliked moves will start rising gradually,
but very consistently.

When this happens, if I play down the variation for that move and look at
the analysis window, I often find that Leela has discovered a new tactic.
Specifically, I find a node in that subtree where one move has a greatly
higher winrate than all the others, but does not have too many simulations
yet, meaning Leela only just now found it.
(Possibly it already has more simulations than any other single move, but
the number of simulations of all of the other moves combined still
significantly outweighs it).

Going back to the root, it's clear that if the new tactic has a high enough
winrate, then the previously disliked move will eventually overtake the
favored move. But it takes a long time, since the disliked move has a lot
of bad simulations to outweigh - it's painful to watch the winrate creep up
slowly but with high upward consistency, until it finally beats out the
previously favored move.

I think there's a chance that the search could be improved by adding a
small decay over time to the weight of old simulations. This would allow a
move to be promoted a bit more rapidly with the discovery of a better
tactic. You would probably want to set the decay over time so that the
total weight over time still readily approaches infinity (e.g. a fixed
exponential decay would probably be bad, that would bound the total weight
by a constant), but perhaps a bit slower than linearly.

Thinking about it from the multi-armed-bandit perspective, I think this
also makes sense. The distribution of results from each child is
nonstationary, because the subtree below the child is evolving over time.
If they were stationary you would weight all historical simulations
equally, but since they aren't, the more-recent results from a child should
get a little bit more weight since they give you more information about the
current performance of the child move.
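
Here is a rough sketch of the kind of bookkeeping I have in mind; the
particular sublinear target curve (an exponent of 0.9) is an arbitrary
illustrative choice:

class DecayedStats:
    # Each new result gets weight 1 when it arrives, and all older results
    # are scaled down so that the total weight approximately tracks a
    # sublinear curve (1 + n)**exponent instead of growing linearly as in
    # plain MCTS.
    def __init__(self, exponent=0.9):
        self.exponent = exponent
        self.n = 0            # raw number of simulations seen
        self.weight = 0.0     # decayed total weight
        self.value_sum = 0.0  # decayed sum of simulation results

    def _target(self, n):
        return (1.0 + n) ** self.exponent

    def add(self, result):
        self.n += 1
        gamma = (self._target(self.n) - 1.0) / self._target(self.n - 1)
        self.weight = self.weight * gamma + 1.0
        self.value_sum = self.value_sum * gamma + result
        return self.value_sum / self.weight  # current decayed win rate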

Has anyone tried this sort of idea before?
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread David Wu
Addendum:

Some additional playing around with the same position can flip the roles of
the playouts and value net - so now the value net is very wrong and the
playouts are mostly right. I think this gives good insight into what the
value net is doing and why as a general matter playouts are still useful.

Here's how:
Play black moves as detailed in the previous email in the "leela10.sgf"
game that Marc posted to resolve all the misunderstandings of the playouts
and get it into the "3-10% white win" phase, but otherwise leave white's
dead group on-board with tons of liberties. Let white have an absolute
feast on the rest of the board while black simply connects his stones
solidly. White gets to cut through Q6 and get pretty much every point
available.

Black is still winning even though he loses almost the entire rest of the
board, as long as the middle white group dies. But with some fiddling
around, you can arrive at a position where the value net is reporting 90%
white win (wrong), while the playouts are rightly reporting only 3-10%
white win.

Intuitively, the value net only fuzzily evaluates white's group as probably
dead, but isn't sure that it's dead, so it counts some value
for white's group "in expectation" for the small chance it lives. And the
score is otherwise not too far off on the rest of the board - the way I
played it out, black wins by only ~5 points if white dies. So the small
uncertainty that the huge white group might actually be alive produces
enough "expected value" for white to overwhelm the 5 point loss margin,
such that the value net is 90% sure that white wins.

What the value net has failed to "understand" here is that white's group
surviving is a binary event. I.e. a 20% chance of the group being alive and
white winning by 80 points along with an 80% chance that it's dead and white
losing by 5 points does not average out to white being (0.2 * 80) - (0.8 *
5) = 12 points ahead overall (although probably the value net doesn't
exactly "think" in terms of points but rather something fuzzier). The
playouts provide the much-needed "understanding" that win/loss is binary
and that the expectation operator should be applied after mapping to
win/loss outcomes, rather than before.
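
To make the arithmetic concrete (same illustrative numbers as above):

p_alive = 0.2                            # chance white's big group lives
score_if_alive, score_if_dead = 80, -5   # white's margin in each case

expected_score = p_alive * score_if_alive + (1 - p_alive) * score_if_dead
expected_winrate = p_alive * 1.0 + (1 - p_alive) * 0.0

print(expected_score)    # 12.0 -> "white is comfortably ahead on average"
print(expected_winrate)  # 0.2  -> white actually wins only 20% of the time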

It seems intuitive to me that a neural net would compute things in too much
of a fuzzy and averaged way and thereby be vulnerable to this mistake. I
wonder if it's possible to train a value net to get these things more
correct without weakening it otherwise, with the right training. As it is,
I suspect this is a systematic flaw in the value net's ability to produce
good probabilities of winning in games where the game hinges on the life
and death chances of a single large dragon, and where the expected score
could be wildly uncorrelated with the probability of winning.


On Mon, May 22, 2017 at 9:39 PM, David Wu <lightvec...@gmail.com> wrote:

> Leela playouts are definitely extremely bad compared to competitors like
> Crazystone. The deep-learning version of Crazystone has no value net as far
> as I know, only a policy net, which means it's going on MC playouts alone
> to produce its evaluations. Nonetheless, its playouts often have noticeable
> and usually correct opinions about early midgame positions (as
> confirmed by the combination of my own judgment as a dan player and Leela's
> value net). Which I find amazing - that it can even approximately get these
> right.
>
> On to the game:
>
> Analyzing with Leela 0.10.0 in that second game, I think I can infer
> pretty exactly what the playouts are getting wrong. Indeed the upper left
> is being disastrously misplayed by them, but that's not all. -  I'm finding
> that multiple different things are all being played out wrong. All of the
> following numbers are on white's turn, to give white the maximal chance to
> distract black from resolving the tactics in the tree search and forcing
> the playouts to do the work - the numbers are sometimes better if it's
> black's turn.
>
> * At move 186, playouts as they stand show on the order of 60% for
> white, despite black absolutely having won the game. I partly played out
> the rest of the board in a very slow and solid way just in case it was
> confusing things, but not entirely, so that the tree search would still
> have plenty of endgame moves to be distracted by. Playouts stayed at 60%
> for white.
>
> * Add bA15, putting white down to 2 liberties: the PV shows an exchange of
> wA17 and bA14, keeping white at 2 liberties, and it drops to 50%.
> * Add bA16, putting white in atari: it drops to 40%.
>
> So clearly there's some funny business with black in the playouts
> self-atariing or something, and the chance that black does this lessens as
> white has fewer liberties and therefore is more likely to die first. G

Re: [Computer-go] Patterns and bad shape

2017-04-17 Thread David Wu
Hmm. Do you know that Leela does something special here? When I look at
Leela's analysis output, it seems the search does not consider the
ladder escape because the policy net assigns a low probability to it (and
such a high probability to the move in the upper right). Which is the same as
in other non-ladder situations when the policy net puts 90+% weight on one
move and almost nothing on other moves. If you do get it to search the
ladder escape at all, it reads it correctly and likes it.

So it seems to me the issue is primarily the policy net giving too low of a
weight to the ladder escape. And the policy net doesn't read, because it's
a neural net - I'd expect the most it would be getting is a feature plane
that says "is this stone ladder-capturable" performed via a simple
recursive search completely independent from the MCTS that shouldn't
misread a ladder this simple unless it's outright buggy.


On Mon, Apr 17, 2017 at 11:25 AM, Stefan Kaitschick <skaitsch...@gmail.com>
wrote:

>
> On Mon, Apr 17, 2017 at 3:04 PM, David Wu <lightvec...@gmail.com> wrote:
>
>> To some degree this maybe means Leela is insufficiently explorative in
>> cases like this, but still, why does the policy net not put H5 more than
>> 1.03%. After all, it's vastly more likely than 1% that that a good player
>> will see the ladder works here and try escaping in this position.
>
>
>
> I think it's likely that the ladder is actually the first thing that Leela
> considered. My guess is that it put a penalty on the move after misreading
> the ladder.
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Patterns and bad shape

2017-04-17 Thread David Wu
Hmmm, screenshot doesn't seem to be getting through to the list, so here's
a textual graphic instead.

   A B C D E F G H J K L M N O P Q R S T
 +---+
  19 | . . . . . . . . . . . . . . . . . . . |
  18 | . . . . . . . . . . . . . . . . . . . |
  17 | . . . . . . . . . . . . . X . . . . . |
  16 | . . . . X . . . . . . . . . . O . . . |
  15 | . . . . . . . . . . . . . X . . . . . |
  14 | . . . . . . . . . . . . . . . . O . . |
  13 | . . . . . . . . . . . . . . . X O . . |
  12 | . . . . . . . . . . . . . . . X O . . |
  11 | . . . . . . . . . . . . . . . a . . . |
  10 | . . . . . . . . . . . . . . . . . . . |
   9 | . . . . . . . . . . . . . . . . . . . |
   8 | . . . . . . . . . . . . . . . . . . . |
   7 | . . . . . . . . . . . . . . . . . . . |
   6 | . . . . . . O . . . . . . . . . . . . |
   5 | . . . . . O X b . . . . . . . . . . . |
   4 | . . . X X X O O . . . . . . . . . . . |
   3 | . . . . X O . . . . . . . . . O . . . |
   2 | . . . . . . . . . . . . . . . . . . . |
   1 | . . . . . . . . . . . . . . . . . . . |
 +---+

Black (X) to move. The screenshot showed that Leela's policy net put about
96% probability on a and only 1.03% on b. And that even after nearly 1
million simulations it had basically not searched b at all.


On Mon, Apr 17, 2017 at 9:04 AM, David Wu <lightvec...@gmail.com> wrote:

> I highly doubt that learning refutations to pertinent bad shape patterns
> is much more exponential or combinatorially huge than learning good shape
> already is, if you could somehow be appropriately selective about it. There
> are enough commonalities that it should "compress" very well into what the
> neural net learns.
>
> To draw an analogy with humans, for example humans seem to have absolutely
> no issue learning refutations to lots of bad shapes right alongside good
> shapes, and don't seem to find it orders of magnitude harder to apply their
> pattern recognition abilities to that task. Indeed people learn bad shape
> refutations all the time from performing "training updates" based on their
> own reading about what works or not every time they read things in a game,
> (almost certainly a thing the human brain "does" to maximize the use of
> data), as well as the results of their own mistakes, and also learn and
> refutations of non-working tactics all the time if they do things like
> tsumego. But the distribution on bad moves that people devote their
> attention to learning to refute is extremely nonuniform - e.g. nobody
> studies what to do if the opponent plays the 1-2 point against your 4-4
> point - and I bet that's important too.
>
> I'm also highly curious if there's anyone who has experimented with or
> found any ways to mitigate this issue!
>
> --
>
> If you want an example of this actually mattering, here's example where
> Leela makes a big mistake in a game that I think is due to this kind of
> issue. Leela is white.
>
> The last several moves to reach the screenshotted position were:
> 16. wG6 (capture stone in ladder)
> 17. bQ13 (shoulder hit + ladder breaker)
> 18. wR13 (push + restore ladder)
> 19. bQ12 (extend + break ladder again)
> 20. wR12 (Leela's mistake, does not restore ladder)
>
> You can see why Leela makes the mistake at move 20. On move 21 it pretty
> much doesn't consider escaping at H5. The policy net puts it at 1.03%,
> which I guess is just a little too low for it to get some simulations here
> even with a lot of thinking time. So Leela thinks black is behind,
> considering moves with 44% and 48% overall win rates. However, if you play
> H5 on the board, Leela almost instantly changes its mind to 54% for black.
> So not seeing H5 escape on move 21 leads it to blunder on move 20. If you
> go back on move 20 and force Leela to capture the ladder, it's happy with
> the position for white, so it's also not as if Leela dislikes the ladder
> capture, it's just that it wrongly thinks R12 is slightly better due to not
> seeing the escape.
>
> To some degree this maybe means Leela is insufficiently explorative in
> cases like this, but still, why does the policy net not put H5 at more than
> 1.03%? After all, it's vastly more likely than 1% that a good player
> will see the ladder works here and try escaping in this position. I would
> speculate that this is because of biases in training. If Leela is trained
> on high amateur and pro games, then one would expect that in nearly all
> training games, conditional on white playing a move like R12 and ignoring
> the ladder escape, the ladder is often not too important and therefore
> Black should usually not escape. By contrast, when the ladder escape is
> important, then in such games White will capture at H5 instead of p