Re: [Computer-go] mini-max with Policy and Value network

David Wu Mon, 22 May 2017 19:37:37 -0700

Leela playouts are definitely extremely bad compared to competitors like
Crazystone. The deep-learning version of Crazystone has no value net as far
as I know, only a policy net, which means it's going on MC playouts alone
to produce its evaluations. Nonetheless, its playouts often have noticeable
and usually correct opinions about early midgame positions (as
confirmed by the combination of my own judgment as a dan player and Leela's
value net). Which I find amazing - that it can even approximately get these
right.

On to the game:

Analyzing with Leela 0.10.0 in that second game, I think I can infer pretty
exactly what the playouts are getting wrong. Indeed the upper left is being
disastrously misplayed by them, but that's not all: I'm finding that
multiple different things are all being played out wrong. All of the
following numbers are on white's turn, to give white the maximal chance to
distract black from resolving the tactics in the tree search, which forces
the playouts to do the work. The numbers are sometimes better if it's
black's turn.

* At move 186, with the position as it stands, the playouts show on the
order of 60% for white, despite black absolutely having won the game. I
partly played out
the rest of the board in a very slow and solid way just in case it was
confusing things, but not entirely, so that the tree search would still
have plenty of endgame moves to be distracted by. Playouts stayed at 60%
for white.

* Add bA15, putting white down to 2 liberties: the PV shows an exchange of
wA17 and bA14, keeping white at 2 liberties, and it drops to 50%.
* Add bA16, putting white in atari: it drops to 40%.

So clearly there's some funny business with black in the playouts
self-atariing or something, and the chance that black does this lessens as
white has fewer liberties and therefore is more likely to die first. Going
further:

* Add bE14, it drops to 30%.

So the potential damezumari is another problem - I guess like 10% of the
playouts are letting white kill the big black lump at E13. Now, 30% is
still way too high. Where does the rest come from?

* Add bA17, having black actually capture: it drops to 20%. So it looks like
even having white in atari doesn't stop the playouts from misplaying. Black
in the playouts misplays that corner despite being able to capture white any
time!

Now where are the remaining 20% losses coming from?

* Have black physically capture the 5 white stones at K18: still 20%
* Solidly connect everything for black around the board: still 20%.
* Add bK18: drops to 10%.

Oh! So 10% more came from black dying at the top. Even after black has
physically captured the 5 white stones there and has a living 5-space
eye, the playouts have black die there maybe 10% of the time.

Okay, now how is black still losing 10% of the time? White literally cannot
make 2 eyes even if he gets infinitely many moves in a row. I've already
solidly connected everything so that even if black passes forever, white
can kill literally nothing of black's and black will have enough points.

* Add wK14 or wM14: 3%
* Instead of that, add bK14: increases to 20% (!!!). Then have white
capture those 2 stones with wM14: 3%.
* Instead of all that, add both K16 and M16: 3%.

So clearly what's going on is that the playouts allow suicide, despite the
Leela gui not allowing it. That's the only way that white could possibly
live - black suicides 3 stones then white makes 2 eyes. And this neatly
explains these observations - adding bK14 puts black one step closer to
suiciding 3 stones, so the MC winrate for white rises to 20%. Having white
play there removes the possibility of black handing white 2 eyes via
suicide. And similarly filling K16 and M16 makes it so that when black plays
3 stones, it's a capture and not a suicide. So the playouts allowing suicide
here is
leading to some incredibly bad MC evaluations.
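
To make the suicide point concrete, here is a minimal Python sketch of the kind of legality check a playout move generator could apply. It is not Leela's code; the board representation and the names `neighbors`, `group_and_liberties` and `is_suicide` are assumptions made up for illustration. The key detail is that captures are resolved first, which is why filling K16 and M16 turns the same black move from a suicide into a capture.

```python
# Hypothetical sketch, not taken from Leela. The board is a dict mapping
# (x, y) -> 'b' or 'w'; empty points are simply absent from the dict.

def neighbors(point, size):
    """Yield the on-board orthogonal neighbors of a point."""
    x, y = point
    for q in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
        if 0 <= q[0] < size and 0 <= q[1] < size:
            yield q

def group_and_liberties(board, start, size):
    """Flood-fill the group containing `start`; return (stones, liberties)."""
    color = board[start]
    stones, liberties, stack = {start}, set(), [start]
    while stack:
        p = stack.pop()
        for q in neighbors(p, size):
            if q not in board:
                liberties.add(q)
            elif board[q] == color and q not in stones:
                stones.add(q)
                stack.append(q)
    return stones, liberties

def is_suicide(board, move, color, size):
    """True if the move captures nothing and leaves its own group with no liberties."""
    trial = dict(board)
    trial[move] = color
    enemy = 'w' if color == 'b' else 'b'
    # Captures come first: if any adjacent enemy group dies, the move is legal.
    for q in neighbors(move, size):
        if trial.get(q) == enemy:
            _, liberties = group_and_liberties(trial, q, size)
            if not liberties:
                return False
    _, liberties = group_and_liberties(trial, move, size)
    return not liberties

# Made-up 3x3 positions: the same corner move is a suicide in the first
# and a capture in the second, once white's outside liberties are filled.
open_corner = {(1, 0): 'w', (0, 1): 'w'}
filled_corner = {(1, 0): 'w', (0, 1): 'w', (2, 0): 'b', (1, 1): 'b', (0, 2): 'b'}
print(is_suicide(open_corner, (0, 0), 'b', 3))
print(is_suicide(filled_corner, (0, 0), 'b', 3))
```

A playout policy that filters candidate moves through a check like this could never hand white two eyes the way described above.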

Okay, so now that we removed that possibility, how is black still losing 3%
of the time?

* Remove liberties from white: once most of them are gone, drops to 2%.
Then drops to 1%, then once white is in atari, drops to -1%.
(the -1% presumably comes from the expected thing documented in the Leela
FAQ where Leela counts big wins as a bit more than 100%).
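
To keep the numbers straight, the readings above can be tabulated and the drop at each step attributed to a cause. This is only a bookkeeping sketch in Python: the percentages are the ones reported above, while the cause labels and the function name `drops_by_cause` are shorthand introduced here, not anything from Leela.

```python
# White's playout winrate (percent) after each cumulative change, as
# reported in the text. The cause labels are shorthand, not Leela output.
readings = [
    ("move 186 as it stands", 60, None),
    ("add bA15", 50, "upper-left corner"),
    ("add bA16", 40, "upper-left corner"),
    ("add bE14", 30, "damezumari at E13"),
    ("add bA17 (capture)", 20, "upper-left corner"),
    ("capture 5 stones at K18", 20, "no effect"),
    ("connect everything", 20, "no effect"),
    ("add bK18", 10, "top group dying"),
    ("add wK14 or wM14", 3, "suicide"),
    ("most white liberties removed", 2, "unexplained"),
    ("more liberties removed", 1, "unexplained"),
    ("white in atari", -1, "unexplained"),
]

def drops_by_cause(readings):
    """Sum the winrate drop between consecutive readings, grouped by cause."""
    totals = {}
    previous = readings[0][1]
    for _label, percent, cause in readings[1:]:
        totals[cause] = totals.get(cause, 0) + (previous - percent)
        previous = percent
    return totals

for cause, drop in drops_by_cause(readings).items():
    print(f"{cause}: {drop} points")
```

Under this grouping the upper-left corner accounts for 30 points, the damezumari and the top group for 10 each, suicide for 7, and 4 points remain unexplained.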

Now I'm just speculating. My guess is that somehow 3% of the time, the game
is scored without black having captured white's group. As in - black
passes, white passes, white's dead group is still on the board, so white
wins. The guess would be that removing liberties and putting the group in
atari increases the likelihood that the playouts kill it before both players
pass and score. But that's just a guess, maybe there's also more black
magic involving adjusting the "value" of a win depending on unknown factors
beyond just having a "big win". Would need Gian-Carlo to actually confirm
or refute this guess though.
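
The scoring guess can be illustrated with a small Python sketch of area counting in the Tromp-Taylor style, where stones left on the board count for their owner whether or not they are alive. The 5x5 position, the komi of 7.5 and the names `area_score` and `winner` are all made up for illustration. How Leela's playouts actually decide when to stop and score is not known from this thread.

```python
def area_score(rows):
    """Area count: stones, plus empty regions that touch one color only."""
    height, width = len(rows), len(rows[0])
    score = {'b': 0, 'w': 0}
    seen = set()
    for y in range(height):
        for x in range(width):
            cell = rows[y][x]
            if cell in score:
                score[cell] += 1
            elif (x, y) not in seen:
                # Flood-fill this empty region and note which colors border it.
                region, borders, stack = 0, set(), [(x, y)]
                seen.add((x, y))
                while stack:
                    px, py = stack.pop()
                    region += 1
                    for qx, qy in ((px - 1, py), (px + 1, py), (px, py - 1), (px, py + 1)):
                        if 0 <= qx < width and 0 <= qy < height:
                            q = rows[qy][qx]
                            if q == '.':
                                if (qx, qy) not in seen:
                                    seen.add((qx, qy))
                                    stack.append((qx, qy))
                            else:
                                borders.add(q)
                if len(borders) == 1:
                    score[borders.pop()] += region
    return score

def winner(rows, komi=7.5):
    """Return 'b' or 'w'; the komi value is an assumption for this example."""
    score = area_score(rows)
    return 'b' if score['b'] > score['w'] + komi else 'w'

# A one-eyed (dead) white group that black has not yet taken off the board.
scored_too_early = [".wwb.", "wwwb.", "wwwb.", "bbbb.", "....."]
# The same position after black captures the group.
after_capture = ["...b.", "...b.", "...b.", "bbbb.", "....."]

print(area_score(scored_too_early), winner(scored_too_early))
print(area_score(after_capture), winner(after_capture))
```

In this toy position the dead white stones are enough to flip the result once komi is added, which is the kind of effect the guess above relies on.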

----

Phew! That was some long and fun exploration of how light Leela's playouts
are. Hopefully some of this helps improve the playouts for future versions
of Leela - given that they're a significant weight in the evaluation
alongside the value net, they're probably one of the major things holding
Leela back at this point.
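
As a rough illustration of why bad playouts matter even with a good value net, here is a sketch of a linear blend of the two signals. The blending formula, the 0.5 weight and the 10% value-net figure are assumptions for illustration only; the thread does not say how Leela actually combines them. Only the 60% playout figure comes from the analysis above.

```python
def mixed_eval(value_net_rate, playout_rate, playout_weight=0.5):
    """Blend two winrate estimates; the weight is a hypothetical parameter."""
    return (1.0 - playout_weight) * value_net_rate + playout_weight * playout_rate

# Hypothetical: the value net gives white 10%, the playouts give white 60%.
print(mixed_eval(0.10, 0.60))
```

With an equal weighting, a correct value net is dragged from 10% to roughly 35% for white in a position that black has already won.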

On Mon, May 22, 2017 at 3:01 PM, Marc Landgraf <mahrgel...@gmail.com> wrote:

> And talking about tactical mistakes:
> Another game, where a trick joseki early in the game (top right)
> completely fools Leela. Leela here plays this like it would be done in
> similar shapes, but then gets completely blindsided. But to make things
> worse, it finds the one way to make the loss the biggest. (note: this is
> not reliable when trying this trick joseki, Leela will often lose the 4/5
> stones on the left, but will at least take the one stone on top in sente
> instead of screwing up like it did here) Generally this "trick" is not that
> deep reading wise, but given its similarity to more common shapes I can
> understand how the bot falls for it.
> Anyway, Leela manages to fully stabilize the game (given our general
> difference in strength, this should come as no surprise), just to throw
> away the center group.
>
> But what you should really look at here is Leela's evaluation of the game.
>
> Even very late in the game, the MC part of Leela considers Leela well
> ahead, completely misreading the L+D here. Usually in most games Leela
> loses to me, the issue comes the other way around. Leela NN strongly
> believes in the game to be won, while the MC-part notices the real trouble.
> But not here. Now of course this kind of misjudgement also could serve as
> an explanation of how this group could die in the first place.
>
> But having had my own MC-Bot I really wonder how it could misevaluate so
> badly here. To really lose this game as Black it either requires
> substantial self ataris by Black, or large unanswered self atari by White.
> Does Leela have such light playouts that those groups can really flip
> status in 60%+ of the MC-Evaluations?
>
> 2017-05-22 20:46 GMT+02:00 Marc Landgraf <mahrgel...@gmail.com>:
>
>> Leela has surprisingly large tactical holes. Right now it is throwing a
>> good number of games against me in completely won endgames by fumbling away
>> entirely alive groups.
>>
>> As an example I attached one game of myself (3d), even vs Leela10 @7d.
>> But this really isn't a one-time occurrence.
>>
>> If you look around move 150, the game is completely over by human
>> standards as well as Leela's evaluation (Leela will give itself >80% here).
>> But then Leela starts doing weird things.
>> 186 is a minor mistake, but by itself does not yet throw the game. But it
>> is the start of a series of bad turns.
>> 236 then is a non-threat in a Ko fight, and checking Leela's evaluation,
>> Leela doesn't even consider the possibility of it being ignored. This is
>> btw a common topic with Leela in ko fights - it does not look at all at
>> what happens if the Ko threat is ignored.
>> 238 follows up the "ko threat", but this move isn't doing anything
>> either! So Leela passed twice now.
>> Suddenly there is some Ko appearing at the top right.
>> Leela plays this Ko fight in some suboptimal way, not fully utilizing
>> local ko threats, but this is a concept rather difficult to grasp for AIs
>> afaik.
>> I can not 100% judge whether ignoring the black threat of 253 is correct
>> for Leela, I have some doubts on this one too.
>> With 253 ignored, the game is now heavily swinging, but to my judgement,
>> playing the hane instead of 256 would still keep it rather close and I'm
>> not 100% sure who would win it now. But Leela decides to completely bury
>> itself here with 256, while giving itself still 70% to win.
>> As realization of the real game state slowly kicks in, the rest of the
>> game is then the usual MC-throw away style we have known for years.
>>
>> Still... in this game you can see how a series of massive tactical
>> blunders leads to throwing a completely won game. And this is just one of
>> many examples. And it can not all be pinned on Ko's. I have seen a fair
>> number of games where Leela makes similar mistakes without Ko involved,
>> even though Ko's drastically increase Leela's fumble chance.
>> At the same time, Leela is completely and utterly outplaying me on a
>> strategical level and whenever it manages to not make screwups like the
>> ones shown I stand no chance at all. Even 3 stones is a serious challenge
>> for me then. But those mistakes are common enough to keep me around even.
>>
>> 2017-05-22 17:47 GMT+02:00 Erik van der Werf <erikvanderw...@gmail.com>:
>>
>>> On Mon, May 22, 2017 at 3:56 PM, Gian-Carlo Pascutto <g...@sjeng.org>
>>> wrote:
>>>
>>>> On 22-05-17 11:27, Erik van der Werf wrote:
>>>> > On Mon, May 22, 2017 at 10:08 AM, Gian-Carlo Pascutto <g...@sjeng.org
>>>> > <mailto:g...@sjeng.org>> wrote:
>>>> >
>>>> >     ... This heavy pruning
>>>> >     by the policy network OTOH seems to be an issue for me. My
>>>> >     program has big tactical holes.
>>>> >
>>>> >
>>>> > Do you do any hard pruning? My engines (Steenvreter, Magog) always had
>>>> > a move predictor (a.k.a. policy net), but I never saw the need to do
>>>> > hard pruning. Steenvreter uses the predictions to set priors, and it is
>>>> > very selective, but with infinite simulations eventually all potentially
>>>> > relevant moves will get sampled.
>>>>
>>>> With infinite simulations everything is easy :-)
>>>>
>>>> In practice moves with, say, a prior below 0.1% aren't going to get
>>>> searched, and I still regularly see positions where they're the winning
>>>> move, especially with tactics on the board.
>>>>
>>>> Enforcing the search to be wider without losing playing strength appears
>>>> to be hard.
>>>>
>>>>
>>> Well, I think that's fundamental; you can't be wide and deep at the same
>>> time, but at least you can choose an algorithm that (eventually) explores
>>> all directions.
>>>
>>> BTW I'm a bit surprised that you are still able to find 'big tactical
>>> holes' with Leela now playing as 8d KGS
>>>
>>> Best,
>>> Erik
>>>
>>>
>>>
>>> _______________________________________________
>>> Computer-go mailing list
>>> Computer-go@computer-go.org
>>> http://computer-go.org/mailman/listinfo/computer-go
>>>
>>
>>
>

