Re: [Computer-go] mini-max with Policy and Value network

2017-06-07 Thread Gian-Carlo Pascutto
On 24-05-17 05:33, "Ingo Althöfer" wrote: >> So, 0.001% probability. Demis commented that Lee Sedol's winning move in >> game 4 was a one in 10 000 move. This is a 1 in 100 000 move. > > In Summer 2016 I checked the games of AlphaGo vs Lee Sedol > with repeated runs of CrazyStone DL: > In 3 of 20
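For a sense of scale, here is a minimal sketch of how a 1-in-100,000 policy prior starves a move of search under an AlphaGo-style PUCT selection rule (the rule follows the published AlphaGo formula; the exploration constant, visit counts, and prior values below are illustrative, not taken from the thread):

import math

def puct_bonus(prior, parent_visits, child_visits, c_puct=1.0):
    # Exploration term of a PUCT-style selection rule:
    # c_puct * P(move) * sqrt(N_parent) / (1 + N_child).
    return c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

# Two moves with identical value estimates: one with a 30% prior,
# one with a 0.001% (1e-5) prior, at a root with one million visits.
parent = 1_000_000
favoured_visits = 0
while puct_bonus(0.30, parent, favoured_visits) > puct_bonus(1e-5, parent, 0):
    favoured_visits += 1
print(favoured_visits)  # 29999: the favoured move soaks up ~30,000 visits
                        # before the 1e-5-prior move would even be tied.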

Re: [Computer-go] mini-max with Policy and Value network

2017-06-07 Thread Ingo Althöfer
Hi, just my 2 cents. "Gian-Carlo Pascutto" wrote: > In the attached SGF, AlphaGo played P10, which was considered a very > surprising move by all commentators... > I can sort-of confirm this: > > 0.295057654 (E13) > ...(60 more moves follow)... > 0.11952 (P10) > > So,

Re: [Computer-go] mini-max with Policy and Value network

2017-06-07 Thread Hideki Kato
Alvaro Begue:

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Gian-Carlo Pascutto
On 23-05-17 17:19, Hideki Kato wrote: > Gian-Carlo Pascutto: <0357614a-98b8-6949-723e-e1a849c75...@sjeng.org>: > >> Now, even the original AlphaGo played moves that surprised human pros >> and were contrary to established sequences. So where did those come >> from? Enough computation power to

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread valkyria
> (3) CNN cannot learn the exclusive-or function due to the ReLU > activation function, instead of the traditional sigmoid (hyperbolic > tangent). CNN is good at approximating continuous (analog) functions > but not Boolean (digital) ones. Are you sure about that? I can imagine using two ReLU units to construct
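The rebuttal can be made concrete: two ReLU units plus a linear output reproduce XOR exactly on {0,1} inputs. A minimal sketch (these particular weights are one possible construction, not quoted from the thread):

def relu(x):
    return max(0.0, x)

def xor(a, b):
    # Two hidden ReLU units and a linear readout reproduce XOR on {0,1} inputs:
    # h1 = ReLU(a + b), h2 = ReLU(a + b - 1), output = h1 - 2*h2.
    h1 = relu(a + b)
    h2 = relu(a + b - 1)
    return h1 - 2 * h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))  # prints 0 0 0.0, 0 1 1.0, 1 0 1.0, 1 1 0.0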

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Álvaro Begué
On Tue, May 23, 2017 at 4:51 AM, Hideki Kato wrote: > (3) CNN cannot learn the exclusive-or function due to the ReLU > activation function, instead of the traditional sigmoid (hyperbolic > tangent). CNN is good at approximating continuous (analog) > functions but not Boolean

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Hideki Kato
Gian-Carlo Pascutto: >On 23-05-17 10:51, Hideki Kato wrote: >> (2) The number of possible positions (input of the value net) in >> real games is at least 10^30 (10^170 in theory). If the value >> net can recognize all? L&D depends on very small

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Hideki Kato
Gian-Carlo Pascutto: <0357614a-98b8-6949-723e-e1a849c75...@sjeng.org>: >Now, even the original AlphaGo played moves that surprised human pros >and were contrary to established sequences. So where did those come >from? Enough computation power to overcome the low probability? >Synthesized by

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Hideki Kato
Erik van der Werf: >On Tue, May 23, 2017 at 10:51 AM, Hideki Kato >wrote: > >> Agree. >> >> (1) To solve L&D, some search is necessary in practice. So, the >> value net cannot solve some of them. >> (2)

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Gian-Carlo Pascutto
On 23-05-17 10:51, Hideki Kato wrote: > (2) The number of possible positions (input of the value net) in > real games is at least 10^30 (10^170 in theory). If the value > net can recognize all? L&D depends on very small differences in > the placement of stones or liberties. Can we provide

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Gian-Carlo Pascutto
On 22-05-17 21:01, Marc Landgraf wrote: > But what you should really look at here is Leela's evaluation of the game. Note that this is completely irrelevant for the discussion about tactical holes and the position I posted. You could literally plug any evaluation into it (save for a static oracle,

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Erik van der Werf
On Mon, May 22, 2017 at 4:54 PM, Gian-Carlo Pascutto wrote: > On 22-05-17 15:46, Erik van der Werf wrote: > > Anyway, LMR seems like a good idea, but last time I tried it (in Migos) > > it did not help. In Magog I had some good results with fractional depth > > reductions (like

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Gian-Carlo Pascutto
On 23-05-17 03:39, David Wu wrote: > Leela playouts are definitely extremely bad compared to competitors like > Crazystone. The deep-learning version of Crazystone has no value net as > far as I know, only a policy net, which means it's going on MC playouts > alone to produce its evaluations.

Re: [Computer-go] mini-max with Policy and Value network

2017-05-23 Thread Hideki Kato
Agree. (1) To solve L&D, some search is necessary in practice. So, the value net cannot solve some of them. (2) The number of possible positions (input of the value net) in real games is at least 10^30 (10^170 in theory). If the value net can recognize all? L&D depends on very small differences

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread David Wu
Addendum: Some additional playing around with the same position can flip the roles of the playouts and value net - so now the value net is very wrong and the playouts are mostly right. I think this gives good insight into what the value net is doing and why as a general matter playouts are still
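This complementarity (the value net badly wrong in one position, the playouts wrong in another) is why AlphaGo-style programs blend both signals at the leaves rather than trusting either alone. A minimal sketch of that linear blend (AlphaGo's published evaluation used an equal mix; the function and variable names here are illustrative):

def leaf_evaluation(value_net_output, playout_result, mix=0.5):
    # Blend the value-net estimate with a Monte-Carlo playout result.
    # AlphaGo's published evaluation used an equal (0.5) mix of the two.
    return (1.0 - mix) * value_net_output + mix * playout_result

# Example: the value net is confidently wrong (0.9 for a lost position)
# while the playouts mostly get it right (average result 0.2).
print(leaf_evaluation(0.9, 0.2))  # ~0.55: the blend tempers the bad estimate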

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Marc Landgraf
And talking about tactical mistakes: Another game where a trick joseki early in the game (top right) completely fools Leela. Leela here plays this like it would be done in similar shapes, but then gets completely blindsided. But to make things worse, it finds the one way to make the loss the

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Marc Landgraf
Leela has surprisingly large tactical holes. Right now it is throwing a good number of games against me in completely won endgames by fumbling away entirely alive groups. As an example I attached one game of myself (3d), even vs Leela10 @7d. But this really isn't a one-time occurrence. If you look

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Gian-Carlo Pascutto
On 22-05-17 17:47, Erik van der Werf wrote: > On Mon, May 22, 2017 at 3:56 PM, Gian-Carlo Pascutto > wrote: > > Well, I think that's fundamental; you can't be wide and deep at the same > time, but at least you can choose an algorithm that (eventually)

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Erik van der Werf
On Mon, May 22, 2017 at 3:56 PM, Gian-Carlo Pascutto wrote: > On 22-05-17 11:27, Erik van der Werf wrote: > > On Mon, May 22, 2017 at 10:08 AM, Gian-Carlo Pascutto > > wrote: > > > > ... This heavy pruning > > by the policy network

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Gian-Carlo Pascutto
On 22-05-17 15:46, Erik van der Werf wrote: > Oh, haha, after reading Brian's post I guess I misunderstood :-) > > Anyway, LMR seems like a good idea, but last time I tried it (in Migos) > it did not help. In Magog I had some good results with fractional depth > reductions (like in Realization
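One way to make such fractional reductions concrete is to charge each move a depth cost proportional to -log of its policy prior, in the spirit of Realization Probability Search: likely moves cost little depth, unlikely moves are searched shallowly instead of being pruned outright. A minimal negamax sketch under that assumption (evaluate, policy_priors, and position.play are illustrative interfaces, not code from Migos, Magog, or any other engine in this thread):

import math

def search(position, depth, evaluate, policy_priors, min_prior=1e-3):
    # Negamax with fractional, prior-based depth reductions (RPS-style sketch).
    # evaluate(position)      -> static score from the side to move.
    # policy_priors(position) -> list of (move, prior) pairs, best first.
    # No move is hard-pruned: an unlikely move simply enters the subtree with
    # less remaining depth, so "ugly" but tactically vital moves are still seen.
    if depth <= 0.0:
        return evaluate(position)
    moves = policy_priors(position)
    if not moves:
        return evaluate(position)
    best = -math.inf
    for move, prior in moves:
        # Depth cost: a small base amount plus -log(prior), capped so that
        # even a near-zero prior only shortens the search, never removes it.
        cost = min(0.1 - math.log(max(prior, min_prior)), 6.0)
        score = -search(position.play(move), depth - cost,
                        evaluate, policy_priors, min_prior)
        best = max(best, score)
    return best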

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Gian-Carlo Pascutto
On 22-05-17 14:48, Brian Sheppard via Computer-go wrote: > My reaction was "well, if you are using alpha-beta, then at least use > LMR rather than hard pruning." Your reaction is "don't use > alpha-beta", and you would know better than anyone! There are two aspects to my answer: 1) Unless you've

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Gian-Carlo Pascutto
On 22-05-17 11:27, Erik van der Werf wrote: > On Mon, May 22, 2017 at 10:08 AM, Gian-Carlo Pascutto > wrote: > > ... This heavy pruning > by the policy network OTOH seems to be an issue for me. My program has > big tactical holes. > > > Do

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Erik van der Werf
On Mon, May 22, 2017 at 11:27 AM, Erik van der Werf <erikvanderw...@gmail.com> wrote: > On Mon, May 22, 2017 at 10:08 AM, Gian-Carlo Pascutto > wrote: >> >> ... This heavy pruning >> by the policy network OTOH seems to be an issue for me. My program has >> big tactical holes. >

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Brian Sheppard via Computer-go
On 20/05/2017 22:26, Brian Sheppard via Computer-go wrote: > Could use late-move reductions to eliminate the hard pruning. Giv

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Erik van der Werf
On Mon, May 22, 2017 at 10:08 AM, Gian-Carlo Pascutto wrote: > > ... This heavy pruning > by the policy network OTOH seems to be an issue for me. My program has > big tactical holes. Do you do any hard pruning? My engines (Steenvreter, Magog) always had a move predictor (a.k.a.
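A common middle ground between a hard top-k cut from the move predictor and searching everything is progressive widening: expose only the top-ranked moves at first and unlock lower-ranked ones as a node accumulates visits, so a tactically necessary but low-ranked move is delayed rather than lost. A minimal sketch (the widening schedule and names are illustrative, not from Steenvreter or Magog):

import math

def candidate_moves(ranked_moves, node_visits, base=2, growth=1.4):
    # Progressive widening: expose more of the predictor-ranked move list as a
    # node is visited more often, instead of hard-pruning the tail forever.
    # ranked_moves: moves sorted by the move predictor, best first.
    k = base + int(math.log(node_visits + 1, growth))
    return ranked_moves[:k]

all_moves = list(range(361))                     # a full 19x19 move list
print(len(candidate_moves(all_moves, 1)))        # 4  -- only the top few at first
print(len(candidate_moves(all_moves, 100_000)))  # 36 -- more unlock with visits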

Re: [Computer-go] mini-max with Policy and Value network

2017-05-22 Thread Gian-Carlo Pascutto
On 20/05/2017 22:26, Brian Sheppard via Computer-go wrote: > Could use late-move reductions to eliminate the hard pruning. Given > the accuracy rate of the policy network, I would guess that even move > 2 should be reduced. > The question I always ask is: what's the real difference between MCTS

Re: [Computer-go] mini-max with Policy and Value network

2017-05-20 Thread Brian Sheppard via Computer-go
Could use late-move reductions to eliminate the hard pruning. Given the accuracy rate of the policy network, I would guess that even move 2 should be reduced.
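The suggestion above, applied to a policy-ordered alpha-beta search, looks roughly like the sketch below: later (lower-probability) moves are searched to a reduced depth first and re-searched at full depth only if they beat alpha, so nothing is hard-pruned. This is a generic late-move-reduction pattern, not code from any engine in the thread; evaluate, ordered_moves, and pos.play are illustrative interfaces:

import math

def alphabeta(pos, depth, alpha, beta, evaluate, ordered_moves):
    # Alpha-beta over a policy-ordered move list with late-move reductions.
    # ordered_moves(pos) -> legal moves sorted by policy probability, best first.
    if depth <= 0:
        return evaluate(pos)
    moves = ordered_moves(pos)
    if not moves:
        return evaluate(pos)
    best = -math.inf
    for i, move in enumerate(moves):
        child = pos.play(move)
        # Per the suggestion above, the reduction can kick in very early in the
        # move list when the policy network's ordering is this accurate.
        reduction = 0 if i == 0 else min(2 + i // 4, depth - 1)
        score = -alphabeta(child, depth - 1 - reduction, -beta, -alpha,
                           evaluate, ordered_moves)
        if reduction and score > alpha:
            # The reduced search looked promising: verify at full depth.
            score = -alphabeta(child, depth - 1, -beta, -alpha,
                               evaluate, ordered_moves)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff
    return best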