Re: [Computer-go] Alphago and solving Go

2017-08-06 Thread Vincent Richard

*Is AlphaGo brute force search?*

No, simply because there are way too many possibilities in the game: roughly (19x19)! = 361! possible move sequences.


AlphaGo tries to consider the game the way a human does: it evaluates the board from only a limited set of candidate moves, based on its "instinct". This instinct comes from deep convolutional neural networks (see AlphaGo's paper for details: http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html?foxtrotcallback=true)


*Is it possible to solve Go for 19x19?*
If I remember correctly, a team has solved Go for 5x5 boards. I'll let you guess: reaching 19x19 is quite impossible. It is often said there are more games of Go than atoms in the universe.
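As a quick back-of-the-envelope check in Python (361! only counts move orderings on an empty board, ignoring captures and illegal moves, so it is only a rough upper-bound-style estimate):

    import math

    sequences = math.factorial(361)          # about 1.4e768, i.e. roughly 769 digits
    atoms_in_universe = 10 ** 80             # commonly cited order of magnitude

    print(len(str(sequences)))               # 769
    print(sequences > atoms_in_universe)     # True, by hundreds of orders of magnitude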


*And what does perfect play in Go look like? *
=> AlphaGo is currently the best player, hence the closest to perfect play. Google DeepMind has published 50 AlphaGo self-play games: https://deepmind.com/research/alphago/alphago-vs-alphago-self-play-games/. Some reviews by Michael Redmond have been published on YouTube: https://www.youtube.com/watch?v=vjsN9BRInys


*How far are current top pros from perfect play?*
=> Difficult to say. Even if AlphaGo is strong, some pros feel it still makes some mistakes.



Vincent Richard


On 06-Aug-17 at 10:49 PM, Cai Gengyang wrote:

Is Alphago brute force search?
Is it possible to solve Go for 19x19 ?
And what does perfect play in Go look like?
How far are current top pros from perfect play?

Gengyang


___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go



Re: [Computer-go] purpose of zero-filled feature planes in CNN

2017-07-18 Thread Vincent Richard

It does, and for the exact same reason as a plane filled with 1s.

You have a lot of bias terms inside your network, so whatever input you give, you can be sure it will be transformed, be it a plane full of 0s or a plane full of 1s. As you said, it helps the network keep track of the board boundaries after the image is zero-padded. The real question is more like: is it useful to have both?


I haven't tested it, but I guess that the min-max boundaries have to be somehow useful information for the network.
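As a minimal illustration of the padding point with numpy (just the idea, not the actual network code):

    import numpy as np

    ones_plane  = np.ones((19, 19))
    zeros_plane = np.zeros((19, 19))

    # zero-padding of width 1, as applied before a 3x3 convolution
    padded_ones  = np.pad(ones_plane,  1, mode="constant", constant_values=0)
    padded_zeros = np.pad(zeros_plane, 1, mode="constant", constant_values=0)

    print(padded_ones[:3, :3])    # the 0/1 step marks exactly where the board ends
    print(padded_zeros.any())     # False: a plane of 0s stays all zeros, no edge signal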



Vincent Richard


On 18-Jul-17 at 7:53 PM, Brian Lee wrote:
I've been wondering about something I've seen in a few papers 
(AlphaGo's paper, Cazenave's resnet policy architecture), which is the 
presence of an input plane filled with 0s.


The input features also typically include a plane of 1s, which makes 
sense to me - zero-padding before a convolution means that the 0/1 
demarcation line tells the CNN where the edge of the board is. But as 
far as I can tell, a plane of constant 0s should do absolutely 
nothing. Can anyone enlighten me?



___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go



[Computer-go] Re : Atari Go

2017-07-13 Thread Vincent Richard
I have never tried Atari Go, but every single time I have seen an MCTS play randomly, it was either:
- a bug in the MC policy (wrong distribution, rotated board, etc.), or
- a bug in the scoring function / branch update.

Since your bot can play 9x9, I guess it's the second. Of course it's easy to know the winner, but if by any chance the information is inverted in the branch update, the bot will try to play as badly as possible.

You can even test this with the original code: instead of returning the final score, simply return the first player who captured in the game.
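For example, something along these lines (a rough sketch with made-up names, not Don Dailey's actual code):

    def playout_result(game, bot_color):
        # Hypothetical Atari Go playout terminator: the game ends at the first
        # capture, and the player who captured is the winner.
        winner = game.first_player_to_capture()
        return 1.0 if winner == bot_color else 0.0

    def backup(node, result):
        # Standard MCTS backup: each node stores wins from the point of view of
        # the player to move there, so the result must be flipped at every level.
        while node is not None:
            node.visits += 1
            node.wins += result
            result = 1.0 - result  # forget this flip (or invert the score itself)
            node = node.parent     # and the bot will optimize for losing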
-------- Original message --------
Subject: [Computer-go] Atari Go
From: Andreas Persson
To: computer-go@computer-go.org
Cc:

Hi, I'm toying with Don Dailey's old reference bot (pure Monte Carlo), trying to make it play Atari Go. It plays decent regular Go on 9x9 but not well at all when trying to play Atari Go. I end each playout as soon as a capture is made and score the moves with AMAF. This leads to almost random play even with a big number of playouts. Has anyone tried making an Atari Go Monte Carlo player? Is there anything obvious I am missing?

Regards
Andreas
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Value network that doesn't want to learn.

2017-06-23 Thread Vincent Richard

Finally found the problem. In the end, it was as stupid as expected:

When I pick a game for batch creation, I randomly select a limited number of moves inside the game. In the case of the value network I use about 8-16 moves so as not to overfit the data (I can't take just 1, or the I/O operations slow down the training), and for the other networks I would simply take all the moves. Or at least, that is what I thought my code was doing. Instead of picking N random moves from the game, it was picking the first N moves in a random order. So... my value network was being trained to tell me the game is balanced at the beginning...
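For reference, the difference was essentially this (a simplified sketch, not my actual pipeline code):

    import random

    moves = list(range(250))   # stand-in for the positions of one game
    n = 8

    # What I thought the code was doing: n random positions from anywhere in the game
    good_sample = random.sample(moves, n)

    # What it was actually doing: the first n positions, merely shuffled
    bad_sample = random.sample(moves[:n], n)

    print(sorted(good_sample))  # spread over the whole game
    print(sorted(bad_sample))   # always [0, 1, ..., 7] -> only opening positions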



On 20-Jun-17 at 5:48 AM, Gian-Carlo Pascutto wrote:

On 19/06/2017 21:31, Vincent Richard wrote:

- The data is then analyzed by a script which extracts all kinds of features from the games. When I'm training a network, I load the features I want from this analysis to build the batch. I have 2 possible methods for batch construction. I can either add moves one after the other (the fast mode) or pick random moves among different games (slower but reduces the variance).

You absolutely need the latter, especially as for outcome prediction the
moves from the same game are not independent samples.


During some of the tests, all the networks I was training had the same layers except for the last one. So, as you suggested, I was also wondering whether this last layer wasn't the problem. Yet, I haven't found any error.

...

However, if I feed a stupid value as target output (for example, black always wins), it has no trouble learning.

A problem with side to move/won side marking in the input or feature
planes, or with the expected outcome (0 vs 1 vs -1)?



___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Value network that doesn't want to learn.

2017-06-19 Thread Vincent Richard

This is what I have been thinking about, yet I have been unable to find an error.

Currently, I'm working with:

- SGF database: fuseki info Tygem -> http://tygem.fuseki.info/index.php (until recently I was working with games of all levels from KGS)


- The data is then analyzed by a script which extracts all kinds of features from the games. When I'm training a network, I load the features I want from this analysis to build the batch. I have 2 possible methods for batch construction: I can either add moves one after the other (the fast mode) or pick random moves among different games (slower but reduces the variance; see the sketch after this list). I set the batch size according to my GPU memory (200 moves in the case of the full-sized value/policy network). I don't think the problem comes from here, since the data is the same for all the networks.


- For the input, I’m using the same architecture as https://github.com/TheDuck314/go-NN (I have tried many kinds of shapes, from minimalist to AlphaGo-like)


- For the network, I’m once again using TheDuck314's network (EvalModels.Conv11PosDepFC1ELU) with the same layers, https://github.com/TheDuck314/go-NN/blob/master/engine/Layers.py, and the learning rate he recommends
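Regarding the two batch-construction methods mentioned above, they look roughly like this in pseudocode (simplified, with hypothetical names):

    import random

    def sequential_batch(games, batch_size):
        # Fast mode: consecutive positions from the same game(s); samples are highly correlated
        batch = []
        for game in games:
            for position in game:
                batch.append(position)
                if len(batch) == batch_size:
                    return batch
        return batch

    def shuffled_batch(games, batch_size):
        # Slower mode: random positions drawn across different games; reduces the variance
        positions = [p for game in games for p in game]
        return random.sample(positions, batch_size)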


During some of the tests, all the networks I was training had the same layers except for the last one. So, as you suggested, I was also wondering whether this last layer wasn't the problem. Yet, I haven't found any error.




On 20-Jun-17 at 3:19 AM, Gian-Carlo Pascutto wrote:

On 19-06-17 17:38, Vincent Richard wrote:


During my research, I’ve trained a lot of different networks, first on 9x9 then on 19x19, and as far as I remember all the nets I’ve worked with learned quickly (especially during the first batches), except the value net, which has always been problematic (diverges easily, doesn't learn quickly, ...). I have been stuck on the 19x19 value network for a couple of months now. I’ve tried countless inputs (feature planes) and lots of different models, even using the exact same code as others. Yet, whatever I try, the loss value doesn’t move an inch and accuracy stays at 50% (even after days of training). I've tried changing the learning rate (increase/decrease); it doesn't change anything. However, if I feed a stupid value as target output (for example, black always wins), it has no trouble learning.
It is even more frustrating given that training any other kind of network (predicting the next move, territory, ...) goes smoothly and fast.

Has anyone experienced a similar problem with value networks or has an
idea of the cause?

1) What is the training data for the value network? How big is it, how
is it presented/shuffled/prepared?

2) What is the *exact* structure of the network and training setup?

My best guess would be an error in the construction of the final layers.



___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Value network that doesn't want to learn.

2017-06-19 Thread Vincent Richard

Hello everyone,

For my master thesis, I have built an AI that takes a strategic approach to the game. It doesn’t play but simply describes the strategy behind every possible move in a given position ("enclosing this group", "making life for this group", "saving these stones", etc.). My main idea is that, once associated with a playing AI, I will be able to generate comments on a position (and then teach people). So for my final experiment, I’m trying to build a playing AI. I don’t want it to be highly competitive, I just need it to be decent (1d or so), so I thought about using a policy network, a value network and a simple MCTS. The MCTS works fine, the policy network learns quickly and is accurate, but the value network never seems to learn, even in the slightest.


During my research, I’ve trained a lot of different networks, first on 9x9 then on 19x19, and as far as I remember all the nets I’ve worked with learned quickly (especially during the first batches), except the value net, which has always been problematic (diverges easily, doesn't learn quickly, ...). I have been stuck on the 19x19 value network for a couple of months now. I’ve tried countless inputs (feature planes) and lots of different models, even using the exact same code as others. Yet, whatever I try, the loss value doesn’t move an inch and accuracy stays at 50% (even after days of training). I've tried changing the learning rate (increase/decrease); it doesn't change anything. However, if I feed a stupid value as target output (for example, black always wins), it has no trouble learning.
It is even more frustrating given that training any other kind of network (predicting the next move, territory, ...) goes smoothly and fast.


Has anyone experienced a similar problem with value networks or has an 
idea of the cause?


Thank you
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go