Re: [Computer-go] Alphago and solving Go
*Is AlphaGo brute force search?* No, simply because there are far too many possibilities in the game, roughly (19x19)! move sequences. AlphaGo tries to consider the game the way humans do: it evaluates the board from only a limited set of moves, based on its "instinct". This instinct is generated by deep convolutional neural networks (see AlphaGo's paper for details: http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html?foxtrotcallback=true)

*Is it possible to solve Go for 19x19?* If I remember correctly, I'm pretty sure a team has solved Go for 5x5 boards. I'll let you guess: reaching 19x19 is quite impossible. It is often said there are more games of Go than atoms in the universe.

*And what does perfect play in Go look like?* => AlphaGo is currently the best player, hence the closest to perfect play. Google DeepMind has published 50 self-played AlphaGo games: https://deepmind.com/research/alphago/alphago-vs-alphago-self-play-games/. Some reviews by Michael Redmond have been published on YouTube: https://www.youtube.com/watch?v=vjsN9BRInys

*How far are current top pros from perfect play?* => Difficult to say. Even if AlphaGo is strong, some pros feel it still makes some mistakes.

Vincent Richard

On 06-Aug-17 at 10:49 PM, Cai Gengyang wrote:
Is AlphaGo brute force search? Is it possible to solve Go for 19x19? And what does perfect play in Go look like? How far are current top pros from perfect play? Gengyang

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
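As a back-of-the-envelope check on the "more games than atoms" claim (a sketch added here, not part of the original mail): even the crude 3^361 upper bound on board configurations dwarfs the roughly 10^80 atoms usually quoted for the observable universe. (John Tromp has since computed the exact number of *legal* 19x19 positions, about 2.08e170.)

```python
# Crude upper bound on 19x19 board configurations: each of the 361
# points is empty, black, or white. The true number of legal positions
# (computed by Tromp) is about 2.08e170 -- smaller, but still enormous.
configurations = 3 ** 361

# Commonly cited estimate for atoms in the observable universe.
atoms_in_universe = 10 ** 80

print(configurations > atoms_in_universe)   # True
print(len(str(configurations)))             # 173 digits
```

Counting *games* rather than positions (the (19x19)! figure in the mail) gives a far larger number still, which is why exhaustive search is out of the question.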
Re: [Computer-go] purpose of zero-filled feature planes in CNN
It does, and for exactly the same reason as a plane filled with 1s. You have a lot of biases inside your network, so whatever input you give, you can be sure it will be transformed, be it a plane full of 0s or a plane full of 1s. As you said, it helps the network keep track of the boundaries after the image is zero-padded. The real question is more like: is it useful to have both? I haven't tested it, but I guess that the min/max boundary values are somehow useful information for the network.

Vincent Richard

On 18-Jul-17 at 7:53 PM, Brian Lee wrote:
I've been wondering about something I've seen in a few papers (AlphaGo's paper, Cazenave's resnet policy architecture), which is the presence of an input plane filled with 0s. The input features also typically include a plane of 1s, which makes sense to me - zero-padding before a convolution means that the 0/1 demarcation line tells the CNN where the edge of the board is. But as far as I can tell, a plane of constant 0s should do absolutely nothing. Can anyone enlighten me?

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
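A minimal numpy sketch of the padding argument (illustrative, not the networks under discussion): a zero plane passes nothing but the bias through a convolution, whatever the weights, while a plane of ones stops being constant at the zero-padded border, which is how the net can locate the edge.

```python
import numpy as np

# A single 3x3 convolution over one input plane, 'same' zero-padding.
def conv2d(plane, kernel, bias=0.0):
    padded = np.pad(plane, 1)   # zero-padding, as in the nets discussed
    out = np.zeros_like(plane, dtype=float)
    for i in range(plane.shape[0]):
        for j in range(plane.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel) + bias
    return out

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))

# A zero plane contributes nothing but the bias, whatever the weights...
zeros_out = conv2d(np.zeros((5, 5)), kernel, bias=0.5)
print(np.allclose(zeros_out, 0.5))   # True

# ...while a plane of ones is no longer constant after padding: border
# cells see the 0/1 boundary, interior cells see the full kernel sum.
edge_response = conv2d(np.ones((5, 5)), kernel)
print(np.allclose(edge_response, edge_response[2, 2]))   # False
```

So any signal a constant-zero input plane carries must come through the biases, which is the point made above.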
[Computer-go] Re: Atari Go
I have never tried Atari Go, but every single time I have seen an MCTS playing randomly, it was either:
- A bug in the MC policy (wrong distribution, rotated, etc.)
- A bug in the scoring function / branch update

Since your bot can play 9x9, I guess it's the second. Of course it's easy to know the winner, but if by any chance the information is reversed in the branch update, the bot will try to play as badly as possible. You can even test this with the original code: instead of returning the final score, simply return the first player who captured in the game.

Original message
Subject: [Computer-go] Atari Go
From: Andreas Persson
To: computer-go@computer-go.org
Cc:

Hi, I'm toying with Don Dailey's old reference bot (pure Monte Carlo) and trying to make it play Atari Go. It plays decent regular Go on a 9x9 board but not well at all when trying to make it play Atari Go. I try ending each playout as soon as a capture is made and score the moves with AMAF. This leads to almost random play even with a big number of playouts. Has anyone tried making an Atari Go Monte Carlo player? Anything obvious I am missing?

Regards Andreas

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
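The reversed-score bug described above can be sketched in a few lines. This is a minimal pure-Monte-Carlo move selector with illustrative names (not Don Dailey's actual code); `playout(state, move)` is assumed to return +1 if the side to move wins the random game and -1 otherwise.

```python
import random

def best_move(state, legal_moves, playout, n_playouts=100):
    wins = {m: 0 for m in legal_moves}
    for m in legal_moves:
        for _ in range(n_playouts):
            # Correct: accumulate the result from the side to move's
            # point of view. Flipping this sign (wins[m] -= ...) is the
            # classic bug that makes the bot play as badly as possible.
            wins[m] += playout(state, m)
    return max(legal_moves, key=lambda m: wins[m])

# Toy check: move "a" wins 90% of playouts, "b" only 10%.
random.seed(1)
def playout(state, m):
    return 1 if random.random() < (0.9 if m == "a" else 0.1) else -1

print(best_move(None, ["a", "b"], playout, 200))   # "a"
```

With the sign flipped, the same selector would converge on "b", which matches the "almost random / self-destructive play" symptom.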
Re: [Computer-go] Value network that doesn't want to learn.
Finally found the problem. In the end, it was as stupid as expected: when I pick a game for batch creation, I randomly select a limited number of moves inside the game. In the case of the value network I use something like 8-16 moves to avoid overfitting the data (I can't take only 1, or the I/O operations slow down the training), and for the other networks I would simply take all the moves. Or at least, that is what I thought my code was doing. Instead of picking N random moves in the game, it was picking the first N moves in a random order. So... my value network was trained to tell me the game is balanced at the beginning...

On 20-Jun-17 at 5:48 AM, Gian-Carlo Pascutto wrote:
On 19/06/2017 21:31, Vincent Richard wrote:
- The data is then analyzed by a script which extracts all kinds of features from games. When I'm training a network, I load the features I want from this analysis to build the batch. I have 2 possible methods for batch construction. I can either add moves one after the other (the fast mode) or pick random moves among different games (slower but reduces the variance).

You absolutely need the latter, especially as for outcome prediction the moves from the same game are not independent samples.

During some of the tests, all the networks I was training had the same layers except for the last. So as you suggested, I was also wondering if this last layer wasn't the problem. Yet, I haven't found any error. ... However, if I feed a stupid value as target output (for example black always wins) it has no trouble learning.

A problem with side to move/won side marking in the input or feature planes, or with the expected outcome (0 vs 1 vs -1)?

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
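The bug described above fits in two lines; `moves` here is a hypothetical stand-in for one game's move list, not the actual training code.

```python
import random

moves = list(range(100))   # stand-in for the 100 moves of one game
n = 8

# The bug: a random *order* over the first n moves -- every batch is
# built from the opening, so the value target is always "balanced".
buggy = random.sample(moves[:n], n)
assert set(buggy) == set(range(n))   # never reaches past move 8

# The intent: n moves sampled uniformly over the whole game.
correct = random.sample(moves, n)
assert len(set(correct)) == n        # n distinct moves, any position
```

The two calls look almost identical, which is why the bug survived months of model and feature changes.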
Re: [Computer-go] Value network that doesn't want to learn.
This is what I have been thinking about, yet I have been unable to find an error. Currently, I'm working with:
- SGF database: fuseki info Tygem -> http://tygem.fuseki.info/index.php (until recently I was working with games of all levels from KGS)
- The data is then analyzed by a script which extracts all kinds of features from games. When I'm training a network, I load the features I want from this analysis to build the batch. I have 2 possible methods for batch construction. I can either add moves one after the other (the fast mode) or pick random moves among different games (slower but reduces the variance). I set the batch size according to my GPU memory (200 moves in the case of a full-sized value/policy network). I don't think the problem comes from here, since the data is the same for all the networks.
- For the input, I'm using the same architecture as https://github.com/TheDuck314/go-NN (I have been trying a lot of different shapes, from minimalist to AlphaGo)
- For the network, I'm once again using TheDuck314's network (EvalModels.Conv11PosDepFC1ELU) with the same layers https://github.com/TheDuck314/go-NN/blob/master/engine/Layers.py, and the learning rate he recommends

During some of the tests, all the networks I was training had the same layers except for the last. So as you suggested, I was also wondering if this last layer wasn't the problem. Yet, I haven't found any error.

On 20-Jun-17 at 3:19 AM, Gian-Carlo Pascutto wrote:
On 19-06-17 17:38, Vincent Richard wrote:
During my research, I've trained a lot of different networks, first on 9x9 then on 19x19, and as far as I remember all the nets I've worked with learned quickly (especially during the first batches), except the value net, which has always been problematic (diverges easily, doesn't learn quickly, ...). I have been stuck on the 19x19 value network for a couple of months now. I've tried countless inputs (feature planes) and lots of different models, even using the exact same code as others.
Yet, whatever I try, the loss value doesn't move an inch and accuracy stays at 50% (even after days of training). I've tried changing the learning rate (increase/decrease); it doesn't change anything. However, if I feed a stupid value as target output (for example black always wins) it has no trouble learning. What is even more frustrating is that training any other kind of network (predicting the next move, territory, ...) goes smoothly and fast. Has anyone experienced a similar problem with value networks, or has an idea of the cause?

1) What is the training data for the value network? How big is it, how is it presented/shuffled/prepared?
2) What is the *exact* structure of the network and training setup?

My best guess would be an error in the construction of the final layers.

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
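For reference, here is a common shape for the final layers of a value network and the target-encoding pitfall behind the "0 vs 1 vs -1" question. This is an assumed architecture sketch in numpy (not the EvalModels code under discussion): reduce the trunk's feature planes to one plane with 1x1 weights, flatten, then a fully connected layer into tanh.

```python
import numpy as np

# Assumed value-network head: C trunk planes -> one 19x19 plane via
# per-channel weights -> fully connected -> tanh scalar in (-1, 1).
def value_head(features, w_1x1, fc_w, fc_b):
    plane = np.tensordot(w_1x1, features, axes=([0], [0]))  # (19, 19)
    flat = plane.reshape(-1)                                # (361,)
    return np.tanh(flat @ fc_w + fc_b)                      # scalar

rng = np.random.default_rng(0)
C = 32
features = rng.normal(size=(C, 19, 19))
v = value_head(features,
               rng.normal(size=C) / C,
               rng.normal(size=361) / 361,
               0.0)
assert -1.0 < v < 1.0

# The target must match the output range: pairing a {0, 1} winner label
# with a tanh head (or a {-1, +1} label with a sigmoid) can stall
# training exactly as described -- flat loss, accuracy pinned at 50%.
```

The same check applies to the side-to-move marking: if the winner label and the color encoding in the input planes disagree, the net has nothing consistent to learn.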
[Computer-go] Value network that doesn't want to learn.
Hello everyone,

For my master's thesis, I have built an AI that takes a strategic approach to the game. It doesn't play, but simply describes the strategy behind every possible move in a given position ("enclosing this group", "making life for this group", "saving these stones", etc.). My main idea is that, once associated with a playing AI, I will be able to generate comments on a position (and then teach people). So for my final experiment, I'm trying to build a playing AI. I don't want it to be highly competitive; I just need it to be decent (1d or so), so I thought about using a policy network, a value network and a simple MCTS.

The MCTS works fine, the policy network learns quickly and is accurate, but the value network seems to never learn, even slightly. During my research, I've trained a lot of different networks, first on 9x9 then on 19x19, and as far as I remember all the nets I've worked with learned quickly (especially during the first batches), except the value net, which has always been problematic (diverges easily, doesn't learn quickly, ...). I have been stuck on the 19x19 value network for a couple of months now. I've tried countless inputs (feature planes) and lots of different models, even using the exact same code as others. Yet, whatever I try, the loss value doesn't move an inch and accuracy stays at 50% (even after days of training). I've tried changing the learning rate (increase/decrease); it doesn't change anything. However, if I feed a stupid value as target output (for example black always wins) it has no trouble learning. What is even more frustrating is that training any other kind of network (predicting the next move, territory, ...) goes smoothly and fast.

Has anyone experienced a similar problem with value networks, or has an idea of the cause?

Thank you

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
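For context, the usual way a policy prior and a value estimate are combined inside MCTS is the PUCT selection rule from the AlphaGo paper. The sketch below uses illustrative names (not the thesis code): each child carries a prior P from the policy net, a visit count N, and a total value W accumulated from value-net evaluations.

```python
import math

# PUCT child selection: exploit mean value Q = W/N, explore in
# proportion to the policy prior P and inversely to the visit count.
def select_child(children, c_puct=1.5):
    total_n = sum(ch["N"] for ch in children)
    def puct(ch):
        q = ch["W"] / ch["N"] if ch["N"] else 0.0
        u = c_puct * ch["P"] * math.sqrt(total_n + 1) / (1 + ch["N"])
        return q + u
    return max(children, key=puct)

children = [
    {"P": 0.6, "N": 10, "W": 4.0},   # well-explored, mediocre value
    {"P": 0.3, "N": 0,  "W": 0.0},   # unvisited but decent prior
]
best = select_child(children)
print(best["N"])   # 0: the unvisited child wins on its exploration term
```

With a value net stuck at 50% accuracy, Q carries no signal and the search degenerates to following the policy prior alone, so fixing the value net matters even with a working MCTS.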