Re: [Computer-go] Value network that doesn't want to learn.
>... my value network was trained to tell me the game is balanced at the
>beginning... :-)

The best training policy is to select positions that correct errors.

I used the policies below to train a backgammon NN. Together, they reduced the expected loss of the network by 50% (cut the error rate in half):

- Select training positions from the program's own games.
  - Can be self-play or versus an opponent. Best is to have a broad panel of opponents.
  - Beneficial to bootstrap with pro games, but then add ONLY training examples from the program's own games.
- Train only the moves made by the winner of the game.
  - Very important for deterministic games!
  - Note that the winner can be either your program or the opponent. If your program wins, then training reinforces good behavior; if the opponent wins, then training corrects bad behavior.
- Per game, you should aim to get only a few training examples (3 in backgammon; maybe 10 in Go?). Use two policies:
  - Select positions where the static evaluation of a position is significantly different from a deep search.
  - Select positions where the move selected by a deep search did not have the highest static evaluation. (In this case you have two training positions, which differ by the move chosen.)
  - Of course, you are selecting examples where you did as badly as possible.
- The training value of the position is the result of a deep search.
  - This is equivalent to "temporal difference learning", but accelerated by the depth of the search.
  - Periodically refresh the training evaluations as your search/eval improve.

These policies actively seek out cases where your evaluation function has some weakness, so training is definitely focused on improving results in the distribution of positions that your program will actually face.

You will need about 30 training examples for every free parameter in your NN. You can do the math on how many games that will take. It is inevitable: you will train your NN based on blitz games.

Good luck!
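The first selection policy above can be sketched roughly as follows. This is an illustrative sketch, not the original backgammon code; `static_eval` and `deep_search` are hypothetical placeholders for whatever evaluation and search functions the program already has:

```python
def select_training_positions(positions, static_eval, deep_search,
                              threshold=0.1, max_examples=3):
    """Keep the positions where the static evaluation disagrees most with
    a deep search. Returns a few (position, target) pairs per game, with
    the deep-search result as the training target."""
    candidates = []
    for pos in positions:
        error = abs(static_eval(pos) - deep_search(pos))
        if error > threshold:
            candidates.append((error, pos))
    # Deliberately keep the examples where the evaluator did worst.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [(pos, deep_search(pos)) for _, pos in candidates[:max_examples]]
```

The second policy (the deep search chose a move that did not have the highest static evaluation) works the same way, except each disagreement yields two training positions, one per candidate move.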
___ Computer-go mailing list Computer-go@computer-go.org http://computer-go.org/mailman/listinfo/computer-go
Re: [Computer-go] Value network that doesn't want to learn.
Finally found the problem. In the end, it was as stupid as expected.

When I pick a game for batch creation, I randomly select a limited number of moves inside the game. In the case of the value network I use something like 8-16 moves so as not to overfit the data (I can't take just 1, or the I/O operations slow down the training), and for the other networks I would simply take all the moves. Or at least, that was what I thought my code was doing. Instead of picking N random moves in the game, it was picking the first N moves in a random order. So... my value network was trained to tell me the game is balanced at the beginning...

On 20-Jun-17 at 5:48 AM, Gian-Carlo Pascutto wrote:
> On 19/06/2017 21:31, Vincent Richard wrote:
>> - The data is then analyzed by a script which extracts all kinds of features from games. When I'm training a network, I load the features I want from this analysis to build the batch. I have 2 possible methods for batch construction: I can either add moves one after the other (the fast mode) or pick random moves among different games (slower but reduces the variance).
>
> You absolutely need the latter, especially as for outcome prediction the moves from the same game are not independent samples.
>
>> During some of the tests, all the networks I was training had the same layers except for the last. So as you suggested, I was also wondering if this last layer wasn't the problem. Yet, I haven't found any error.
>> ... However, if I feed a stupid value as target output (for example, black always wins) it has no trouble learning.
>
> A problem with the side to move/won side marking in the input or feature planes, or with the expected outcome (0 vs 1 vs -1)?
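The bug described above can be reproduced in a few lines. This is an illustrative sketch (not the original code), with `random.sample` as the fix:

```python
import random

def buggy_pick(moves, n):
    # The bug: take the FIRST n moves, then return them in a random order.
    head = list(moves[:n])
    random.shuffle(head)
    return head

def fixed_pick(moves, n):
    # The fix: sample n moves uniformly from the whole game.
    return random.sample(moves, n)

game = list(range(250))  # move indices of a long 19x19 game
# buggy_pick only ever sees the opening, so a value net trained on its
# output learns that every position looks like the balanced start.
```

The shuffled order hides the bug: the selected moves look random within a batch even though they all come from the opening.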
Re: [Computer-go] Value network that doesn't want to learn.
This is what I have been thinking about, yet I am unable to find an error. Currently, I'm working with:

- SGF database: fuseki info Tygem -> http://tygem.fuseki.info/index.php (until recently I was working with games of all levels from KGS).
- The data is then analyzed by a script which extracts all kinds of features from games. When I'm training a network, I load the features I want from this analysis to build the batch. I have 2 possible methods for batch construction: I can either add moves one after the other (the fast mode) or pick random moves among different games (slower but reduces the variance). I set the batch size according to my GPU memory (200 moves in the case of the full-sized value/policy network). I don't think the problem comes from here, since the data is the same for all the networks.
- For the input, I'm using the same architecture as https://github.com/TheDuck314/go-NN (I have been trying many kinds of shapes, from minimalist to AlphaGo).
- For the network, I'm once again using TheDuck314's network (EvalModels.Conv11PosDepFC1ELU) with the same layers https://github.com/TheDuck314/go-NN/blob/master/engine/Layers.py, and the learning rate he recommends.

During some of the tests, all the networks I was training had the same layers except for the last. So as you suggested, I was also wondering if this last layer wasn't the problem. Yet, I haven't found any error.

On 20-Jun-17 at 3:19 AM, Gian-Carlo Pascutto wrote:
> On 19-06-17 17:38, Vincent Richard wrote:
>> During my research, I've trained a lot of different networks, first on 9x9 then on 19x19, and as far as I remember all the nets I've worked with learned quickly (especially during the first batches), except the value net, which has always been problematic (diverges easily, doesn't learn quickly, ...). I have been stuck on the 19x19 value network for a couple of months now. I've tried countless inputs (feature planes) and lots of different models, even using the exact same code as others. Yet, whatever I try, the loss value doesn't move an inch and accuracy stays at 50% (even after days of training). I've tried changing the learning rate (increase/decrease); it doesn't change anything. However, if I feed a stupid value as target output (for example, black always wins) it has no trouble learning.
>>
>> Has anyone experienced a similar problem with value networks or has an idea of the cause?
>
> 1) What is the training data for the value network? How big is it, how is it presented/shuffled/prepared?
>
> 2) What is the *exact* structure of the network and training setup?
>
> My best guess would be an error in the construction of the final layers.
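The second batch-construction method described in the setup above (random moves among different games) can be sketched as follows; the in-memory list-of-lists layout is an assumption for illustration:

```python
import random

def make_batch(games, batch_size, rng=random):
    """Draw each training example from an independently chosen game, so
    that positions within a batch are (close to) independent samples --
    the slower but lower-variance option described above."""
    batch = []
    for _ in range(batch_size):
        game = rng.choice(games)        # pick a game uniformly at random
        batch.append(rng.choice(game))  # then a random move within it
    return batch
```

Adding moves one after the other instead (the fast mode) fills a batch with consecutive positions from the same game, which all share the same outcome label.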
Re: [Computer-go] Value network that doesn't want to learn.
On 19-06-17 17:38, Vincent Richard wrote:
> During my research, I've trained a lot of different networks, first on 9x9 then on 19x19, and as far as I remember all the nets I've worked with learned quickly (especially during the first batches), except the value net, which has always been problematic (diverges easily, doesn't learn quickly, ...). I have been stuck on the 19x19 value network for a couple of months now. I've tried countless inputs (feature planes) and lots of different models, even using the exact same code as others. Yet, whatever I try, the loss value doesn't move an inch and accuracy stays at 50% (even after days of training). I've tried changing the learning rate (increase/decrease); it doesn't change anything. However, if I feed a stupid value as target output (for example, black always wins) it has no trouble learning.
> It is even more frustrating that training any other kind of network (predicting next move, territory, ...) goes smoothly and fast.
>
> Has anyone experienced a similar problem with value networks or has an idea of the cause?

1) What is the training data for the value network? How big is it, how is it presented/shuffled/prepared?

2) What is the *exact* structure of the network and training setup?

My best guess would be an error in the construction of the final layers.

--
GCP
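One concrete way a final layer can silently fail while every other network trains fine is a mismatch between the output nonlinearity and the outcome encoding (0/1 labels for a sigmoid head vs -1/+1 for a tanh head). A minimal NumPy sketch of the sigmoid case, purely illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def value_loss(logit, outcome):
    """Cross-entropy loss of a sigmoid value head. `outcome` must be
    encoded in {0, 1}; feeding tanh-style {-1, +1} labels here corrupts
    the loss and can leave accuracy pinned at 50%."""
    p = sigmoid(np.asarray(logit, dtype=float))
    return -(outcome * np.log(p) + (1 - outcome) * np.log(1 - p))
```

With outcome = -1 the "loss" can go negative and its gradient points nowhere useful, which matches the symptom of a loss that never moves.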
[Computer-go] Value network that doesn't want to learn.
Hello everyone,

For my master's thesis, I have built an AI that takes a strategical approach to the game. It doesn't play, but simply describes the strategy behind every possible move in a given position ("enclosing this group", "making life for this group", "saving these stones", etc.). My main idea is that once it is associated with a playing AI, I will be able to generate comments on a position (and then teach people).

So for my final experiment, I'm trying to build a playing AI. I don't want it to be highly competitive, I just need it to be decent (1d or so), so I thought about using a policy network, a value network and a simple MCTS. The MCTS works fine, the policy network learns quickly and is accurate, but the value network seems to never learn, even the slightest bit.

During my research, I've trained a lot of different networks, first on 9x9 then on 19x19, and as far as I remember all the nets I've worked with learned quickly (especially during the first batches), except the value net, which has always been problematic (diverges easily, doesn't learn quickly, ...). I have been stuck on the 19x19 value network for a couple of months now. I've tried countless inputs (feature planes) and lots of different models, even using the exact same code as others. Yet, whatever I try, the loss value doesn't move an inch and accuracy stays at 50% (even after days of training). I've tried changing the learning rate (increase/decrease); it doesn't change anything. However, if I feed a stupid value as target output (for example, black always wins) it has no trouble learning.

It is even more frustrating that training any other kind of network (predicting next move, territory, ...) goes smoothly and fast.

Has anyone experienced a similar problem with value networks or has an idea of the cause?

Thank you