Re: [Computer-go] Value network that doesn't want to learn.

2017-06-23 Thread Brian Sheppard via Computer-go
>... my value network was trained to tell me the game is balanced at the 
>beginning...

:-)

The best training policy is to select positions that correct errors.

I used the policies below to train a backgammon NN. Together, they reduced the 
expected loss of the network by 50% (cut the error rate in half):

- Select training positions from the program's own games.
  - These can come from self-play or from games against an opponent.
  - Best is to have a broad panel of opponents.
  - It is beneficial to bootstrap with pro games, but after that add ONLY
    training examples from the program's own games.
- Train only on the moves made by the winner of the game.
  - Very important for deterministic games!
  - Note that the winner can be either your program or the opponent.
  - If your program wins, training reinforces good behavior; if the
    opponent wins, training corrects bad behavior.
- Per game, aim to collect only a few training examples (3 in backgammon;
  maybe 10 in Go?). Use two selection policies:
  - Select positions where the static evaluation of the position differs
    significantly from a deep search.
  - Select positions where the move chosen by a deep search did not have
    the highest static evaluation. (In this case you get two training
    positions, which differ by the move chosen.)
  - In other words, you are deliberately selecting the examples where the
    static evaluation did as badly as possible.
- The training value of a position is the result of a deep search.
  - This is equivalent to "temporal difference learning", but accelerated
    by the depth of the search.
  - Periodically refresh the training evaluations as your search and
    evaluation improve.

These policies actively seek out cases where your evaluation function has some 
weakness, so training is definitely focused on improving results in the 
distribution of positions that your program will actually face.
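
Not from Brian's post, just a rough sketch of what the two selection
policies might look like in code; static_eval, deep_search, play and
legal_moves are placeholder names assumed for illustration:

    # Sketch only; static_eval, deep_search, play and legal_moves are
    # placeholder helpers, not anything from the original post.
    def select_training_examples(positions, max_examples=10):
        candidates = []
        for pos in positions:
            deep_value, deep_move = deep_search(pos)
            # Policy 1: static evaluation disagrees with the deep search.
            candidates.append((abs(static_eval(pos) - deep_value), pos, deep_value))
            # Policy 2: the deep-search move is not the statically best move;
            # in that case both resulting positions become candidates.
            greedy_move = max(legal_moves(pos),
                              key=lambda m: static_eval(play(pos, m)))
            if greedy_move != deep_move:
                for move in (deep_move, greedy_move):
                    child = play(pos, move)
                    child_value, _ = deep_search(child)
                    candidates.append((abs(static_eval(child) - child_value),
                                       child, child_value))
        # Keep the few positions where the static evaluation was worst;
        # the training target is always the deep-search value.
        candidates.sort(key=lambda c: c[0], reverse=True)
        return [(pos, value) for _, pos, value in candidates[:max_examples]]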

You will need about 30 training examples for every free parameter in your NN. 
You can do the math on how many games that will take. It is inevitable: you 
will train your NN based on blitz games.
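
For illustration only (the parameter count and positions-per-game figures
are assumed, not from the original message): a value net with about one
million free parameters would need roughly 30 million training positions;
at around 10 selected positions per game, that is on the order of 3
million games, which is only practical at blitz speeds.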

Good luck!




Re: [Computer-go] Value network that doesn't want to learn.

2017-06-23 Thread Vincent Richard

Finally found the problem. In the end, it was as stupid as expected:

When I pick a game for batch creation, I randomly select a limited
number of moves from that game. For the value network I use something
like 8-16 moves so as not to overfit the data (I can't take just 1, or
the I/O operations slow down the training), and for the other networks I
simply take all the moves. Or at least, that is what I thought my code
was doing. Instead of picking N random moves from the game, it was
picking the first N moves in a random order. So... my value network was
trained to tell me the game is balanced at the beginning...
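
For what it's worth, the two behaviors are easy to confuse in code; here
is a minimal sketch of the buggy pattern versus the intended one (the
names load_moves and game are assumed, not taken from the actual code):

    import random

    moves = load_moves(game)   # placeholder: the list of moves of one game
    n = 8

    # Buggy: takes the first n moves and merely shuffles their order,
    # so every training example comes from the opening of the game.
    batch_moves = moves[:n]
    random.shuffle(batch_moves)

    # Intended: sample n moves uniformly from the whole game.
    batch_moves = random.sample(moves, n)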



On 20-Jun-17 at 5:48 AM, Gian-Carlo Pascutto wrote:

On 19/06/2017 21:31, Vincent Richard wrote:

- The data is then analyzed by a script which extracts all kinds of
features from the games. When I'm training a network, I load the
features I want from this analysis to build the batch. I have two
possible methods for batch construction: I can either add moves one
after the other (the fast mode) or pick random moves from different
games (slower, but reduces the variance).

You absolutely need the latter, especially as, for outcome prediction,
the moves from the same game are not independent samples.
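
A minimal sketch of that second mode, i.e. building each batch from
positions drawn across many games so that the samples are closer to
independent; all names below are assumed for illustration, not taken
from Vincent's code:

    import random

    def random_batch(games, batch_size=200):
        """games: list of (positions, result) pairs; returns one mixed batch."""
        batch = []
        while len(batch) < batch_size:
            positions, result = random.choice(games)   # pick a game at random
            batch.append((random.choice(positions), result))
        return batch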


During some of the tests, all the networks I was training had the same
layers except for the last. So as you suggested, I was also wondering if
this last layer wasn't the problem. Yet, I haven't found any error.

...

However, if I feed a stupid
value as the target output (for example, black always wins) it has no
trouble learning.

A problem with side to move/won side marking in the input or feature
planes, or with the expected outcome (0 vs 1 vs -1)?
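
As an illustration of the kind of mismatch being suggested (the
conventions here are assumed, not from the actual code): if the input
planes are encoded from the side to move's point of view, the target has
to be expressed from that same point of view, and its range has to match
the output layer (0/1 for a sigmoid with cross-entropy, -1/+1 for a tanh
with squared error).

    def value_target(winner, side_to_move, tanh_output=False):
        # winner, side_to_move: 'B' or 'W' (assumed convention).
        won = (winner == side_to_move)
        if tanh_output:
            return 1.0 if won else -1.0   # matches a tanh output layer
        return 1.0 if won else 0.0        # matches a sigmoid output layer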




Re: [Computer-go] Value network that doesn't want to learn.

2017-06-19 Thread Vincent Richard

This is what I have been thinking about, yet I have been unable to find an error.

Currently, I'm working with:

- SGF database: fuseki info Tygem -> http://tygem.fuseki.info/index.php
(until recently I was working with games of all levels from KGS)


- The data is then analyzed by a script which extracts all kinds of
features from the games. When I'm training a network, I load the
features I want from this analysis to build the batch. I have two
possible methods for batch construction: I can either add moves one
after the other (the fast mode) or pick random moves from different
games (slower, but reduces the variance). I set the batch size according
to my GPU memory (200 moves in the case of a full-sized value/policy
network). I don't think the problem comes from here, since the data is
the same for all the networks.


- For the input, I'm using the same architecture as
https://github.com/TheDuck314/go-NN (I have been trying many kinds of
shapes, from minimalist to AlphaGo-like)


- For the network, I'm once again using TheDuck314's network
(EvalModels.Conv11PosDepFC1ELU) with the same layers
(https://github.com/TheDuck314/go-NN/blob/master/engine/Layers.py) and
the learning rate he recommends


During some of the tests, all the networks I was training had the same
layers except for the last. So as you suggested, I was also wondering if
this last layer wasn't the problem. Yet, I haven't found any error.
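
For comparison only (this is a generic PyTorch sketch, not TheDuck314's
TensorFlow code and not Vincent's code): a common value-head tail
collapses the convolutional features to a single tanh scalar trained with
a squared-error loss; pairing the wrong output/loss combination (e.g. a
softmax over a single unit, or tanh outputs against 0/1 cross-entropy
targets) is the kind of last-layer mistake that leaves the loss flat at
chance level.

    import torch
    import torch.nn as nn

    class ValueHead(nn.Module):
        """Last layers of a value net: conv features -> one scalar in [-1, 1]."""
        def __init__(self, channels, board_size=19):
            super().__init__()
            self.fc1 = nn.Linear(channels * board_size * board_size, 256)
            self.fc2 = nn.Linear(256, 1)

        def forward(self, features):
            x = features.view(features.size(0), -1)  # flatten (N, C, 19, 19)
            x = torch.relu(self.fc1(x))
            return torch.tanh(self.fc2(x))           # one value per position

    loss_fn = nn.MSELoss()  # targets in [-1, 1], +1 = side to move wins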




On 20-Jun-17 at 3:19 AM, Gian-Carlo Pascutto wrote:

On 19-06-17 17:38, Vincent Richard wrote:


During my research, I've trained a lot of different networks, first on
9x9 and then on 19x19, and as far as I remember all the nets I've worked
with learned quickly (especially during the first batches), except the
value net, which has always been problematic (diverges easily, doesn't
learn quickly, ...). I have been stuck on the 19x19 value network for a
couple of months now. I've tried countless inputs (feature planes) and
lots of different models, even using the exact same code as others. Yet,
whatever I try, the loss value doesn't move an inch and the accuracy
stays at 50% (even after days of training). I've tried changing the
learning rate (increase/decrease); it doesn't change anything. However,
if I feed a stupid value as the target output (for example, black always
wins) it has no trouble learning.
It is even more frustrating that training any other kind of network
(predicting the next move, territory, ...) goes smoothly and fast.

Has anyone experienced a similar problem with value networks or has an
idea of the cause?

1) What is the training data for the value network? How big is it, how
is it presented/shuffled/prepared?

2) What is the *exact* structure of the network and training setup?

My best guess would be an error in the construction of the final layers.




Re: [Computer-go] Value network that doesn't want to learn.

2017-06-19 Thread Gian-Carlo Pascutto
On 19-06-17 17:38, Vincent Richard wrote:

> During my research, I've trained a lot of different networks, first on
> 9x9 and then on 19x19, and as far as I remember all the nets I've worked
> with learned quickly (especially during the first batches), except the
> value net, which has always been problematic (diverges easily, doesn't
> learn quickly, ...). I have been stuck on the 19x19 value network for a
> couple of months now. I've tried countless inputs (feature planes) and
> lots of different models, even using the exact same code as others. Yet,
> whatever I try, the loss value doesn't move an inch and the accuracy
> stays at 50% (even after days of training). I've tried changing the
> learning rate (increase/decrease); it doesn't change anything. However,
> if I feed a stupid value as the target output (for example, black always
> wins) it has no trouble learning.
> It is even more frustrating that training any other kind of network
> (predicting the next move, territory, ...) goes smoothly and fast.
> 
> Has anyone experienced a similar problem with value networks or has an
> idea of the cause?

1) What is the training data for the value network? How big is it, how
is it presented/shuffled/prepared?

2) What is the *exact* structure of the network and training setup?

My best guess would be an error in the construction of the final layers.

-- 
GCP

[Computer-go] Value network that doesn't want to learn.

2017-06-19 Thread Vincent Richard

Hello everyone,

For my master's thesis, I have built an AI that takes a strategic
approach to the game. It doesn't play; it simply describes the strategy
behind each possible move ("enclosing this group", "making life for this
group", "saving these stones", etc.). My main idea is that, once it is
combined with a playing AI, I will be able to generate comments on a
position (and then teach people). So for my final experiment, I'm trying
to build a playing AI. I don't want it to be highly competitive, I just
need it to be decent (1d or so), so I thought about using a policy
network, a value network and a simple MCTS. The MCTS works fine and the
policy network learns quickly and is accurate, but the value network
never seems to learn, even slightly.
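
(Not from the original message, just one common way such pieces are wired
together, with every name below assumed for the sake of the sketch: the
policy net supplies move priors when a node is expanded, and the value
net scores the leaf, optionally blended with a fast rollout.)

    def evaluate_leaf(node, policy_net, value_net, mix=0.5):
        # node.expand() and rollout() are assumed helpers.
        node.expand(policy_net(node.position))   # priors bias move selection
        v = value_net(node.position)             # predicted outcome in [-1, 1]
        r = rollout(node.position)               # optional fast playout result
        return (1.0 - mix) * v + mix * r         # AlphaGo-style mixing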


During my research, I've trained a lot of different networks, first on
9x9 and then on 19x19, and as far as I remember all the nets I've worked
with learned quickly (especially during the first batches), except the
value net, which has always been problematic (diverges easily, doesn't
learn quickly, ...). I have been stuck on the 19x19 value network for a
couple of months now. I've tried countless inputs (feature planes) and
lots of different models, even using the exact same code as others. Yet,
whatever I try, the loss value doesn't move an inch and the accuracy
stays at 50% (even after days of training). I've tried changing the
learning rate (increase/decrease); it doesn't change anything. However,
if I feed a stupid value as the target output (for example, black always
wins) it has no trouble learning.
It is even more frustrating that training any other kind of network
(predicting the next move, territory, ...) goes smoothly and fast.


Has anyone experienced a similar problem with value networks or has an 
idea of the cause?


Thank you
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go