My computer isn't fast (gnubg calibrates it at about 110k static 
evaluations/sec), but I'd be happy to let it make what small contribution it 
can by running rollouts etc. overnight.

In step 3, we train with old positions as well as new ones, right? I would 
imagine that if we only use positions where the current NN underperforms, this 
could very easily skew the training.

How important is it to use rollouts in step 2, instead of, say, 4-ply? In my 
experience, it doesn't seem that there's a big difference, especially in 
non-contact positions.

In step 1, should we just be looking at the decision, or also at the lost 
equity? For example, 0-ply may sometimes pick the wrong move, but it's only 
wrong by .01. On the flip side, 0-ply might consider two moves to be nearly 
equal when in fact its second-place move is a big blunder.
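
To make the question concrete, here is roughly the filter I have in mind (just 
a sketch in Python; eval_moves_0ply/eval_moves_2ply are hypothetical stand-ins 
for whatever the real toolchain exposes, and the 0.05 cutoff is made up):

    # Hypothetical helpers: each returns {move: equity} for every legal play.
    EQUITY_CUTOFF = 0.05  # made-up threshold; would need tuning

    def worth_keeping(position, roll, eval_moves_0ply, eval_moves_2ply):
        eq0 = eval_moves_0ply(position, roll)
        eq2 = eval_moves_2ply(position, roll)
        best0 = max(eq0, key=eq0.get)   # the move 0-ply would actually play
        best2 = max(eq2, key=eq2.get)   # the move 2-ply prefers
        lost = eq2[best2] - eq2[best0]  # cost of the 0-ply choice, at 2-ply
        return lost >= EQUITY_CUTOFF    # skip .01 "errors", keep real blunders
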
Also regarding step 1, if we're only working on the race net for now, I suppose 
we should not even get gnubg to evaluate positions at 2-ply until contact is 
broken.

For step 4 (evaluating whether an improvement has actually been made), would a 
good method be to simply take a random sampling of positions (NOT from the 
training set) as a benchmark?
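
Something like the following is what I am picturing (only a sketch; 
net.evaluate and the benchmark of (position, rollout) pairs are placeholders):

    # Score a net against a held-out benchmark of rolled-out positions.
    # benchmark: list of (position, rollout_outputs) pairs NOT in the training set.
    def benchmark_error(net, benchmark):
        total = 0.0
        for position, rollout_outputs in benchmark:
            estimate = net.evaluate(position)  # hypothetical 0-ply evaluation call
            total += sum((a - b) ** 2 for a, b in zip(estimate, rollout_outputs))
        return total / len(benchmark)  # lower is better; compare old vs. new net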

I am also curious to know how many positions are needed.

Best regards,
Aaron
________________________________
From: Joseph Heled <[email protected]>
Sent: October 19, 2020 1:52 PM
To: Øystein Schønning-Johansen <[email protected]>
Cc: Aaron Tikuisis <[email protected]>; Philippe Michel 
<[email protected]>; [email protected]
Subject: Re: The status of gnubg?

On Tue, 20 Oct 2020 at 06:36, Øystein Schønning-Johansen 
<[email protected]> wrote:
Hi,

A method that has been tried out goes something like this:

Step 1: Collect positions:

  *   Let the computer self-play many games.
  *   While playing, at each move, check whether the move selected at 2-ply is 
the same as the move selected at 0-ply.
  *   If move_0ply != move_2ply -> store both resulting positions, from the 
0-ply move and the 2-ply move, in some datastore (typically just a file).
  *   Continue self-play until you think you have collected enough positions. 
(What that criterion should be... boredom, maybe?) A rough sketch of this loop 
follows below.
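
Here is such a sketch in Python (all the backgammon-specific calls, new_game, 
game_over, roll_dice, best_move_0ply, best_move_2ply and apply_move, are 
placeholders for whatever the real gnubg tool chain provides):

    def collect_positions(n_wanted, store):
        collected = 0
        while collected < n_wanted:
            pos = new_game()
            while not game_over(pos):
                roll = roll_dice()
                m0 = best_move_0ply(pos, roll)
                m2 = best_move_2ply(pos, roll)
                if m0 != m2:
                    # Store both resulting positions for the step 2 rollouts.
                    store.write(apply_move(pos, m0))
                    store.write(apply_move(pos, m2))
                    collected += 2
                # Continue the game with the 2-ply choice.
                pos = apply_move(pos, m2)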

Step 2: Rollout.

  *   All positions collected in step 1 are then rolled out so that the best 
possible evaluation is found.

Step 3: Supervised training.

  *   All positions from the rollout data above are then used for supervised 
training. (A minimal training sketch follows below.)
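
For instance, a minimal Keras version of step 3 might look like this (the file 
names, network size and output encoding are just illustrative assumptions; the 
inputs would be the gnubg feature vectors and the targets the rollout results):

    import numpy as np
    from tensorflow import keras

    X = np.load("race_inputs.npy")     # hypothetical files produced by steps 1-2
    Y = np.load("race_rollouts.npy")   # e.g. win/gammon/backgammon probabilities

    model = keras.Sequential([
        keras.layers.Dense(128, activation="sigmoid", input_shape=(X.shape[1],)),
        keras.layers.Dense(Y.shape[1], activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, Y, epochs=50, batch_size=256, validation_split=0.1)
    model.save("race_net.keras")       # then convert to the gnubg format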

The newly trained neural network you now have is hopefully better than the one 
you had before you started this process. (However, you MUST verify that in some 
way, and it is best to have a verification method ready before you even start 
the training. If nothing else, you can verify that the network has improved by 
having the new and the old network play against each other; a quick sketch of 
such a head-to-head check is below.)
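
Here is what I mean (play_game is a placeholder that plays one money game 
between the two nets and returns the points won by net_a, negative if net_a 
lost; everything here is illustrative):

    import math

    def head_to_head(net_a, net_b, n_games, play_game):
        results = [play_game(net_a, net_b) for _ in range(n_games)]
        mean = sum(results) / n_games
        var = sum((r - mean) ** 2 for r in results) / (n_games - 1)
        stderr = math.sqrt(var / n_games)
        # Only call net_a an improvement if its edge is well outside the noise.
        return mean, stderr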

And if you still think your neural network can be further improved, just start 
doing this again from Step 1.

OK. Some discussion:
The time-consuming steps here are actually step 1 and step 2. Step 3, 
supervised training, is pretty fast with modern methods and hardware. Packages 
like Keras and PyTorch (Chainer, Caffe, CNTK, TensorFlow, or whatever) that can 
utilize GPUs and TPUs can train neural networks in minutes (instead of weeks). 
I already have tools to convert Keras and PyTorch neural nets to GNU Backgammon 
neural nets (and the other way), so that is good news. However, there is more 
good news: the first two steps are highly distributable. Say we make a simple 
tool chain and start up 10-20 computers (or maybe Ian has a lot of spare 
computers ;-). I guess modern self-play can find 2-3 0-ply/2-ply mismatches per 
second (I'm just guessing) and collect positions as described in step 1. We (or 
anyone volunteering) can each start a collection process on the equipment we 
have. Then, if the same volunteers can roll out the positions with another tool 
(in the same toolchain) to do step 2, I think we can get something going.

So, please join me in this discussion: Can we organize such a collective 
effort? I can share some tools. Joseph? Do you have some input? How many 
positions do you think we need? Will anyone join?

I can comment on that: my experience from 20 years ago was that at some stage 
adding positions started to hurt the net's performance. It is always a 
balancing act between getting the common/regular positions right and getting 
the edge cases right. I think that whatever you do, you might want to start 
fresh and see how my "method" (as you outlined above) can be improved.

-Joseph



Thanks,
-Øystein

On Mon, Oct 19, 2020 at 4:01 PM Aaron Tikuisis 
<[email protected]> wrote:
I see, that's very interesting. I'll make sure not to use ctrl-g for skewed 
situations like this!
So the real problem is that it thinks that gammon chances are near 0 for a 
position like this, when in fact they are 25%:
 GNU Backgammon  Position ID: h+sPAQD3rQEAAA
                 Match ID   : EAEAAAAAAAAE
 +12-11-10--9--8--7-------6--5--4--3--2--1-+     O: gnubg
 |                  |   |    O  O  O  O  O | O   0 points
 |                  |   |    O     O  O  O | O   On roll
 |                  |   |             O  O |
 |                  |   |             O    |
 |                  |   |             O    |
^|                  |BAR|                  |
 |                7 |   |                  |
 |                X |   |                  |
 |                X |   |    X           X |
 |                X |   |    X           X |
 |    X           X |   | X  X           X |     0 points
 +13-14-15-16-17-18------19-20-21-22-23-24-+     X: aaron (Cube: 1)


I'm not an expert but I'd think the NN should be able to learn this better - 
why not just try to train it more?

Is gnubg currently able to keep a database of its own 0-ply blunders? (Like, 
every time it does an evaluation, compare the higher-ply result with the 0-ply 
result, and if the 0-ply error exceeds a large enough threshold, add the position to 
the database.) If not, do you think it would be worth implementing this?

Best regards, Aaron
________________________________
From: Øystein Schønning-Johansen <[email protected]<mailto:[email protected]>>
Sent: October 19, 2020 9:26 AM
To: Aaron Tikuisis <[email protected]<mailto:[email protected]>>
Cc: Joseph Heled <[email protected]<mailto:[email protected]>>; Philippe Michel 
<[email protected]<mailto:[email protected]>>; 
[email protected]<mailto:[email protected]> 
<[email protected]<mailto:[email protected]>>
Subject: Re: The status of gnubg?

On Mon, Oct 19, 2020 at 3:10 PM Aaron Tikuisis 
<[email protected]> wrote:
That is interesting; I did not realize that gnubg misplays race positions much. 
What are some examples?

 Here is a position I posted a few weeks ago.

GNU Backgammon  Position ID: 960BAMCw+0MAAA
                 Match ID   : cAkAAAAAAAAA
 +13-14-15-16-17-18------19-20-21-22-23-24-+     O: gnubg
 |                  |   |    O  O  O  O  O | O   0 points
 |                  |   |    O     O  O  O | O
 |                  |   |             O  O |
 |                  |   |             O    |
 |                  |   |             O    |
v|                  |BAR|                  |     (Cube: 1)
 |                7 |   |                  |
 |                X |   |                  |
 |                X |   | X                |
 |                X |   | X  X           X |     On roll
 |    X           X |   | X  X           X |     0 points
 +12-11-10--9--8--7-------6--5--4--3--2--1-+     X: oystein

Money game and X to play. Try several rolls at 0-ply, like 52, 31 and 53... 
What's the best move? 52: 6/1 6/4?
Of course, the evaluator reports 0.0 win, but since the gammons are incorrectly 
evaluated by the neural network, it makes ridiculous moves.
It looks like this is a common pattern in positions which are "skewed".

-Øystein
