Hi,

A method that has been tried out goes something like this:

Step 1: Collect positions:

   - Let the computer play many games of self-play.
   - While playing, at each move, check whether the move selected at
   2-ply is the same as the move selected at 0-ply.
   - If move_0ply != move_2ply, store both resulting positions (from the
   0-ply move and from the 2-ply move) in some datastore (typically just
   a file).
   - Continue self-play until you think you have collected enough
   positions. (What the stopping criterion should be... boredom, maybe?)
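A minimal sketch of the Step 1 loop, in Python. The functions legal_moves, best_move_0ply and best_move_2ply are hypothetical toy stand-ins (random picks) for the real engine calls; only the collection logic itself is the point here:

```python
import random

def legal_moves(position, roll):
    # Toy placeholder: real code would generate actual backgammon moves.
    return [f"{position}/{roll}:{k}" for k in range(4)]

def best_move_0ply(position, roll):
    # Stand-in for a real 0-ply evaluation; here just a random pick.
    return random.choice(legal_moves(position, roll))

def best_move_2ply(position, roll):
    # Stand-in for a real 2-ply evaluation; also a random pick here.
    return random.choice(legal_moves(position, roll))

def collect_mismatches(n_games, moves_per_game=50):
    """Step 1: self-play, storing both resulting positions whenever
    the 0-ply and 2-ply move choices disagree."""
    store = []
    for g in range(n_games):
        position = f"game{g}"
        for _ in range(moves_per_game):
            roll = (random.randint(1, 6), random.randint(1, 6))
            m0 = best_move_0ply(position, roll)
            m2 = best_move_2ply(position, roll)
            if m0 != m2:
                store.append((m0, m2))  # both resulting positions
            position = m2               # continue play with the 2-ply move
    return store
```

In the real toolchain, store would of course be a file of position IDs rather than an in-memory list.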

Step 2: Rollout.

   - All positions collected in Step 1 are then rolled out, so that the
   best possible evaluation of each position is found.
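Conceptually, Step 2 is just a Monte Carlo average. A sketch, where toy_playout is a toy coin flip standing in for playing one game to completion with the real engine:

```python
import random

def rollout_value(position, playout, n_trials=1296):
    """Step 2 sketch: estimate a position's value as the average result
    of many complete playouts."""
    return sum(playout(position) for _ in range(n_trials)) / n_trials

def toy_playout(position):
    # Toy stand-in: a fair coin flip instead of a real game to the end.
    return 1.0 if random.random() < 0.5 else 0.0
```

With this toy playout, rollout_value("any position", toy_playout) comes out close to 0.5; a real rollout would play each game out (or truncate and evaluate at a higher ply) and also accumulate gammon/backgammon counts.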

Step 3: Supervised training.

   - All positions from the rollout data above are then used for
   supervised training.
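Step 3 is plain supervised regression. Here is a self-contained numpy sketch of one-hidden-layer training; the shapes (250 inputs, 128 hidden units, 5 sigmoid outputs) are assumptions roughly in the style of gnubg's nets, and the data is synthetic, standing in for the rolled-out positions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-in for the rollout data: position encodings -> 5 targets
# (think win / gammon win / backgammon win / gammon loss / backgammon loss).
X = rng.random((512, 250))
Y = sigmoid(X @ rng.normal(0.0, 0.1, (250, 5)))

# One-hidden-layer sigmoid net; the layer sizes are assumptions.
W1 = rng.normal(0.0, 0.1, (250, 128)); b1 = np.zeros(128)
W2 = rng.normal(0.0, 0.1, (128, 5));   b2 = np.zeros(5)

def loss():
    return float(((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - Y) ** 2).mean())

loss_before = loss()
lr = 0.5
for _ in range(200):                    # plain batch gradient descent on MSE
    H = sigmoid(X @ W1 + b1)
    P = sigmoid(H @ W2 + b2)
    dP = (P - Y) * P * (1.0 - P)        # output-layer delta
    dH = (dP @ W2.T) * H * (1.0 - H)    # hidden-layer delta
    W2 -= lr * H.T @ dP / len(X); b2 -= lr * dP.mean(axis=0)
    W1 -= lr * X.T @ dH / len(X); b1 -= lr * dH.mean(axis=0)
loss_after = loss()
```

In practice you would do this in Keras or PyTorch, of course; the point is only that this step is cheap compared to Steps 1 and 2.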


The newly trained neural network is hopefully better than the one you
had before you started this process. (However, you MUST verify that in
some way, and it is best to have a verification method ready before you
even start the training. If nothing else, you can verify that the
network has improved by having the new and the old network play against
each other.)
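For the head-to-head verification, a simple one-sided binomial test (pure stdlib) tells you whether the new net's win rate is significantly above 50%. This treats games as plain win/loss, which is a simplification; real verification would use match or money equity:

```python
from math import comb

def is_improvement(wins_new, games, p0=0.5, alpha=0.05):
    """One-sided binomial test: p-value is P(X >= wins_new) under the
    null hypothesis that both nets are equally strong (win prob p0).
    Returns (significant, p_value)."""
    p_value = sum(comb(games, k) * p0**k * (1.0 - p0)**(games - k)
                  for k in range(wins_new, games + 1))
    return p_value < alpha, p_value
```

For example, 60 wins out of 100 games is significant at the 5% level, while 55 out of 100 is not, so you need either a fair number of games or a clearly stronger net.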

And if you think your neural network can be improved further, just
start this process again from Step 1.

OK. Some discussion:
The time-consuming steps here are actually Step 1 and Step 2. Step 3,
supervised training, is pretty fast with modern methods and hardware.
Packages like Keras and PyTorch (or Chainer, Caffe, CNTK, TensorFlow,
whatever) that can utilize GPUs and TPUs can train neural networks in
minutes (instead of weeks). I already have tools to convert Keras and
PyTorch neural nets to GNU Backgammon neural nets (and the other way),
so that is good news. More good news: the first two steps are highly
distributable. Say we make a simple toolchain and start up 10-20
computers (or maybe Ian has a lot of spare computers ;-). I guess
modern self-play can find 2-3 0-ply/2-ply mismatches per second (I'm
just guessing) to collect positions as described in Step 1. We (or
anyone volunteering) can each start a collection process on the
equipment we have. The same volunteers can then roll out the positions
with another tool in the same toolchain, doing Step 2. I think we can
get something going.
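To put rough numbers on it, using the guesses above (15 machines, 2-3 mismatches per second per machine -- both assumptions, as stated):

```python
machines = 15               # "10-20 computers"
mismatches_per_sec = 2.5    # "2-3 per second" -- just a guess, as above
seconds_per_day = 3600 * 24

positions_per_day = machines * mismatches_per_sec * seconds_per_day
# roughly 3.24 million candidate positions per day across the pool
```

Even if the per-machine rate is off by a factor of ten, the collection side looks feasible; the rollouts will likely dominate the total compute.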

So, please join me in this discussion: Can we organize such a
collective effort? I can share some tools. Joseph, do you have some
input? How many positions do you think we need? Will anyone join?

Thanks,
-Øystein

On Mon, Oct 19, 2020 at 4:01 PM Aaron Tikuisis <aaron.tikui...@uottawa.ca>
wrote:

> I see, that's very interesting. I'll make sure not to use ctrl-g for
> skewed situations like this!
> So the real problem is that it thinks that gammon chances are near 0 for a
> position like this, when in fact it is 25%:
> ​ GNU Backgammon  Position ID: h+sPAQD3rQEAAA
>                  Match ID   : EAEAAAAAAAAE
>  +12-11-10--9--8--7-------6--5--4--3--2--1-+     O: gnubg
>  |                  |   |    O  O  O  O  O | O   0 points
>  |                  |   |    O     O  O  O | O   On roll
>  |                  |   |             O  O |
>  |                  |   |             O    |
>  |                  |   |             O    |
> ^|                  |BAR|                  |
>  |                7 |   |                  |
>  |                X |   |                  |
>  |                X |   |    X           X |
>  |                X |   |    X           X |
>  |    X           X |   | X  X           X |     0 points
>  +13-14-15-16-17-18------19-20-21-22-23-24-+     X: aaron (Cube: 1)
>
>
> I'm not an expert but I'd think the NN should be able to learn this better
> - why not just try to train it more?
>
> Is gnubg currently able to keep a database of its own 0-ply blunders?
> (Like, every time it does an evaluation, compare the higher-ply result with
> the 0-ply result and if the 0-ply errs by a large enough threshold, add
> the position to the database.) If not, do you think it would be worth
> implementing this?
>
> Best regards, Aaron
> ------------------------------
> *From:* Øystein Schønning-Johansen <oyste...@gmail.com>
> *Sent:* October 19, 2020 9:26 AM
> *To:* Aaron Tikuisis <aaron.tikui...@uottawa.ca>
> *Cc:* Joseph Heled <jhe...@gmail.com>; Philippe Michel <
> philippe.mich...@free.fr>; bug-gnubg@gnu.org <bug-gnubg@gnu.org>
> *Subject:* Re: The status of gnubg?
>
>
> On Mon, Oct 19, 2020 at 3:10 PM Aaron Tikuisis <aaron.tikui...@uottawa.ca>
> wrote:
>
> That is interesting, I did not realize that gnubg misplays race positions
> much. What are some examples?
>
>
>  Here is a position I posted a few weeks ago.
>
> GNU Backgammon  Position ID: 960BAMCw+0MAAA
>                  Match ID   : cAkAAAAAAAAA
>  +13-14-15-16-17-18------19-20-21-22-23-24-+     O: gnubg
>  |                  |   |    O  O  O  O  O | O   0 points
>  |                  |   |    O     O  O  O | O
>  |                  |   |             O  O |
>  |                  |   |             O    |
>  |                  |   |             O    |
> v|                  |BAR|                  |     (Cube: 1)
>  |                7 |   |                  |
>  |                X |   |                  |
>  |                X |   | X                |
>  |                X |   | X  X           X |     On roll
>  |    X           X |   | X  X           X |     0 points
>  +12-11-10--9--8--7-------6--5--4--3--2--1-+     X: oystein
>
> Money game and X to play. Try several rolls, like 52, 31 and 53 and... at
> 0-ply. What's the best move? 52: 6/1 6/4?
> Of course, the evaluator reports 0.0 win, but since the gammons are
> incorrectly evaluated by the neural network, it makes ridiculous moves.
> It looks like this is a common pattern in positions which are "skewed".
>
> -Øystein
>
>
