On Tue, 20 Oct 2020 at 06:36, Øystein Schønning-Johansen <[email protected]>
wrote:

> Hi,
>
> A method that has been tried out goes something like this:
>
> Step 1: Collect positions:
>
>    - Let the computer play against itself in many games.
>    - While playing, at each move, check whether the move selected at 2-ply
>    is the same as the move selected at 0-ply.
>    - If move_0ply != move_2ply, store both resulting positions (after the
>    0-ply move and after the 2-ply move) in some datastore (typically just a
>    file).
>    - Continue self-play until you think you have collected enough
>    positions. (What the criterion should be... boredom, maybe?)
>
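A minimal Python sketch of the Step 1 collection loop, under stated assumptions. `best_move(position, dice, plies)`, `apply_move` and `is_over` are hypothetical callables standing in for an engine (they are not gnubg's API), positions are any hashable encoding, and cube actions are ignored. The description above does not say which move the game continues with; this sketch assumes the 2-ply choice. Pairs go into a list rather than a file.

```python
import random
from typing import Callable, Hashable, List, Optional, Tuple

Dice = Tuple[int, int]
Position = Hashable          # any hashable board encoding (assumption)
Move = Optional[Hashable]    # None stands for "no legal move" (assumption)


def collect_mismatches(
    start: Position,
    best_move: Callable[[Position, Dice, int], Move],  # hypothetical engine call
    apply_move: Callable[[Position, Move], Position],  # hypothetical engine call
    is_over: Callable[[Position], bool],
    games: int,
    seed: int = 0,
) -> List[Tuple[Position, Position]]:
    """Self-play `games` games; return (after_0ply, after_2ply) position pairs
    for every move where the 0-ply and 2-ply choices differ."""
    rng = random.Random(seed)
    store: List[Tuple[Position, Position]] = []
    for _ in range(games):
        pos = start
        while not is_over(pos):
            dice = (rng.randint(1, 6), rng.randint(1, 6))
            move_0ply = best_move(pos, dice, 0)
            move_2ply = best_move(pos, dice, 2)
            if move_0ply != move_2ply:
                # Keep both resulting positions; Step 2 rolls them out.
                store.append((apply_move(pos, move_0ply), apply_move(pos, move_2ply)))
            # Assumption: the game continues with the 2-ply choice.
            pos = apply_move(pos, move_2ply)
    return store
```

In real use the pairs would be written out (for example as gnubg Position IDs) and handed to the rollout step.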
> Step 2: Rollout.
>
>    - All positions collected in Step 1 are then rolled out to obtain the
>    best possible evaluation of each.
>
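Step 2 can be illustrated with a plain Monte Carlo rollout, sketched here under assumptions. A hypothetical `play_out(position, rng)` plays one game to the end and returns outcome indicators, e.g. (win, win_gammon, win_backgammon, lose_gammon, lose_backgammon). The averages then serve as training targets. gnubg's own rollouts also offer variance reduction, which this sketch omits.

```python
import math
import random
from typing import Callable, Hashable, List, Sequence, Tuple


def rollout(
    position: Hashable,
    play_out: Callable[[Hashable, random.Random], Sequence[float]],  # hypothetical
    trials: int,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Average the outcome vectors of `trials` played-out games.

    Returns (means, standard_errors), one entry per outcome component."""
    if trials < 2:
        raise ValueError("need at least two trials for a standard error")
    rng = random.Random(seed)
    samples = [list(play_out(position, rng)) for _ in range(trials)]
    k = len(samples[0])
    means = [sum(s[i] for s in samples) / trials for i in range(k)]
    errors = []
    for i in range(k):
        # Sample variance of component i, then standard error of its mean.
        var = sum((s[i] - means[i]) ** 2 for s in samples) / (trials - 1)
        errors.append(math.sqrt(var / trials))
    return means, errors
```

The standard errors show how many trials a position needs before its targets are trustworthy.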
> Step 3: Supervised training.
>
>    - All positions, together with their rollout data from Step 2, are then
>    used for supervised training.
>
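For Step 3, the core idea (fit the network's outputs to rollout probabilities) can be shown with a toy stand-in. This sketch uses a one-layer logistic model trained by plain gradient descent on cross-entropy with soft targets. The real work would use Keras or PyTorch with gnubg's network architecture, neither of which this reproduces. The input features and learning settings are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(
    inputs: Sequence[Sequence[float]],
    targets: Sequence[float],   # e.g. rolled-out win probabilities in [0, 1]
    epochs: int = 2000,
    lr: float = 0.5,
) -> Tuple[List[float], float]:
    """Full-batch gradient descent on mean cross-entropy; returns (weights, bias)."""
    n_features = len(inputs[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, t in zip(inputs, targets):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - t  # derivative of cross-entropy w.r.t. the logit, soft target t
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

Because the targets are probabilities rather than 0/1 labels, the loss is minimized when the model reproduces the rollout numbers, which is what Step 3 asks for.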
>
> The newly trained neural network you now have is hopefully better than the
> one you had before you started this process. (However, you MUST verify that
> in some way, and it is best to have a verification method ready before you
> even start the training. Otherwise, you can check that you have improved
> the network by having the new and the old networks play against each other.)
>
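The head-to-head check mentioned above can be summarized with a simple confidence interval, sketched here under assumptions. `points` is a hypothetical list of points won per game from the new net's point of view, and the interval uses a normal approximation. Single backgammon games are very noisy, so a verdict needs many games.

```python
import math
from typing import Sequence, Tuple


def match_summary(points: Sequence[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Mean points per game for the new net, with a normal-approximation
    confidence interval (z = 1.96 gives roughly 95%)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two games")
    mean = sum(points) / n
    var = sum((p - mean) ** 2 for p in points) / (n - 1)
    half_width = z * math.sqrt(var / n)
    return mean, mean - half_width, mean + half_width


def new_net_is_better(points: Sequence[float], z: float = 1.96) -> bool:
    """Only call it an improvement if the whole interval lies above zero."""
    _, low, _ = match_summary(points, z)
    return low > 0.0
```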
> And if you still think your neural network can be further improved, just
> start doing this again from Step 1.
>
> OK. Some discussion:
> The time-consuming steps here are actually Steps 1 and 2. Step 3,
> supervised training, is pretty fast with modern methods and hardware.
> Packages like Keras and PyTorch (Chainer, Caffe, CNTK, TensorFlow or
> whatever) that can utilize GPUs and TPUs can train neural networks in
> minutes (instead of weeks). I already have tools to convert Keras and
> PyTorch neural nets to GNU Backgammon neural nets (and the other way), so
> that is good news. More good news: the first two steps are highly
> distributable. Say we just make a simple tool chain and start up 10-20
> computers (or maybe Ian has a lot of spare computers ;-). I guess modern
> self-play can find 2-3 0-ply/2-ply mismatches per second (I'm just
> guessing) when collecting positions as described in Step 1. We (or anyone
> volunteering) can each start collection processes on whatever equipment
> we have. If the same volunteers can then roll out the positions with
> another tool (in the same tool chain) for Step 2, I think we can get
> something going.
>
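The guessed rates above (10-20 computers, 2-3 mismatches per second) can be turned into a back-of-the-envelope estimate. This is only a sketch: the mismatch rate is Øystein's guess, and `seconds_per_rollout` is a placeholder parameter with no figure given in the thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def positions_per_day(computers: int, mismatches_per_second: float,
                      positions_per_mismatch: int = 2) -> float:
    """Step 1 output: each mismatch stores two positions (0-ply and 2-ply results)."""
    return computers * mismatches_per_second * positions_per_mismatch * SECONDS_PER_DAY


def rollouts_per_day(computers: int, seconds_per_rollout: float) -> float:
    """Step 2 capacity, assuming each computer runs one rollout at a time."""
    return computers * SECONDS_PER_DAY / seconds_per_rollout
```

At the low end of the guesses (10 computers, 2 mismatches per second) Step 1 yields 3,456,000 positions per day. If a rollout took, say, a minute (a placeholder, not a measurement), the same 10 computers would roll out 14,400 positions per day, so Step 2 would set the pace.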
> So, please join me in this discussion: Can we organize such a collective
> effort? I can share some tools. Joseph, do you have any input? How many
> positions do you think we need? Will anyone join?
>

I can comment on that: my experience from 20 years ago was that at some
stage adding positions started to hurt the net's performance. It is always
a balancing act between getting the common/regular positions right and
getting the edge cases right. I think that, whatever you do, you might want
to start fresh and see how my "method" (as you outlined above) can be
improved.

-Joseph



>
> Thanks,
> -Øystein
>
> On Mon, Oct 19, 2020 at 4:01 PM Aaron Tikuisis <[email protected]>
> wrote:
>
>> I see, that's very interesting. I'll make sure not to use ctrl-g for
>> skewed situations like this!
>> So the real problem is that it thinks the gammon chances are near 0 for
>> a position like this, when in fact they are 25%:
>>  GNU Backgammon  Position ID: h+sPAQD3rQEAAA
>>                  Match ID   : EAEAAAAAAAAE
>>  +12-11-10--9--8--7-------6--5--4--3--2--1-+     O: gnubg
>>  |                  |   |    O  O  O  O  O | O   0 points
>>  |                  |   |    O     O  O  O | O   On roll
>>  |                  |   |             O  O |
>>  |                  |   |             O    |
>>  |                  |   |             O    |
>> ^|                  |BAR|                  |
>>  |                7 |   |                  |
>>  |                X |   |                  |
>>  |                X |   |    X           X |
>>  |                X |   |    X           X |
>>  |    X           X |   | X  X           X |     0 points
>>  +13-14-15-16-17-18------19-20-21-22-23-24-+     X: aaron (Cube: 1)
>>
>>
>> I'm not an expert, but I'd think the NN should be able to learn this
>> better; why not just try to train it more?
>>
>> Is gnubg currently able to keep a database of its own 0-ply blunders?
>> (For example, every time it does an evaluation, compare the higher-ply
>> result with the 0-ply result, and if the 0-ply result is off by more than
>> some threshold, add the position to the database.) If not, do you think it
>> would be worth implementing?
>>
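Aaron's proposed blunder database could look roughly like this minimal sketch. `BlunderLog` and its threshold are hypothetical, not an existing gnubg feature. Positions are keyed by a string such as a gnubg Position ID, and "error" is taken to be the absolute difference between the 0-ply and higher-ply evaluations of the same position.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BlunderLog:
    """Hypothetical store of positions where 0-ply disagrees with a deeper evaluation."""
    threshold: float  # equity difference that counts as a blunder (assumed value)
    entries: Dict[str, float] = field(default_factory=dict)

    def record(self, position_id: str, equity_0ply: float, equity_deep: float) -> bool:
        """Store the position if the 0-ply error reaches the threshold; return True if stored."""
        error = abs(equity_deep - equity_0ply)
        if error < self.threshold:
            return False
        # Keep the largest error seen for a position that shows up repeatedly.
        self.entries[position_id] = max(error, self.entries.get(position_id, 0.0))
        return True
```

Hooked into every evaluation, this would collect candidate positions for rollouts as a side effect of normal use, much like Step 1 of Øystein's outline.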
>> Best regards, Aaron
>> ------------------------------
>> *From:* Øystein Schønning-Johansen <[email protected]>
>> *Sent:* October 19, 2020 9:26 AM
>> *To:* Aaron Tikuisis <[email protected]>
>> *Cc:* Joseph Heled <[email protected]>; Philippe Michel <
>> [email protected]>; [email protected] <[email protected]>
>> *Subject:* Re: The status of gnubg?
>>
>> On Mon, Oct 19, 2020 at 3:10 PM Aaron Tikuisis <[email protected]>
>> wrote:
>>
>> That is interesting; I did not realize that gnubg misplays race positions
>> that often. What are some examples?
>>
>>
>>  Here is a position I posted a few weeks ago.
>>
>> GNU Backgammon  Position ID: 960BAMCw+0MAAA
>>                  Match ID   : cAkAAAAAAAAA
>>  +13-14-15-16-17-18------19-20-21-22-23-24-+     O: gnubg
>>  |                  |   |    O  O  O  O  O | O   0 points
>>  |                  |   |    O     O  O  O | O
>>  |                  |   |             O  O |
>>  |                  |   |             O    |
>>  |                  |   |             O    |
>> v|                  |BAR|                  |     (Cube: 1)
>>  |                7 |   |                  |
>>  |                X |   |                  |
>>  |                X |   | X                |
>>  |                X |   | X  X           X |     On roll
>>  |    X           X |   | X  X           X |     0 points
>>  +12-11-10--9--8--7-------6--5--4--3--2--1-+     X: oystein
>>
>> Money game, X to play. Try several rolls at 0-ply, like 52, 31, 53 and
>> so on. What's the best move? For 52: 6/1 6/4?
>> Of course, the evaluator reports a 0.0 win probability, but since the
>> neural network evaluates the gammons incorrectly, it makes ridiculous
>> moves. It looks like this is a common pattern in positions that are
>> "skewed".
>>
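Why wrong gammon estimates lead to odd moves can be seen from the cubeless money equity. This sketch assumes the usual gnubg convention: the net's five outputs are P(win), P(win gammon), P(win backgammon), P(lose gammon) and P(lose backgammon), and gammon probabilities include backgammons. With P(win) = 0 the formula reduces to -1 - P(lose gammon) - P(lose backgammon). Every candidate move is then ranked purely by its gammon-loss estimates. If those are wrongly near 0 for all candidates, the moves all look like roughly -1.0 and the choice among them is close to noise. With Aaron's 25% gammon figure, the equity would be -1.25 rather than -1.0.

```python
from typing import Sequence


def cubeless_money_equity(outputs: Sequence[float]) -> float:
    """Cubeless money equity from five cumulative outcome probabilities:
    (win, win_gammon, win_backgammon, lose_gammon, lose_backgammon).
    Gammon probabilities include backgammons (assumed convention)."""
    win, win_g, win_bg, lose_g, lose_bg = outputs
    # E = P(win) - P(loss) + one extra point per gammon + one more per backgammon,
    # with P(loss) = 1 - P(win).
    return 2.0 * win - 1.0 + win_g - lose_g + win_bg - lose_bg
```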
>> -Øystein
>>
>>
