On Tue, 20 Oct 2020 at 06:36, Øystein Schønning-Johansen <[email protected]> wrote:
> Hi,
>
> A method that has been tried out goes something like this:
>
> Step 1: Collect positions.
>
> - Let the computer play many games of self-play.
> - While playing, at each move, check if the 2-ply selected move is the same as the 0-ply selected move.
> - If move_0ply != move_2ply, store both resulting positions (from the 0-ply move and from the 2-ply move) in some datastore (typically just a file).
> - Continue self-play until you think you have collected enough positions. (What the criterion for "enough" should be... boredom, maybe?)
>
> Step 2: Rollout.
>
> - All positions collected in step 1 are then rolled out so that the best possible evaluation is found.
>
> Step 3: Supervised training.
>
> - All positions from the rollout data above are then used for supervised training.
>
> The newly trained neural network is hopefully better than the one you had before you started this process. (However, you MUST verify that in some way, and it is best to have a verification method ready before you even start the training. Failing that, you can verify that the network has improved by having the new and the old network play against each other.)
>
> And if you still think your neural network can be further improved, just start this process again from Step 1.
>
> OK. Some discussion:
> The time-consuming steps here are actually steps 1 and 2. Step 3, supervised training, is pretty fast with modern methods and hardware. Packages like Keras and PyTorch (Chainer, Caffe, CNTK, TensorFlow or whatever) that can utilize GPUs and TPUs can train neural networks in minutes (instead of weeks). I already have tools to convert Keras and PyTorch neural nets to GNU Backgammon neural nets (and the other way). So that is good news. And more good news: the first two steps are highly distributable.
> Say we just make a simple tool chain and start up 10-20 computers (or maybe Ian has a lot of spare computers ;-). I guess a modern self-play engine can find 2-3 0-ply/2-ply mismatches per second (I'm just guessing?) to collect positions as described in step 1. We (or anyone volunteering) can start our collection processes on the equipment we have. Then, if the same volunteers can roll out the positions with another tool (in the same toolchain) for step 2, I think we can get something going.
>
> So, please join me in this discussion: can we organize such a collective effort? I can share some tools. Joseph, do you have some input? How many positions do you think we need? Will anyone join?

I can comment on that: my experience from 20 years ago was that at some stage, adding positions started to hurt the net's performance. It is always a balancing act between getting the common/regular positions right and getting the edge cases right. I think that whatever you do, you might want to start fresh and see how my "method" (as you outlined above) can be improved.

-Joseph

> Thanks,
> -Øystein
>
> On Mon, Oct 19, 2020 at 4:01 PM Aaron Tikuisis <[email protected]> wrote:
>
>> I see, that's very interesting. I'll make sure not to use ctrl-g for skewed situations like this!
>> So the real problem is that it thinks the gammon chance is near 0 for a position like this, when in fact it is 25%:
>>
>> GNU Backgammon Position ID: h+sPAQD3rQEAAA
>> Match ID : EAEAAAAAAAAE
>> +12-11-10--9--8--7-------6--5--4--3--2--1-+ O: gnubg
>> | | | O O O O O | O 0 points
>> | | | O O O O | O On roll
>> | | | O O |
>> | | | O |
>> | | | O |
>> ^| |BAR| |
>> | 7 | | |
>> | X | | |
>> | X | | X X |
>> | X | | X X |
>> | X X | | X X X | 0 points
>> +13-14-15-16-17-18------19-20-21-22-23-24-+ X: aaron (Cube: 1)
>>
>> I'm not an expert, but I'd think the NN should be able to learn this better - why not just try to train it more?
>> Is gnubg currently able to keep a database of its own 0-ply blunders? (Like, every time it does an evaluation, compare the higher-ply result with the 0-ply result, and if the 0-ply errs by a large enough threshold, add the position to the database.) If not, do you think it would be worth implementing this?
>>
>> Best regards,
>> Aaron
>> ------------------------------
>> *From:* Øystein Schønning-Johansen <[email protected]>
>> *Sent:* October 19, 2020 9:26 AM
>> *To:* Aaron Tikuisis <[email protected]>
>> *Cc:* Joseph Heled <[email protected]>; Philippe Michel <[email protected]>; [email protected] <[email protected]>
>> *Subject:* Re: The status of gnubg?
>>
>> *Attention : courriel externe | external email*
>>
>> On Mon, Oct 19, 2020 at 3:10 PM Aaron Tikuisis <[email protected]> wrote:
>>
>> That is interesting; I did not realize that gnubg misplays race positions much. What are some examples?
>>
>> Here is a position I posted a few weeks ago.
>>
>> GNU Backgammon Position ID: 960BAMCw+0MAAA
>> Match ID : cAkAAAAAAAAA
>> +13-14-15-16-17-18------19-20-21-22-23-24-+ O: gnubg
>> | | | O O O O O | O 0 points
>> | | | O O O O | O
>> | | | O O |
>> | | | O |
>> | | | O |
>> v| |BAR| | (Cube: 1)
>> | 7 | | |
>> | X | | |
>> | X | | X |
>> | X | | X X X | On roll
>> | X X | | X X X | 0 points
>> +12-11-10--9--8--7-------6--5--4--3--2--1-+ X: oystein
>>
>> Money game and X to play. Try several rolls, like 52, 31 and 53, at 0-ply. What's the best move? 52: 6/1 6/4?
>> Of course, the evaluator reports 0.0 win, but since the gammons are incorrectly evaluated by the neural network, it makes ridiculous moves.
>> It looks like this is a common pattern in positions which are "skewed".
>>
>> -Øystein
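[Editor's note: the blunder database Aaron proposes above can be sketched like this. It is a hypothetical illustration, not an existing gnubg feature: the evaluators are stand-in stubs, and the `BLUNDER_THRESHOLD` value is an assumed choice, not anything gnubg defines.]

```python
# Wrap the evaluator so that every evaluation also compares 0-ply
# against a higher ply, and log the position when the 0-ply result
# errs by more than a threshold.

BLUNDER_THRESHOLD = 0.05   # equity gap that counts as a blunder (assumed)

def eval_at_ply(pos, ply):
    # Stub: pretend deeper plies shrink a pseudo-random correction term.
    base = (pos * 2654435761) % 1000 / 1000.0
    correction = ((pos * 40503) % 97) / 97.0 - 0.5
    return base + correction * 0.2 / (1 + ply)

blunder_db = []

def evaluate(pos, ply=2):
    """Evaluate at the requested ply; as a side effect, record positions
    where 0-ply deviates too much from the deeper result."""
    deep = eval_at_ply(pos, ply)
    shallow = eval_at_ply(pos, 0)
    if abs(deep - shallow) > BLUNDER_THRESHOLD:
        blunder_db.append((pos, shallow, deep))
    return deep

for p in range(200):
    evaluate(p)

print(f"{len(blunder_db)} blunder positions logged out of 200 evaluations")
```

The logged positions would then feed straight into step 1's datastore, so the rollout and retraining steps of the method above can reuse them unchanged.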
