On Tue, 20 Oct 2020 at 18:02, Isaac Keslassy <[email protected]> wrote:
> Hi,
>
> It would be great to renew the effort on gnubg!
>
> I have a question regarding the fundamental NN weight improvement
> technique. If I understand correctly, to improve the NN weights, you
> are trying the supervised-learning approach of picking tough
> positions, determining the best move using rollouts, and then
> gradually optimizing the NN weights. However, as Joseph mentioned,
> this may affect the NN play in positions arising in regular games.
>
> There are other techniques that have proved more efficient in games
> like chess. They avoid the long rollouts and work on positions from
> regular games. For instance:
>
> 1. SPSA: This is an obvious approach. Let the NN play against a very
> slightly modified version of itself, pick the winner, and, using a
> random walk, gradually converge to better parameters; or:

This will require a lot of cycles: determining which of two closely
related nets is better requires a large number of games. If you go that
way, a good set of reference positions (obtained, as mentioned, from
rollouts) would probably work better. Like all such approaches, this
needs to be iterated: when you get a better player, you re-roll the
reference positions and repeat. (A rough sketch of the SPSA idea is at
the end of this mail.)

> 2. Logistic regression: Instead of teaching the best move, teach the
> position equity (as also mentioned by Aaron).

We are already training the net to compute the equity; the discussion
was an attempt to explain how positions are added to the training data.
I recently trained nets for two other games with a similar method, and
this approach (of incrementally adding mis-played positions) was again
the best way of progressively getting a better player. I also had to
start fresh a couple of times, each time with a slightly stronger base
player. The big difference from gnubg was that I trained on 1-ply, not
2-ply. This seemed to eliminate some of the ply effect we see in gnubg,
and possibly in nets for other games.

> Specifically, we could try to minimize the equity error associated
> with each position. Assume DMP for simplicity. Run a million games
> through self-play, and associate all the obtained positions with the
> final game result (-1 for loss, +1 for win). Then tune all the NN
> weights through gradient descent to minimize the difference between
> the position estimate and the final game result.
>
> (See https://www.chessprogramming.org/Automated_Tuning, Texel's
> tuning, SPSA, etc. for more details.)
>
> Has anybody tried such alternative methods?
>
> Thanks,
> Isaac
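To make the SPSA suggestion concrete, here is a rough sketch (untested
Python; play_match(), games_per_eval, and all the constants are made-up
illustrations, not anything that exists in gnubg):

    import numpy as np

    def spsa_tune(theta, play_match, iterations=1000,
                  a=0.01, c=0.05, games_per_eval=200):
        """theta: 1-D array of net weights.
        play_match(w1, w2, n): hypothetical helper that plays n games
        between nets with weights w1 and w2 and returns w1's average
        score per game in [-1, +1]."""
        for k in range(1, iterations + 1):
            ak = a / k             # decaying step size
            ck = c / k ** 0.25     # decaying perturbation size
            # Perturb every weight simultaneously by +/- ck.
            delta = np.random.choice([-1.0, 1.0], size=theta.shape)
            # The match score is a (very noisy) estimate of
            # f(theta + ck*delta) - f(theta - ck*delta).
            score = play_match(theta + ck * delta,
                               theta - ck * delta, games_per_eval)
            # SPSA gradient estimate; since delta_i is +/-1,
            # 1/delta_i equals delta_i.
            theta = theta + ak * (score / (2.0 * ck)) * delta
        return theta

As noted above, the weak point is play_match(): with two closely
related nets you need a large number of games per iteration before the
score means anything.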

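And a sketch of the equity-regression idea from point 2, in the same
spirit (again untested; self_play_game() is a made-up helper that
returns one game's positions as numeric feature vectors together with
its DMP result, and the one-layer tanh "net" is only a stand-in for the
real evaluator):

    import numpy as np

    def generate_training_set(self_play_game, n_games):
        """Label every position of every game with the final
        DMP result (-1 for a loss, +1 for a win)."""
        xs, ys = [], []
        for _ in range(n_games):
            positions, result = self_play_game()
            for pos in positions:      # pos: feature vector
                xs.append(pos)
                ys.append(result)
        return np.array(xs), np.array(ys)

    def train(xs, ys, n_inputs, lr=0.1, epochs=10):
        """Equity estimate: tanh(w . x + b). Plain gradient descent
        on the mean squared error between the equity estimate and
        the final game result."""
        w, b = np.zeros(n_inputs), 0.0
        for _ in range(epochs):
            e = np.tanh(xs @ w + b)        # equity estimates
            err = e - ys                   # per-position error
            grad = err * (1.0 - e ** 2)    # error through the tanh
            w -= lr * (xs.T @ grad) / len(ys)
            b -= lr * grad.mean()
        return w, b

Note that, as said above, gnubg already trains the net on equities; the
difference in this proposal is only where the targets come from (raw
game results instead of rolled-out positions).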