Philippe Michel wrote:
> The engine doesn't "plan ahead", does it? It approximates the
> probabilities of the game outcomes from the current position (or, to
> simplify, its equity).
> My understanding is that its potential accuracy depends on the neural
> network (architecture + input features), and that the training method
> (including the training database in the case of supervised learning)
> influences how close to this potential one can get, and how fast.
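For concreteness, the "probabilities or equity" simplification is just a linear fold-down of the outcome probabilities. A rough sketch in Python, assuming the usual five-output evaluation (win, gammon win, backgammon win, gammon loss, backgammon loss) and made-up numbers:

    def cubeless_equity(p_win, p_win_g, p_win_bg, p_lose_g, p_lose_bg):
        # Cubeless equity from the five outcome probabilities; each gammon or
        # backgammon adds (or costs) one extra point on top of the plain win/loss.
        return (2.0 * p_win - 1.0
                + p_win_g + p_win_bg
                - p_lose_g - p_lose_bg)

    # e.g. a 60% favourite with 20% gammon chances and 5% gammon risk:
    # cubeless_equity(0.60, 0.20, 0.0, 0.05, 0.0) == 0.35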
I haven't done any actual training of backgammon nets, but I think what
Oystein was saying is that TD learning is a method of trying to figure
out (crudely speaking) "where you made your mistake when you lost," and it
works well when you don't have to "backtrack too far" when you're
readjusting your weights. But for positions where there's "long-term
planning" (e.g., rolling the prime around the board), one intuitively
expects TD learning not to work so well.
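To make "backtrack too far" a little more concrete, here is a rough TD(lambda) sketch with a linear evaluator and an eligibility trace (illustrative only, not any bot's actual training code). The trace carries blame for the latest prediction error back to earlier moves, but it decays by lambda at every step, so a decision whose consequences only show up many moves later -- the prime-rolling kind of plan -- receives only a tiny correction:

    import numpy as np

    def td_lambda_episode(weights, features, result, alpha=0.1, lam=0.7):
        # features: feature vectors of the positions in one self-play game, in order
        # result:   the final outcome (e.g. +1 for a win, -1 for a loss)
        trace = np.zeros_like(weights)
        for t in range(len(features)):
            v_now = weights @ features[t]
            # target is the value of the next position, or the actual result at the end
            target = weights @ features[t + 1] if t + 1 < len(features) else result
            delta = target - v_now                # TD error: how wrong was this guess?
            trace = lam * trace + features[t]     # credit for earlier moves decays by lambda
            weights = weights + alpha * delta * trace
        return weights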
It's true that once you have a reasonably good network, you can
"fine-tune" it using other methods. For example, for a perfect bot,
0-ply, 1-ply, 2-ply, etc., should all give the same answer, but an actual
bot won't, so you can get some improvement just by forcing the bot to iron
out these inconsistencies. This can be done using various supervised
training methods and not necessarily TD learning. But my understanding
(which could be flawed) is that TD learning still enters the picture at
the very first step, when you're starting from scratch (with only the
rules and no heuristics).
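Here is the shape of that "ironing out" step as I picture it: plain supervised regression of the net's direct 0-ply output toward its own deeper-ply evaluation of the same positions (the linear evaluator and the deep_eval callable below are placeholders, not any bot's real code):

    import numpy as np

    def consistency_finetune(weights, positions, deep_eval, alpha=0.01):
        # positions: feature vectors; deep_eval(x) returns the deeper-ply
        # (say, 2-ply) equity for the same position, computed with the same net.
        for x in positions:
            shallow = weights @ x          # 0-ply: the net's direct guess
            target = deep_eval(x)          # deeper search, treated as the target
            weights = weights + alpha * (target - shallow) * x  # squared-error gradient step
        return weights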
If there's some area of the game where your network is still doing very
poorly, then you may need to do more "from scratch" training, rather than
just bootstrapping off what you already have. I think this is why Oystein
is suggesting revisiting TD learning.
Tim