Philippe Michel wrote:
> The engine doesn't "plan ahead", does it? It approximates the
> probabilities of the game outcomes from the current position (or, to
> simplify, its equity).
> My understanding is that its potential accuracy depends on the neural
> network (architecture + input features), and that the training method
> (including the training database in the case of supervised learning)
> influences how close to this potential one can get, and how fast.
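For concreteness, the "probabilities or equity" simplification is just a linear fold-down of the outcome probabilities. A rough sketch in Python, assuming the usual five-output evaluation (win, gammon win, backgammon win, gammon loss, backgammon loss) and made-up numbers:

    def cubeless_equity(p_win, p_win_g, p_win_bg, p_lose_g, p_lose_bg):
        # Cubeless equity from the five outcome probabilities; each gammon or
        # backgammon adds (or costs) one extra point on top of the plain win/loss.
        return (2.0 * p_win - 1.0
                + p_win_g + p_win_bg
                - p_lose_g - p_lose_bg)

    # e.g. a 60% favourite with 20% gammon chances and 5% gammon risk:
    # cubeless_equity(0.60, 0.20, 0.0, 0.05, 0.0) == 0.35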
I haven't done any actual training of backgammon nets, but I think what
Oystein was saying is that TD learning is a method of trying to figure
out (crudely speaking) "where you made your mistake when you lost," and it
works well when you don't have to "backtrack too far" when you're
readjusting your weights. But for positions where there's "long-term
planning" (e.g., rolling the prime around the board), one intuitively
expects TD learning not to work so well.
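To make "backtrack too far" a little more concrete, here is a rough TD(lambda) sketch with a linear evaluator and an eligibility trace (illustrative only, not any bot's actual training code). The trace carries blame for the latest prediction error back to earlier moves, but it decays by lambda at every step, so a decision whose consequences only show up many moves later -- the prime-rolling kind of plan -- receives only a tiny correction:

    import numpy as np

    def td_lambda_episode(weights, features, result, alpha=0.1, lam=0.7):
        # features: feature vectors of the positions in one self-play game, in order
        # result:   the final outcome (e.g. +1 for a win, -1 for a loss)
        trace = np.zeros_like(weights)
        for t in range(len(features)):
            v_now = weights @ features[t]
            # target is the value of the next position, or the actual result at the end
            target = weights @ features[t + 1] if t + 1 < len(features) else result
            delta = target - v_now                # TD error: how wrong was this guess?
            trace = lam * trace + features[t]     # credit for earlier moves decays by lambda
            weights = weights + alpha * delta * trace
        return weights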
It's true that once you have a reasonably good network, you can
"fine-tune" it using other methods. For example, for a perfect bot,
0-ply, 1-ply, 2-ply, etc., should all give the same answer, but an actual
bot won't, so you can get some improvement just by forcing the bot to iron
out these inconsistencies. This can be done using various supervised
training methods and not necessarily TD learning. But my understanding
(which could be flawed) is that TD learning still enters the picture at
the very first step, when you're starting from scratch (with only the
rules and no heuristics).
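Here is the shape of that "ironing out" step as I picture it: plain supervised regression of the net's direct 0-ply output toward its own deeper-ply evaluation of the same positions (the linear evaluator and the deep_eval callable below are placeholders, not any bot's real code):

    import numpy as np

    def consistency_finetune(weights, positions, deep_eval, alpha=0.01):
        # positions: feature vectors; deep_eval(x) returns the deeper-ply
        # (say, 2-ply) equity for the same position, computed with the same net.
        for x in positions:
            shallow = weights @ x          # 0-ply: the net's direct guess
            target = deep_eval(x)          # deeper search, treated as the target
            weights = weights + alpha * (target - shallow) * x  # squared-error gradient step
        return weights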
If there's some area of the game where your network is still doing very
poorly, then you may need to do more "from scratch" training, rather than
just bootstrapping off what you already have. I think this is why Oystein
is suggesting revisiting TD learning.
Tim