I have concluded that AGZ's policy of resigning "lost" games early is somewhat 
significant. Not as significant as using residual networks, for sure, but you 
wouldn't want to go without these advantages.

The benefit cited in the paper is speed. Certainly a factor. I see two other 
advantages.

First is that training does not include the "fill in" portion of the game, 
where every move is low value. I see a specific effect on the move ordering 
system, since it is based on frequency. Fill-in moves are numerous when a game 
is played out to the end, but they are usually not the best moves in the 
interesting portion of the game. A prioritization function trained on them 
would predict them more often, and so would be biased toward moves that are 
not relevant to strong play. My ad hoc alternative is to not train on 
positions after the first pass in a game. (Note that this does not qualify as 
"zero knowledge", but that is OK with me since I am not trying to reproduce 
AGZ.)
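
To make the "stop at the first pass" idea concrete, here is a minimal sketch. 
It is not my actual training code; the PASS encoding, the (position, move) 
sample layout, and the function name are all hypothetical. Whether to keep the 
sample where the pass itself is played is a judgment call; this version drops 
it.

```python
from typing import Any, List, Tuple

PASS = "pass"  # hypothetical encoding of a pass move

Sample = Tuple[Any, str]  # (position, move played from that position)


def samples_before_first_pass(game: List[Sample]) -> List[Sample]:
    """Keep only the samples that occur before the first pass in the game.

    Assumption: the sample whose move is the first pass is dropped too,
    so the policy is never trained on the fill-in phase or on the pass.
    """
    for i, (_position, move) in enumerate(game):
        if move == PASS:
            return game[:i]
    # No pass in the record (e.g. the game ended by resignation): keep all.
    return list(game)


if __name__ == "__main__":
    record = [("p0", "D4"), ("p1", "Q16"), ("p2", PASS), ("p3", "C3"), ("p4", PASS)]
    print(len(samples_before_first_pass(record)))
```

A game with no pass at all is kept whole, which matches the case where one 
side resigned before the fill-in phase started.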

Second is that the positional evaluation is not trained on situations where 
everything is decided, so less of the NN capacity is devoted to situations in 
which nothing can be gained.
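
The effect of early resignation on the training set can be sketched the same 
way. This is only an illustration under assumptions: the value estimates are 
taken to be in [-1, 1] from the point of view of the side to move, and the 
threshold of -0.9 and the function name are made up for the example, not taken 
from the paper.

```python
from typing import Sequence


def positions_kept_with_resignation(
    values: Sequence[float], resign_threshold: float = -0.9
) -> int:
    """Return how many leading positions of a game would be recorded
    if the side to move resigns as soon as its value estimate drops
    below resign_threshold.

    Assumption: values[i] is the estimate for the side to move at
    position i, and the position where the resignation happens is not
    recorded because no move is played from it.
    """
    for i, value in enumerate(values):
        if value < resign_threshold:
            return i
    # Nobody resigned: every position of the game is recorded.
    return len(values)


if __name__ == "__main__":
    estimates = [0.1, -0.2, -0.5, -0.95, -0.99]
    print(positions_kept_with_resignation(estimates))
```

Everything after the cut is the "already decided" tail, which is exactly the 
part the evaluation no longer spends capacity on.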

As always, YMMV.

Best,
Brian


_______________________________________________
Computer-go mailing list
http://computer-go.org/mailman/listinfo/computer-go
