Hello everyone,

For my master’s thesis, I have built an AI that takes a strategic approach to the game. It doesn’t play; it simply describes the strategy behind every possible move in a given position ("enclosing this group", "making life for this group", "saving these stones", etc.). My main idea is that, once it is paired with a playing AI, I will be able to generate comments on a position (and then teach people). So for my final experiment, I’m trying to build a playing AI. I don’t need it to be highly competitive, just decent (1d or so), so I thought about using a policy network, a value network and a simple MCTS. The MCTS works fine, and the policy network learns quickly and is accurate, but the value network never seems to learn at all.
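
For readers unfamiliar with this kind of setup, here is a minimal sketch of how a policy prior and a value estimate are commonly combined in an MCTS, assuming an AlphaGo-style PUCT selection rule. The post does not say which selection rule is actually used, and the names `Node`, `select_child`, `backup` and the constant `c_puct` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                 # P(s, a): policy-network probability of the move leading here
    visits: int = 0              # N(s, a): number of simulations through this node
    value_sum: float = 0.0       # sum of backed-up value estimates
    children: dict = field(default_factory=dict)  # move -> Node

    def q(self) -> float:
        # Mean backed-up value; 0.0 for an unvisited node (an assumption of this sketch).
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node: Node, c_puct: float = 1.5):
    """Return the (move, child) pair maximising Q + U (PUCT)."""
    total = sum(child.visits for child in node.children.values())

    def score(child: Node) -> float:
        # Exploration bonus: large for high-prior, rarely visited moves.
        u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + u

    return max(node.children.items(), key=lambda kv: score(kv[1]))


def backup(path, value: float) -> None:
    """Propagate a value estimate from the leaf back to the root.

    `value` is taken from the perspective of the player who just moved
    into the leaf, so the sign flips at every level going up.
    """
    for node in reversed(path):
        node.visits += 1
        node.value_sum += value
        value = -value
```

In this arrangement the value network only enters through `backup`, so a value net stuck at chance level leaves the search guided almost entirely by the policy prior.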

During my research, I’ve trained a lot of different networks, first on 9x9 and then on 19x19. As far as I remember, all the nets I’ve worked with learned quickly (especially during the first batches), except the value net, which has always been problematic (it diverges easily, doesn’t learn quickly, ...). I have been stuck on the 19x19 value network for a couple of months now. I’ve tried countless inputs (feature planes) and lots of different models, even using the exact same code as others. Yet whatever I try, the loss doesn’t move an inch and accuracy stays at 50% (even after days of training). I’ve tried increasing and decreasing the learning rate, and it makes no difference. However, if I feed it a trivial target (for example, black always wins), it has no trouble learning. It is all the more frustrating that training any other kind of network (predicting the next move, territory, ...) goes smoothly and fast.
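
One quick way to tell whether "the loss doesn’t move" means the network is sitting exactly at chance level is to compare the training loss with the loss of the best constant predictor. The sketch below assumes labels encoded as 1 = black wins and 0 = white wins, and covers two common setups: a sigmoid output trained with binary cross-entropy, and a tanh output trained with mean squared error on ±1 targets. The post states neither the encoding nor the loss, so both are assumptions, and `constant_baselines` is a hypothetical helper.

```python
import numpy as np


def constant_baselines(labels01):
    """Loss of the best constant predictor for win/loss labels in {0, 1}."""
    y = np.asarray(labels01, dtype=float)
    p = float(y.mean())                      # fraction of games won by black
    p_safe = np.clip(p, 1e-12, 1 - 1e-12)    # avoid log(0) for one-sided labels

    # Binary cross-entropy when always predicting p (sigmoid output).
    bce = float(-(p * np.log(p_safe) + (1 - p) * np.log(1 - p_safe)))

    # MSE on +1/-1 targets when always predicting the mean m = 2p - 1
    # (tanh output): E[(t - m)^2] = 1 - m^2.
    mse_pm1 = float(1.0 - (2 * p - 1) ** 2)

    return {"black_win_rate": p, "bce": bce, "mse_pm1": mse_pm1}


if __name__ == "__main__":
    print(constant_baselines([1, 0, 1, 0]))  # balanced labels
    print(constant_baselines([1, 1, 1, 1]))  # "black always wins"
```

With balanced labels these baselines are ln 2 ≈ 0.693 for cross-entropy and 1.0 for the ±1 MSE. A loss pinned at one of those values means the network is outputting a constant. With a "black always wins" target both baselines are 0, so learning that target only shows that the output bias can be trained, not that the network extracts anything from the board.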

Has anyone experienced a similar problem with value networks, or does anyone have an idea of the cause?

Thank you
Computer-go mailing list
