When I read about Facebook's DCNN-based go program, I remembered another
paper that I'd come across on arxiv, namely "How (not) to train your
generative model: scheduled sampling, likelihood, adversary?" by Ferenc
Huszar (http://arxiv.org/pdf/1511.05101.pdf).

A lot of that paper went over my head (I am a "half-studied scoundrel" as
we say in Norway), but his speculation in the end, I think I sort of got,
and it made a lot of sense to me.

He argues that the direction in which you take the K-L divergence matters
for the kind of errors your model makes, and that when you're generating,
as opposed to predicting, the goal should be to minimize the K-L divergence
the "other" way around.

When you're using a DCNN in a go program, you are really doing generation,
not prediction, right? You want to generate a good move. A model that
generates "flashy" moves that LOOK really strong, but could potentially be
very bad, would be a good predictor, but a bad generator.

The ideal probability distribution is the distribution of moves a pro would
make. But to the degree your model falls short, you want to minimize the
chance of making a wildly "un-pro" move, rather than maximizing the chance
of making a "pro" move. Since these are probability distributions, those
two things are not the same unless your model is perfect (right?).

If my understanding is correct (and it's quite possible I'm way off course,
I'm an amateur! sorry for wasting your time if so!), then rather than
training a move predictor, they should use the adversarial methods which
are also in the wind now to train a generative model.

-- Harald Korneliussen
Computer-go mailing list
