On Dec 12, 2007 3:31 PM, Jason House <[EMAIL PROTECTED]> wrote:

>
>
> On Dec 12, 2007 3:09 PM, Álvaro Begué <[EMAIL PROTECTED]> wrote:
>
> >
> >
> > On Dec 12, 2007 3:05 PM, Jason House <[EMAIL PROTECTED]>
> > wrote:
> >
> > >
> > >
> > > On Dec 12, 2007 2:59 PM, Rémi Coulom <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > > Do you mean a plot of the prediction rate with only the
> > > > > gamma of interest varying?
> > > >
> > > > Not the prediction rate, but the probability of the training data.
> > > > More precisely, the logarithm of that probability.
> > >
> > >
> > > I still don't know what you mean by this.
> > >
> >
> > He probably should use the word "likelihood" instead of "probability".
> > http://en.wikipedia.org/wiki/Likelihood_function
> >
>
> Clearly I'm missing something, because I still don't understand.  Let's
> take a simple example of a move that is on the 3rd line and has a gamma
> value of 1.75.  What is the equation or sequence of discrete values that I
> can take the derivative of?
>

We start with a database of games, and we are trying to find a set of gamma
values. For a given set of gamma values, we can compute the probability of
all the moves happening exactly as they happened in the database. So if the
first move is E4 and our model assigns E4 a probability of 0.005, we start
with that, then multiply 0.005 by the probability of the second move, and so
on. By the end of the database, we'll have some number like 3.523E-9308,
which is the probability of all of the moves in the database happening. This
is the probability of the database if it had been generated by a random
process following the probability distributions modeled by the set of gamma
values. You can view this as a function of the gamma values; that function
is usually called the "likelihood function". To pick the best gammas, we
choose the ones that maximize the likelihood. Sometimes we use the logarithm
of the likelihood instead, which has the interpretation of being "minus the
amount of information in the database", and it isn't a number with a
gazillion zeros after the decimal point.
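
To make that concrete, here is a minimal sketch in Python of how such a
log-likelihood could be computed. It assumes a Bradley-Terry-style model in
which a move's strength is the product of the gammas of its features; the
feature names, gamma values, and the tiny two-position "database" below are
all made up for illustration.

import math

# Purely illustrative gamma values for a few hypothetical features.
gammas = {"third_line": 1.75, "capture": 2.30, "edge": 0.60, "other": 1.00}

# Each "position" lists the features of every legal candidate move, plus
# the index of the move that was actually played in the database.
positions = [
    {"candidates": [["third_line"], ["capture", "edge"], ["other"]],
     "played": 1},
    {"candidates": [["third_line", "capture"], ["other"], ["edge"]],
     "played": 0},
]

def move_strength(features):
    # Strength of a move = product of the gammas of its features.
    s = 1.0
    for f in features:
        s *= gammas[f]
    return s

def log_likelihood():
    # P(played move) = strength(played) / sum of strengths of legal moves.
    # The likelihood of the whole database is the product of these terms,
    # so we sum logarithms instead to avoid numbers like 3.5E-9308.
    ll = 0.0
    for pos in positions:
        strengths = [move_strength(c) for c in pos["candidates"]]
        ll += math.log(strengths[pos["played"]] / sum(strengths))
    return ll

print(log_likelihood())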

Now, around the point where the maximum likelihood is reached, you can try
to move one of the gammas and see how much it hurts the likelihood. For some
features it will hurt a lot, which means the value has to be very close to
the one you computed or you'll get a bad model; for other features it will
hurt very little, which means there are other settings of the value that are
roughly equivalent. The second derivative of the likelihood (or of the log
of the likelihood; I don't think it matters much which) will tell you how
narrow a peak you are sitting on.
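
As a rough numerical illustration of that last point (reusing the made-up
gammas and log_likelihood from the sketch above), one can nudge a single
gamma up and down by a small step and estimate the second derivative of the
log-likelihood by finite differences:

def curvature(feature, h=1e-4):
    # Finite-difference estimate of d^2(log-likelihood)/d(gamma_feature)^2,
    # holding all other gammas fixed. A large negative value means a narrow
    # peak: the data pins this gamma down tightly. A value near zero means
    # other settings of this gamma are nearly as good.
    g0 = gammas[feature]
    def ll_at(x):
        gammas[feature] = x
        value = log_likelihood()
        gammas[feature] = g0  # restore the original value
        return value
    return (ll_at(g0 + h) - 2.0 * ll_at(g0) + ll_at(g0 - h)) / (h * h)

print(curvature("third_line"))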

Does that make some sense?
