On 9/6/05 5:00 AM, "George M. Sheldrick" <[EMAIL PROTECTED]> a
écrit :

> Douglas
> 
> It seems that this example highlights an important difference
> between ML and LS. Although the absolute configuration
> parameter is primarily of interest to small-molecule
> crystallographers, the same arguments apply to twin fractions,
> occupancies and B-values that should be of interest to protein
> crystallographers too.
> 
> As you point out, Q(hkl) (see below) will have a much more
> Gaussian distribution than I(hkl). At first sight, if Q(hkl)
> really has a Gaussian error distribution, ML and LS refinement
> should give the same values for x and its esd. This is however
> wrong, because by integrating from x=0 to x=1 we have
> constrained the ML refinement by our prior knowledge that only
> x values in this range are physically possible; this constraint
> cannot be applied in the LS framework.

It is unclear to me why you can't choose to impose this
restraint (0 =< x =< 1) with LS too.
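
For what it's worth, a bounded LS fit is easy to set up
numerically. Below is a minimal Python sketch (the intensities
and esds are invented purely for illustration) that fits x to
Iobs(hkl) = (1-x)Icalc(hkl) + x.Icalc(-h-k-l), with the bound
0 =< x =< 1 imposed by a bounded least-squares solver:

import numpy as np
from scipy.optimize import least_squares

def residuals(x, i_obs, i_calc, i_calc_bar, sig):
    # weighted residuals of Iobs against (1-x)*Icalc(hkl) + x*Icalc(-h-k-l)
    model = (1.0 - x[0]) * i_calc + x[0] * i_calc_bar
    return (i_obs - model) / sig

# invented data, just to make the sketch self-contained
rng = np.random.default_rng(0)
i_calc = rng.uniform(10.0, 100.0, 500)            # Icalc(hkl)
i_calc_bar = i_calc * rng.uniform(0.9, 1.1, 500)  # Icalc(-h-k-l)
sig = np.full(500, 2.0)                           # esds
i_obs = i_calc + rng.normal(0.0, sig)             # true x = 0

fit = least_squares(residuals, x0=[0.5], bounds=([0.0], [1.0]),
                    args=(i_obs, i_calc, i_calc_bar, sig))
print("bounded LS estimate of x:", fit.x[0])

(The solver handles the bound internally; for a single parameter,
clipping a closed-form weighted LS solution to [0,1] would do
equally well. The real question is the statistical interpretation
of the esd when the estimate sits near the boundary.)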

> So if I determine the Flack x (absolute configuration)
> parameter from 10 different crystals of sucrose by LS, I will
> get values scattered around 0.0 with (I hope) a scatter
> corresponding to the estimated esd. If I do the same by ML,
> because of the integration between x=0 and x=1, ALL
> experiments will give x values slightly greater than zero!
> 
> Through Bayesian eyes the ML result makes perfect sense,
> despite the fact that the true value of x is 0.0. However I
> could never persuade my small-molecule colleagues to accept it.

All this means is that the ML estimate is biased, and that should
only matter to you if you are a hard-core frequentist.
Uncritical adherence to the frequentist philosophy leads to many
infamous logical contradictions (here I'm showing my stripes).
It is well-known that ML estimates are commonly biased.
Nevertheless, ML estimates are usually more *accurate*.

For instance, most people are familiar with this formula for
calculating a variance (the square of the standard deviation):

1/N \Sum [x - ave(x)]^2

and the "sample" version with N-1 in the denominator should also
be familiar:

1/(N-1) \Sum [x - ave(x)]^2

The first version, which is often denigrated as being less
correct, is in fact the ML estimate of the width parameter of a
Gaussian distribution, and it is biased. That is, if you repeat
the calculation many times from different samples of the same
data (like your example of 10 different sucrose xtals), the
average variance will be a little smaller than the true value.
The second version for calculating the variance, in contrast, is
unbiased. In the long run it converges on the true value, on
average.

However, it is less well-known that the ML estimate is more
*accurate*, even though it is biased. On average, the mean
squared error of the ML estimate is smaller than the MSE of the
unbiased version. So, while the ML estimate is usually a bit too
small, it is also usually closer to the truth.
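
A quick numerical illustration (a Python sketch with simulated
Gaussian data; the sample size and number of trials are
arbitrary) makes the bias/MSE trade-off concrete:

import numpy as np

rng = np.random.default_rng(1)
true_var = 4.0                        # variance of the parent Gaussian
n, n_trials = 10, 100000              # small samples, many repeats

samples = rng.normal(0.0, np.sqrt(true_var), size=(n_trials, n))
dev2 = (samples - samples.mean(axis=1, keepdims=True)) ** 2
var_ml = dev2.sum(axis=1) / n         # ML estimate, 1/N
var_unb = dev2.sum(axis=1) / (n - 1)  # unbiased estimate, 1/(N-1)

for name, est in (("ML (1/N)", var_ml), ("unbiased (1/(N-1))", var_unb)):
    bias = est.mean() - true_var
    mse = ((est - true_var) ** 2).mean()
    print(f"{name:20s} bias = {bias:+.3f}   MSE = {mse:.3f}")

With these settings the ML estimate comes out biased low (by
about true_var/N) but has the smaller MSE of the two.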

With more complicated statistical problems, the difference can
be very substantial. Sometimes the unbiased estimator can be
wildly inaccurate, and a common situation is that to keep
unbiasedness, you have to allow physically impossible values
(like a Flack parameter less than 0).

The following cartoon analogy may help clarify the situation:
imagine throwing darts at a target, where the 'true value'
corresponds to the bull's eye, and different dart throws
correspond to our individual estimates of the true value. The
person throwing the darts corresponds to a particular
statistical estimator. Just like dart throwers, some estimators
hit the target better than others.

Let's define the 'accuracy' of the thrower (or the statistical
estimator) as how close a throw is to the bull's eye on average.
Let's say we make 100 throws at the target, and we get a spread
of hits.  The 'precision' of the thrower is how spread out or
dispersed the 100 hits are over the target. If they all cluster
closely, then the variance is low and the precision is high,
which is a good thing. The 'bias' is the offset between the
center of the cluster of throws and the bull's eye. If the
center corresponds with the bull's eye, the throws are unbiased.
If the center of the 100 throws is
a bit to the side of the bull's eye, then the thrower is a bit
biased. Now what matters more, bias or precision? They both go
into accuracy. In fact, in statistics the *inaccuracy* is simply
the sum of the variance and the squared bias. (More exactly,
mean squared error = variance + bias^2). They are both equally
important. You can be a perfectly unbiased thrower, yet have
throws hitting all over the place. If you hit exactly five
meters to the left of the target half the time, and five meters
to the right the other half, then you are an unbiased thrower.
But your precision is horrible, and your individual throws have
never even come close to the actual target. Conversely, you can
have perfect precision, with all 100 throws on top of each
other, but 5 centimeters away from the target. What would you
prefer, the unbiased thrower, or the moderately biased one with
good precision?
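
Putting numbers on the cartoon (again a Python sketch; the two
throwers' offsets and spreads are invented):

import numpy as np

rng = np.random.default_rng(2)
n_throws = 100000

# thrower A: unbiased but imprecise (centred on the bull's eye, wide spread)
a = rng.normal(loc=0.0, scale=5.0, size=n_throws)
# thrower B: slightly biased but precise (centred 0.05 m off, tight spread)
b = rng.normal(loc=0.05, scale=0.5, size=n_throws)

for name, hits in (("A unbiased, sloppy", a), ("B biased, precise", b)):
    bias = hits.mean()            # the bull's eye is at 0
    var = hits.var()
    mse = (hits ** 2).mean()      # mean squared distance from the bull's eye
    print(f"{name:20s} bias={bias:+.3f} var={var:.3f} "
          f"mse={mse:.3f} var+bias^2={var + bias ** 2:.3f}")

Thrower B wins on MSE by a wide margin, and the last two columns
agree, as the decomposition says they must.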

Another problem with caring too much about 'bias' is that it is
not invariant to arbitrary parameter transformations. For
example, the square root of the variance is the standard
deviation. The second equation above is an unbiased estimator of
the true variance, but its square root is a biased estimator of
the true standard deviation. So if you want an unbiased
estimator of the
SD, you have to figure out some other equation. In contrast, the
ML estimates are invariant under parameter transformations --
for example, the ML estimate of the standard deviation is simply
the square root of the ML estimate of the variance. The
absurdity of insisting on unbiasedness should be clear -- if you
have a good estimator for the radius of a perfect sphere,
logically it should be just as good an estimate of the volume of
the
sphere, since there is a one-to-one correspondence between the
radius and the volume (V=4/3 \pi r^3). Yet an unbiased estimate
of the radius will be a very biased estimate of the volume
(and vice versa).
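
A quick check of that non-invariance (Python again, with
simulated Gaussian samples; the numbers are illustrative only):

import numpy as np

rng = np.random.default_rng(3)
true_sd = 2.0
n, n_trials = 10, 100000

samples = rng.normal(0.0, true_sd, size=(n_trials, n))
var_unb = samples.var(axis=1, ddof=1)   # unbiased estimator of the variance
sd_from_unb = np.sqrt(var_unb)          # its square root, taken as an SD estimate

print("mean of unbiased variance :", var_unb.mean(), "(true:", true_sd ** 2, ")")
print("mean of sqrt(unb variance):", sd_from_unb.mean(), "(true:", true_sd, ")")
# the second mean comes out below the true SD: unbiasedness in the
# variance is lost under the square-root transformation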

This is why I commented in an earlier post that I don't care
about bias too much. FWIW, it's good to have if you can, but I
would rather not sacrifice accuracy, logic, or reality to have
it.

> The combined might of the Acta C editors and the CIF people
> has never forgiven me for deciding, when I released the
> program in 1993, that SHELXL would never print physically
> impossible results (so negative occupancies and B values were
> truncated to zero as well as imposing 0=<x=<1). They would
> argue that x is the result of an experiment, so negative
> values (within a couple of esds) are not only possible, they
> must occur! And their final twist is that SHELXL refines
> against Iobs rather than Fobs specifically so that negative
> experimental values can be used directly without fudging.

Ack.

> George
> 
> PS. Thanks for spotting the accidentally missing (1-x) in my
> original email, I hope that this did not confuse too many
> readers.
> 
>> 
>> Depending on the model assumed, this LS method can be
>> interpreted as ML. For instance, let's assume that the 'real'
>> intensity I(hkl) is subject to random Gaussian error. Then,
>> one model is
>> 
>> Iobs(hkl) = (1-x)I(hkl) + x.I(-h-k-l) + e.sigmaI(hkl)
>> 
>> where e is a standard Gaussian random variable (with variance
>> 1) and sigmaI(hkl) is the std deviation for each hkl. Here, x
>> is a variable dependent on the current molecular model.
>> 
>> Then, an LS fit of Iobs(hkl) to (1-x)I(hkl) + xI(-h-k-l)
>> weighted by the inverse of sigmaI(hkl) is equivalent to the
>> ML solution given by the above model. In this case there is a
>> relatively simple analytical solution for x, if we are
>> estimating I(hkl), I(-h-k-l), and sigmaI(hkl) by Iobs(hkl),
>> Iobs(-h-k-l), and sigmaIobs(hkl) respectively.
>> 
>> The LL(x) is given by
>> 
>> -0.5[(Iobs(hkl) - (1-x)I(hkl) - x.I(-h-k-l))/sigmaI(hkl)]^2
>> 
>> up to an arbitrary constant.
>> 
>> Note that this model has at least one theoretical problem, in
>> that real, physically measurable intensities are always
>> positive, and this model will give negative Iobs with high
>> probability when sigmaI is large enough. IOW, intensities
>> cannot be distributed by a Gaussian, except approximately.
>>> 
>>> The following 'ML' approach was proposed in discussions at
>>> the Computing School in Siena (contributions from Rob Hooft,
>>> Simon Parsons and David Watkin are acknowledged, but I'm
>>> sure that there were others). We define the quantity Qobs =
>>> [Iobs(hkl)-Iobs(-h-k-l)]/[Iobs(hkl)+Iobs(-h-k-l)] and a
>>> corresponding Qcalc. This should cancel some systematic
>>> errors. The esd of Qobs can be calculated by standard
>>> methods from the known esds of Iobs(hkl) and Iobs(-h-k-l).
>>> Then we estimate the log likelihood (LL) of a particular
>>> value of x by -0.5[(Qobs-Qcalc)/esd(Qobs)]^2 summed over all
>>> reflections.
>> 
>> This model is both LS and ML in the same way as the above
>> model. The difference is that the implied model is a little
>> different: here you assume that there is a 'real' Q(hkl), and
>> that Qobs(hkl) is subject to random Gaussian errors as
>> 
>> Qobs(hkl) = Q(hkl) + e.sigmaQ(hkl)
>> 
>>> So my question is: should I estimate the value of x and its
>>> esd by integrating over -infinity < x < +infinity (in
>>> practice this range can be truncated to say -5 to +5) or
>>> should I integrate over 0 =< x =< 1 or should I use only the
>>> values of the LL at x = 0 and x = 1?
>> 
>> My opinion is to use 0 =< x =< 1 if you expect a reasonable
>> possibility of racemic twinning; otherwise, constrain x to 0
>> and 1.
>> 
>>> Preliminary tests suggest that these methods give
>>> appreciably lower esds for x than the LS approach.
>> 
>> As I said, the 'LS' approach above (the first model
>> discussed) can also be thought of as ML. So the only real
>> difference is in the assumed model, whether random Gaussian
>> errors are applied to I(hkl) vs Q(hkl). This immediately begs
>> the question as to which model is more appropriate. For one
>> thing, it appears to me that Qobs(hkl) can take negative
>> values in reality, so Gaussian errors are probably more
>> justifiable for this model, and this may be the main reason
>> it performs better.
>> 


