The dominant source of error in an intensity measurement actually
depends on the magnitude of the intensity. For intensities near zero
and with zero background, the "read-out noise" of image plate or
CCD-based detectors becomes important. On most modern CCD detectors,
however, the read-out noise is quite low: equivalent to the noise
induced by having only a few "extra" photons/pixel (if any). For
intensities of more than ~1000 photons, the calibration of the detector
(~2-3% error) starts to dominate. It is only for a "midrange" between
~2 photons/pixel and 1000 integrated photons that "shot noise" (aka
"photon counting error" or "Poisson statistics") plays the major role.
So it is perhaps a bit ironic that the "photon counting error" we worry
so much about is only significant for a very narrow range of intensities
in any given data set.
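The three regimes can be sketched with a toy error model. The read-noise and calibration numbers below are illustrative assumptions in the spirit of the figures above, not measurements from any particular detector:

```python
import numpy as np

# Hypothetical CCD error model (assumed numbers): read-out noise
# equivalent to a few photons/pixel, ~2.5% calibration error, and
# Poisson counting ("shot") noise.
READ_NOISE = 3.0      # photons/pixel equivalent (assumption)
CALIB_FRAC = 0.025    # fractional calibration error (assumption)

def dominant_error(photons):
    """Return the name of the largest error contribution at this intensity."""
    contributions = {
        "read-out":    READ_NOISE,
        "shot noise":  np.sqrt(photons),       # Poisson: sigma = sqrt(N)
        "calibration": CALIB_FRAC * photons,
    }
    return max(contributions, key=contributions.get)

for n in (1, 100, 10000):
    print(n, dominant_error(n))
```

With these numbers shot noise only dominates between roughly 10 and 1600 photons; outside that window read-out noise or calibration error takes over, which is the "narrow range" point made above.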
But yes, there does seem to be something "wrong" with ctruncate. It can
throw out a great many hkls that both xdsconv and the "old truncate"
keep. Graph of the resulting Wilson plots here:
http://bl831.als.lbl.gov/~jamesh/bugreports/ctruncate/truncated_wilsons.png
and the script for producing the data for this plot from "scratch":
http://bl831.als.lbl.gov/~jamesh/bugreports/ctruncate/truncate_notes.com
Note that only 3 bins are even populated in the ctruncate result,
whereas "truncate" and "xdsconv" seem to reproduce the true Wilson plot
faithfully down to well below the noise, which in this case is a
Gaussian deviate with RMS = 1.0 added to each F^2.
The "plateau" in the result from xdsconv is something I've been working
with Kay to understand, but it seems to be a problem with the
French-Wilson algorithm itself, and not any particular implementation of
it. Basically, French and Wilson did not want to assume that the Wilson
plot was straight, and therefore did not use the "prior information" that
if the intensities dropped into the noise at 2.0 A then the average
value of "F" at 1.0 A is much, much less than "sigma"! As a result, the
French-Wilson values for "F" far above the traditional "resolution
limit" can be overestimated by as much as a factor of a million.
Perhaps this is why truncate and ctruncate complain bitterly about "data
beyond useful resolution limit".
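A toy numerical illustration of the effect (all numbers assumed, not from real data): with an acentric Wilson (exponential) prior of mean EJ and a Gaussian likelihood, the posterior of J is a Gaussian of mean Idiff - sd^2/EJ and width sd, truncated at J = 0, so the estimated F for a noise-only measurement is pinned well above zero whenever the prior mean is held at its low-angle value instead of falling off down the Wilson plot:

```python
import math

# Posterior mean of J under an exponential (Wilson) prior with mean EJ
# and a Gaussian likelihood for the measured difference idiff +/- sd:
# a Gaussian of mean idiff - sd**2/EJ, width sd, truncated at J = 0.
def fw_F(idiff, sd, EJ):
    mu = idiff - sd * sd / EJ
    alpha = -mu / sd
    pdf = math.exp(-alpha * alpha / 2) / math.sqrt(2 * math.pi)
    tail = 0.5 * math.erfc(alpha / math.sqrt(2))  # P(Z > alpha)
    J = mu + sd * pdf / tail                      # truncated-normal mean
    return math.sqrt(J)

# A noise-only measurement (Idiff = 0, sigma = 1) at high angle:
print(fw_F(0.0, 1.0, EJ=100.0))  # prior mean held at its low-angle value
print(fw_F(0.0, 1.0, EJ=0.05))   # prior mean extrapolated down the Wilson plot
```

The gap between the two estimates grows without bound as the extrapolated prior mean shrinks, which is the sense in which the overestimate can reach enormous factors.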
A shame really, because if the Wilson plot of the "truncated" data were
made to follow the linear trend we see in the low-angle data, then we
wouldn't need to argue so much. After all, the only reason we apply a
resolution cutoff is to try and suppress the "noise" coming from all
those background-only spots at high angle. But, on the other hand, we
don't want to cut the data too harshly or we will get series-termination
errors. So, we must strike a compromise between these two sources of
error and call that the "resolution cutoff". But, if the conversion of
I to F actually used the "prior knowledge" of the fall-off of the Wilson
plot with resolution, then there would be no need for a "resolution
cutoff" at all. The current situation is portrayed in this graph:
http://bl831.als.lbl.gov/~jamesh/wilson/error_breakdown.png
which shows the noise induced in an electron density map by
applying a resolution cutoff to otherwise "perfect" data, versus the
error due to adding noise and running truncate. If the noisy data were
down-weighted only a little bit, then the "total noise" curve would
continue to drop, even at "infinite resolution".
I think it is also important to point out here that the "resolution
cutoff" of the data you provide to refmac or phenix.refine is not
necessarily the "resolution of the structure". This latter quantity,
although emotionally charged, really does need to be better defined
by this community, and preferably in a way that is historically
"stable". You can't just take data that goes to 5.0A and call it "4.5A
data" by changing your criterion. Yes, it is "better" to refine out to
4.5A when the intensities drop into the noise at 5A, but that is never
going to be as good as using data that does not drop into the noise
until 4.5A.
-James Holton
MAD Scientist
On 6/27/2013 9:30 AM, Ian Tickle wrote:
On 22 June 2013 19:39, Douglas Theobald <[email protected]> wrote:
So I'm no detector expert by any means, but I have been assured by
those who are that there are non-Poissonian sources of noise --- I
believe mostly in the readout, when photon counts get amplified.
Of course this will depend on the exact type of detector, maybe
the newest have only Poisson noise.
Sorry for delay in responding, I've been thinking about it. It's
indeed possible that the older detectors had non-Poissonian noise as
you say, but AFAIK all detectors return _unsigned_ integers (unless
possibly the number is to be interpreted as a flag to indicate some
error condition, but then obviously you wouldn't interpret it as a
count). So whatever the detector AFAIK it's physically impossible for
it to return a negative number that is to be interpreted as a photon
count (of course the integration program may interpret the count as a
_signed_ integer but that's purely a technical software issue). I
think we're all at least agreed that, whatever the true distribution
of Ispot (and Iback) is, it's not in general Gaussian, except as an
approximation in the limit of large Ispot and Iback (with the proviso
that under this approximation Ispot & Iback can never be negative).
Certainly the assumption (again AFAIK) has always been that var(count)
= count and I think I'm right in saying that only a Poisson
distribution has that property?
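The var(count) = count property is easy to check by simulation; a quick sketch (the mean count of 50 is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated photon counts from a Poisson distribution with mean 50:
counts = rng.poisson(lam=50.0, size=200_000)

# For a Poisson distribution the variance equals the mean, so the
# sample variance divided by the sample mean should be close to 1.
print(counts.mean(), counts.var())
```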
No, it's just terminology. For you, Iobs is defined as
Ispot-Iback, and that's fine. (As an aside, assuming the Poisson
model, this Iobs will have a Skellam distribution, which can take
negative values and asymptotically approaches a Gaussian.) The
photons contributed to Ispot from Itrue will still be Poisson.
Let's call them something besides Iobs, how about Ireal? Then,
the Poisson model is
Ispot = Ireal + Iback'
where Ireal comes from a Poisson with mean Itrue, and Iback' comes
from a Poisson with mean Iback_true. The same likelihood function
follows, as well as the same points. You're correct that we can't
directly estimate Iback', but I assume that Iback (the counts
around the spot) come from the same Poisson with mean Iback_true
(as usual).
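The Skellam point above can be checked directly by simulating the Poisson model; the means below are assumed illustrative values (a weak reflection on a stronger background):

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative means (assumptions): weak reflection, strong background.
mu_true, mu_back = 5.0, 20.0
N = 100_000

ispot = rng.poisson(mu_true + mu_back, size=N)  # Ireal + Iback'
iback = rng.poisson(mu_back, size=N)            # background estimate
iobs = ispot - iback                            # Skellam-distributed

# Skellam: mean = mu_true, variance = sum of the two Poisson means,
# and negative values occur with appreciable frequency.
print("mean:", iobs.mean())
print("var: ", iobs.var())
print("fraction negative:", (iobs < 0).mean())
```

Note that even though both ispot and iback are non-negative counts, their difference goes negative a sizeable fraction of the time, exactly as the Skellam distribution predicts.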
So I would say, sure, you have defined Iobs, and it has a Skellam
distribution, but what, if anything, does that Iobs have to do
with Itrue? My point still holds, that your Iobs is not a valid
estimate of Itrue when Ispot<Iback. Iobs as an estimate of Itrue
requires unphysical assumptions, namely that photon counts can be
negative. It is impossible to derive Ispot-Iback as an estimate
for Itrue (when Ispot<Iback) *unless* you make that unphysical
assumption (like the Gaussian model).
Please note that I have never claimed that Iobs = Ispot - Iback is to
be interpreted as an estimate of Itrue, indeed quite the opposite: I
agree completely that Iobs has little to do with Itrue when Iobs is
negative. In fact I don't believe anyone else is claiming that Iobs
is to be interpreted as an estimate of Itrue either, so maybe this is
the source of the misunderstanding? Certainly for me Ispot - Iback is
merely the difference between the two measurements, nothing more.
Maybe if we called it something other than Iobs (say Idiff), or even
avoided giving it a name altogether that would avoid any further
confusion? Perhaps this whole discussion has been merely about
terminology?
I'm also puzzled as to your claim that Iback' is not Poisson. I
don't think your QM argument is relevant, since we can imagine
what we would have detected at the spot if we'd blocked the
reflection, and that # of photon counts would be Poisson. That is
precisely the conventional logic behind estimating Iback' with
Iback (from around the spot), it's supposedly a reasonable
control. It doesn't matter that in reality the photons are
indistinguishable --- that's exactly what the probability model is
for.
I'm not clear how you would "block the reflection"? How could you do
that without also blocking the background under it? A large part of
the background comes from the TDS which is coming from the same place
that the Bragg diffraction is coming from, i.e. the crystal. I know
of no way of stopping the Bragg diffraction without also stopping the
TDS (or vice versa). Indeed the theory shows that there is in reality
no distinction between Bragg diffraction and TDS; they are just
components of the total scattering that we find convenient to imagine
as separate in the dynamical model of scattering (see
http://people.cryst.bbk.ac.uk/~tickle/iucr99/s61.html for the
relevant equations).
Any given photon "experiences" the whole crystal on its way from the
source to the detector (in fact it experiences more than that: it
traverses all possible trajectories simultaneously, it's just that the
vast majority cancel by destructive interference). The resulting wave
function of the photon only collapses to a single point on hitting the
detector, with a frequency proportional to the square of the wave
function at that point, so it's meaningless to talk about the
trajectory of an individual photon or whether it "belongs" to Ireal or
Iback'. You can't talk about the error distribution of the
experimental measurements of some quantity if it's a physical
impossibility to design an experiment to measure it! It can of course
have a probability distribution derived from prior knowledge of the
properties of crystals, but that's not a Poisson, it's a Wilson
(exponential) distribution. Is that what you're thinking of?
According to QM the only real quantities are the observables (or
functions of observables); in this case only Ispot, Iback and
Ispot-Iback (and any other functions of Ispot & Iback that might be
relevant) are physically meaningful quantities, all else is mere
speculation, i.e. part of the model.
As I understand it, the reason you are suggesting an alternative way of
estimating Itrue is that you have a fundamental objection to the F & W
algorithm? However, I'm not clear precisely what you find
objectionable. Perhaps it would be useful to go through F & W in
detail and identify where the problem (if any) lies?
We can say that the posterior density of J (= Itrue) given Is (= Ispot)
and Ib (= Iback) is proportional to the prior probability density of J
given only knowledge of the crystal (i.e. the estimated no of atoms from
which we can calculate E(J)), multiplied by the joint probability
density of Is and Ib given J and its SD (assumed equal to the SD of
Is-Ib):
P(J | Is,Ib) ∝ P(J | E(J)) P(Is,Ib | J,sdJ)
The only function of Is and Ib that's relevant to the joint
distribution of Is and Ib given J and sdJ, P(Is,Ib | J), is the
difference Is-Ib (at least for large Is and Ib: I don't know what
happens if they are small). Note that it's perfectly proper to talk
about P(Is-Ib | J) in this context: it's the distribution of the
difference you expect to observe given any J. So the above can be
rewritten as:
P(J | Is-Ib) ∝ P(J | E(J)) P(Is-Ib | J,sdJ)
P(Is-Ib | J,sdJ) is just the Gaussian error distribution of Is-Ib
making use of the Gaussian approximation of the Poisson. Finally,
integrating over J to get the expectation of J (or of F = sqrt(J))
completes the F-W procedure. As indicated earlier there are good
reasons to postpone this until after merging equivalents (which is
exactly what we do now).
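The procedure can be sketched numerically (illustrative numbers, acentric case, with the prior taken as the Wilson exponential and the likelihood as the Gaussian approximation described above):

```python
import numpy as np

# Grid-based sketch of the French-Wilson posterior mean: an acentric
# Wilson prior P(J) = exp(-J/EJ)/EJ for J >= 0, and a Gaussian
# likelihood P(Is-Ib | J, sdJ) for the measured difference.
def posterior_mean_J(idiff, sd, EJ, n=20001):
    J = np.linspace(0.0, EJ + 10.0 * sd, n)          # prior forbids J < 0
    prior = np.exp(-J / EJ) / EJ
    likelihood = np.exp(-0.5 * ((idiff - J) / sd) ** 2)
    weights = prior * likelihood
    weights /= weights.sum()                          # normalize on the grid
    return float((J * weights).sum())                 # E(J | Is-Ib)

# Even a negative measured difference gives a small positive E(J):
print(posterior_mean_J(idiff=-3.0, sd=2.0, EJ=10.0))
```

This is the sense in which F & W always return a physically sensible (non-negative) intensity estimate, however negative Is-Ib happens to be.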
So what's wrong with that?
Cheers
-- Ian