Re: [ccp4bb] ctruncate bug?

Douglas Theobald Wed, 24 Jul 2013 11:39:31 -0700

Hi Randy,

So I've been playing around with equations myself, and I have some alternative 
results.

As I understand your Mathematica stuff, you are using the data model:

ip = ij + ib'

ib

where ip is the measured peak (before any background correction), and ij is a 
random sample from the true intensity j.  Here ib is the measured background, 
whereas ib' is the background absorbed into ip.  Both ib and ib' are a random 
sample from background jb.  Again, only ip and ib are observed; ij and ib' are 
"hidden" variables.  

Now let me recap your treatment of that model (hopefully I get this right).

You assume Poisson distributions for ip, ij, ib, and ib', and find the joint 
probability of observed ip and ib given j and jb, p(ip,ib|j,jb).  You can 
consider ip and ib as statistically independent, since ip depends on ib', not 
ib.  You then marginalize over jb (the true background intensity) using a flat 
uninformative prior, giving p(ip,ib|j).  You find that p(ip,ib|j) is similar to 
F&W's p(ip-ib|j, sdj), where sdj=sqrt(ip+ib).  

Some sort of scaling is necessary, since in practice ib and ip are counted from 
different numbers of pixels.  You find that, for roughly equal scaling, the 
Poisson version is similar to F&W's Gaussian approximation for even moderate 
counts.    

However, in practice, we measure the background from a much larger area than 
the spot.  For example, in the mosflm window I have open now, the background 
area is > 20 times the spot area, for high res, low SNR spots.  Similarly, in 
xds the spot-to-background ratio, in terms of pixel #, is > 10 on average and > 
5 for the great majority of spots.  Therefore, we typically know the value of 
jb to a much better precision than what we can get from ip (which is 
essentially an estimate of j+jb).  

If the relative sd of the background is about 2 or 3 times less than that of 
the spot ip, we can approximate the background estimate of jb as a constant 
(ie, ignore the uncertainty in its value).  This will be valid if the total 
area used for the background measurement is roughly >5 times the area of the 
spot (even less for "negative" peaks).  So what we can do is estimate jb using 
ib, and then find the conditional distribution of j given ip and jb.  Using 
your notation, this distribution is given by:

p(j|ip,jb) = exp(-(jb+j)) (jb+j)^ip / Gamma(ip+1,jb)

where Gamma(.,.) is the upper incomplete gamma function.  

The moments of this distribution have nice analytical forms (well, at least as 
nice as F&W's).  Here's a table comparing the F&W estimates to this Poisson 
treatment, using Randy's ip and jb values, plus some others:

ip   jb    Exp[j]_fw  SD[j]_fw  h     Exp[j]_dt  SD[j]_dt  %diff
---- ----  ---------  --------  ---   ---------  --------  -----
  55   45 11.3        6.3       1.3   11.9       6.8         5.3
  45   55  3.0        2.6      -1.5    3.7       3.3         5.4
  35   65  1.1        1.1      -5.1    2.0       2.0        86
   6   10  1.0        0.91     -1.6    1.8       1.7        80
   1    3  0.37       0.34     -2.0    1.3       1.2       240
   4   12  0.45       0.43     -4.0    1.4       1.3       210 

 100  100  8.0        6.0       0      8.6       6.6         7.4
  85  100  3.9        3.4      -1.6    4.7       4.2        20
  75  100  2.5        2.4      -2.9    3.4       3.2        35
 500  500 17.8       13.5       0     18.4      14.0         3.3
 440  500  6.2        5.8      -2.9    7.0       6.6        14
1000 1000 25.2       19.1       0     25.8      21           2.3
 920 1000  9.4        8.8      -2.6   10.3       9.5         9.1
 940 1000 11.6       10.5      -2.0   12.4      11           7

In this table I've used sdj=sqrt(ip) for F&W, since I'm ignoring the 
uncertainty in jb --- Randy used sqrt(ip+ib).  

h = (ip-jb)/sdj  

%diff = (Exp[j]_dt - Exp[j]_fw)/Exp[j]_fw  

Here jb is the # background counts normalized to have the same pixel area as 
ip.  

Whether these would be considered important differences, I'm not sure.  The 
differences are greatest when ip<jb (that is, for "negative intensities").

As an aside:

It's easy to expand this to include the acentric Wilson prior:

p(j|ip,jb,w) = exp(-(jb+j)(w+1)) (jb+j)^ip (w+1)^(ip+1) / Gamma(ip+1,jb(w+1))

where w = 1/sigma_w, sigma_w = the Wilson sigma.  Again, the moments have 
analytical forms.  

On Jul 1, 2013, at 5:47 AM, Randy Read <[email protected]> wrote:

> Hi,
> 
> I've been following this discussion, and I was particularly interested by the 
> suggestion that some information might be lost by turning the separate peak 
> and background measurements into a single difference.  I accept the point 
> that there might be value in, e.g., TDS models that pay explicit attention to 
> non-Bragg intensities, but this whole discussion started from the point of 
> what estimates to use for diffracted Bragg intensities in processes such as 
> molecular replacement, refinement, and map calculations.
> 
> I thought I'd run this past the two of you, in case I've missed something.  
> What I decided to look at is the probability distribution for the true 
> diffraction intensity, given the peak and background measurements.  I'm 
> assuming that the peak and background measurements have a Poisson 
> distribution from counting statistics, which seems fine because I'm comparing 
> the Poisson model with a Gaussian approximation, and if there were other 
> sources of non-Poisson error the true distribution should become more 
> Gaussian anyway.
> 
> Where I thought it might make a difference to consider the peak and 
> background separately is that, if the net intensity comes out as negative in 
> the simple difference, this implies that the random errors in the peak region 
> are more negative than the random errors in the background region.  By 
> considering a joint distribution of peak and background counts, connected by 
> a common underlying background intensity probability, you might hope that 
> this would make a difference.
> 
> However, what I get (which I hope you can follow from the PDF derived from a 
> Mathematica notebook) is that, even for very small numbers of counts, the F&W 
> model of integrating over the positive intensity values consistent with a 
> Gaussian peak-background difference gives surprisingly good agreement with 
> the difference of Poisson variables model.  
> 
> In the Poisson derivation, I'm using Bayes' rule to turn the joint 
> probability of the pair of peak/background intensity measurements into the 
> joint probability of the true underlying intensities.  Like F&W, this 
> requires a prior probability distribution for the true underlying 
> intensities.  I'm using an uninformative prior (allowing all positive values 
> with equal probability) and comparing that to F&W also with uninformative 
> priors.  Wilson priors could be used for both instead, but that wouldn't 
> really change the result of the comparison.
> 
> I've allowed for the possibility that there are different scale factors for 
> the peak and background regions (so that there are different sizes of random 
> errors for the background component of both measurements), and that only 
> seems to make a minor difference in the results.
> 
> So my conclusion is that, if what you're after is the probability 
> distribution of the true intensity, there's no extra information gained in 
> considering peak and background separately, and that the simple Gaussian 
> approximation of the difference of two Poisson variables is surprisingly 
> good.  In other words, for these purposes, the difference between peak and 
> background (and the standard deviation of that difference) is a sufficient 
> statistic.
> 
> Of course, I'd be very grateful if either of you found anything wrong with 
> the reasoning!  Also, I've lost some of the messages in this thread, so I 
> don't know if these probability distributions are reproducing anything that 
> has already been discussed.
> 
> Best wishes,
> 
> Randy
> <pIntensityPeakBkg.pdf>
> ------
> Randy J. Read
> Department of Haematology, University of Cambridge
> Cambridge Institute for Medical Research      Tel: + 44 1223 336500
> Wellcome Trust/MRC Building                   Fax: + 44 1223 336827
> Hills Road                                    E-mail: [email protected]
> Cambridge CB2 0XY, U.K.                       www-structmed.cimr.cam.ac.uk
>

Re: [ccp4bb] ctruncate bug?

Reply via email to