Re: [ccp4bb] Free Reflections as Percent and not a Number

Edward A. Berry Tue, 25 Nov 2014 10:30:12 -0800

provided the jiggling keeps the structure inside the convergence radius of 
refinement, then by definition the refinement will produce the same result 
irrespective of the starting point (i.e. jiggled or not).  If the jiggling 
takes the structure outside the radius of convergence then the original 
structure will not be retrievable without manual rebuilding: I'm assuming 
that's not the goal here.



I actually agree with this, but an R-free purist might argue that you have to get outside 
the radius of convergence to eliminate R-free bias. Otherwise, by definition, "you 
will just refine back to the same old biased structure!"
  (But you have shown that the conventional 0.2 A rms is within the radius of 
convergence.)

In fact Dale's concern about low-res reflections could be put in terms of 
radius of convergence and false minima.
Moving a lot of atoms by .2 A will have a significant effect on the phase of a 
2A reflection, but almost no effect on a 20A reflection. Say you have refined 
against all the low resolution reflections, and got a structure that fits 
better than it should because it is fitting the noise in the free reflections. 
Now take away the free reflections and continue to refine. It will drop into 
the nearest local minimum, which, since it is near the solution with all 
reflections, will still give artificially low R-free.  Jiggling by 0.2 A will 
have no effect because the local minima are extremely broad and shallow, as 
far as the low-res reflections go.
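
As a rough illustration of the resolution dependence described above (a sketch, not part of the original message): an atom's contribution to a reflection carries a phase of 2*pi*(h.x), so moving the atom by a distance D along the scattering vector changes that phase by 2*pi*D/d, where d is the resolution of the reflection. The function name below is hypothetical, and the estimate is a per-atom upper bound since it ignores the averaging over many atoms and directions.

```python
def phase_shift_deg(shift_A: float, d_spacing_A: float) -> float:
    """Phase change (degrees) of one atom's contribution to a reflection
    at resolution d_spacing_A when the atom moves shift_A (in Angstrom)
    along the scattering vector: 360 * shift / d."""
    return 360.0 * shift_A / d_spacing_A


if __name__ == "__main__":
    # Compare a 0.2 A shift at 2 A and at 20 A resolution.
    for d in (2.0, 20.0):
        print(f"d = {d:4.1f} A: {phase_shift_deg(0.2, d):5.1f} degrees")
```

Under these assumptions a 0.2 A shift corresponds to roughly 36 degrees at 2 A but only about 3.6 degrees at 20 A, which is the sense in which the low-resolution terms barely notice the jiggle.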

But then you could say that since any local minima are so broad, all structures 
that are even slightly reasonable, (including the correct one) will be within 
radius of convergence of the same minimum as far as the low-res reflections are 
concerned. The nearest false minimum involves moving atoms by 5-10 A, so within 
reason the convergence point will be completely independent of the starting 
structure. Presumably this is why Phenix rigid body refinement starts out at 
ultra-low resolution: to increase the radius of convergence. From that 
perspective, rather than being the worrisome part, the low-resolution is the 
region where we can assume Ian's assumption is correct.

What about another experiment, which I think we've discussed before. Take a 
structure refined to convergence with a pristine free set. Now refine to 
convergence against all the data. The purist will say that the free set is 
hopelessly corrupted. And sure enough when we take that structure and calculate 
free-R with the original set, R-free is the same as R-work within statistical 
significance.  But I guess adding the extra 5% reflections will not change any 
atomic position by more than 0.2 A (maybe 0.02 A), and so we are still well 
within the radius of convergence of the original unbiased structure. Refining 
against the original working set will give back that unbiased structure, and 
Rfree will return to its original value.

This suggests that, if the only purpose of Rfree is to get a number to deposit with 
the pdb (which it is not), you should first solve your structure using all the 
data, fitting the noise; then exclude a free set and back off on fitting the 
noise of it to get the R-free.  The only problem would be that during the 
refinement without guidance of R-free, you may have engaged in some practice 
that hurt the structure so much that it ends up out of RoC of the well-refined 
structure. Not because you were fitting the noise (anyway you are fitting the 
noise in your 95% working set) but because you would not have been warned that 
some procedure was not helping.

Very provocative discussion!
eab


On 11/25/2014 11:03 AM, Ian Tickle wrote:

Dear All

I'd like to raise the question again of whether any of this 'jiggling' (i.e. 
addition of random noise to the co-ordinates) is really necessary anyway, 
notwithstanding Dale's valid point that even if it were necessary, jiggling in 
its present incarnation is unlikely to work because it's unlikely to erase the 
influence of low res. reflexions.

My claim is that jiggling is completely unnecessary, because I maintain that 
refinement to convergence is all that is required to remove the bias when an 
alternate test set is selected.  In fact I claim that it's the refinement, not 
the jiggling, that's wholly responsible for removing the bias.  I know we 
thrashed this out a while back and I recall that the discussion ended with a 
challenge to me to prove my claim that the refine-only Rfrees are indeed 
unbiased.  I couldn't see an easy way of doing this which didn't involve 
rebuilding and re-refining the same structure 20 times over, without 
introducing any observer bias.

The present discussion prompted me to think again about this and I believe I 
can prove part of my claim quite easily, that jiggling has no effect on the 
results.  Proving that the resulting Rfrees are unbiased is much harder, since 
as we've seen there's no proof that jiggling actually removes the bias as 
claimed by its proponents.  However given that said proponents of 
jiggling+refinement have been happy to accept for many years that their results 
are unbiased, then they must be equally happy now to accept that the 
refinement-only results are also unbiased, provided I can demonstrate that the 
difference between the results is insignificant.

The experimental proof rests on comparison between the Rfrees and RMSDs of the 
jiggled+refined and the refined-only structures for the 19 possible alternate 
test sets (assuming 5% test-set size).  If jiggling makes no difference as I 
claim then there should be no significant difference between the Rfrees and 
insignificant RMSDs for all pairs of alternate test sets.
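
A minimal sketch of how 20 test sets of roughly 5% each might be assigned, leaving 19 alternates to set 0 (this is an illustration only, not the flagging procedure of any particular program). The function name and seed are hypothetical; the reflection count is the sum of the working and test set sizes quoted in this message.

```python
import random
from collections import Counter


def assign_test_sets(n_reflections: int, n_sets: int = 20, seed: int = 0) -> list[int]:
    """Give every reflection a flag in 0..n_sets-1 so that each flag
    marks about 1/n_sets of the data (5% for n_sets = 20)."""
    flags = [i % n_sets for i in range(n_reflections)]
    random.Random(seed).shuffle(flags)  # randomise which reflections get which flag
    return flags


if __name__ == "__main__":
    flags = assign_test_sets(11563 + 611)
    sizes = Counter(flags)
    alternates = [k for k in sorted(sizes) if k != 0]  # sets 1..19
    print(len(alternates), "alternate test sets; sizes", min(sizes.values()), "to", max(sizes.values()))
```

Each alternate set in turn is then withheld as the test set while the remaining 95% of the reflections form the working set.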

However, first we must be careful to establish what is a suitable value for the 
noise magnitude to add to the co-ordinates.  If it's too small it won't remove 
the bias (again notwithstanding Dale's point that it's unlikely to have any 
effect anyway on the low res. data); too large and you push it beyond the 
convergence radius of the refinement and end up damaging the structure 
irretrievably (at least unless you're prepared to do significant rebuilding of 
the model).

For the record here's the crystal info for the test data I selected:

Nres: 96   SG: P41212   Vm: 1.99   Solvent: 0.377
Resol: 40-1.58 A.
Working set size: 11563   Test set size: 611 (5%)   Test set: 0
Refinement program:     BUSTER.
Noise addition program: PDBSET.

It's wise to choose a small protein since you need to run lots of refinements!  
However feel free to try the same thing with your own data.

First I took care that the starting model was refined to convergence using the 
original test set 0, and I performed 2 sequential runs of refinement with 
BUSTER (the deviations are relative to the input co-ordinates in each case):

Ncyc   Rwork   Rfree   RMSD    MaxDev
  82   0.181   0.230   0.005   0.072
  51   0.181   0.231   0.002   0.015

The advantage of using BUSTER is that it has its own convergence test; with 
REFMAC you have to guess.

Then I tried a range of input noise values (0.20, 0.25, 0.30, 0.35, 0.40, 0.50 
A) on the refined starting model.  Note that these are RMSDs, not maximum 
shifts as claimed by the PDBSET documentation.  In each case I did 4 sequential 
runs of BUSTER on the jiggled co-ordinates and by looking at the RMSDs and max. 
shifts I decided that 0.25 A RMSD was all the structure could stand without 
risking permanent damage (note that the default noise value in PDBSET is 0.2):

Initial RMSD: 0.248  MaxDev: 0.407

Ncyc   Rwork   Rfree   RMSD    MaxDev
 358   0.183   0.230   0.052   0.454
 126   0.181   0.232   0.041   0.383
  65   0.181   0.232   0.040   0.368
  50   0.181   0.232   0.040   0.360

The only purpose of the above refinements is to establish the most suitable 
noise value; the resulting refined PDB files were not used.
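
The noise addition and the RMSD / MaxDev columns above can be sketched as follows. This is not PDBSET's implementation; it assumes independent Gaussian shifts on x, y and z, in which case a target 3D RMSD r corresponds to a per-axis standard deviation of r/sqrt(3). All function names are hypothetical.

```python
import math
import random


def jiggle(coords, target_rmsd, seed=0):
    """Add independent Gaussian noise to each coordinate so that the
    expected 3D RMSD from the input equals target_rmsd (Angstrom)."""
    rng = random.Random(seed)
    sigma = target_rmsd / math.sqrt(3.0)  # per-axis standard deviation
    return [tuple(c + rng.gauss(0.0, sigma) for c in xyz) for xyz in coords]


def rmsd(a, b):
    """Root-mean-square distance between matched atoms, no superposition."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def max_dev(a, b):
    """Largest single-atom displacement between the two coordinate sets."""
    return max(math.dist(u, v) for u, v in zip(a, b))


if __name__ == "__main__":
    # Made-up coordinates on a grid, for illustration only.
    model = [(float(i), float(i % 7), float(i % 3)) for i in range(5000)]
    noisy = jiggle(model, 0.25)
    print(f"RMSD {rmsd(model, noisy):.3f}  MaxDev {max_dev(model, noisy):.3f}")
```

With a finite number of atoms the realised RMSD scatters slightly around the target, which would be consistent with an initial RMSD of 0.248 for a nominal 0.25 A.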

So then I took the co-ordinates with 0.25 A noise added and for each test set 
1-19 did 2 sequential runs of BUSTER.

Finally I took the original refined starting model (i.e. without noise 
addition) and again refined to convergence using all 19 alternate test sets.

The results are attached.  The correlation coefficient between the 2 sets of 
Rfrees is 0.992 and the mean RMSD between the sets is 0.04 A, so the difference 
between the 2 sets is indeed insignificant.
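
For anyone repeating the comparison with their own data, the correlation coefficient between the two sets of Rfrees can be computed along these lines. This is a sketch using the standard Pearson formula; the numbers in the example are made up for illustration and are not the attached results.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical Rfree values, one per alternate test set (not real data).
    rfree_jiggled = [0.228, 0.235, 0.241, 0.224, 0.233]
    rfree_refined = [0.229, 0.234, 0.242, 0.225, 0.232]
    print(f"CC = {pearson(rfree_jiggled, rfree_refined):.3f}")
```

The mean RMSD between the sets is simply the average of the per-test-set RMSDs between the jiggled+refined and refined-only coordinates.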

I don't find this result surprising at all: provided the jiggling keeps the 
structure inside the convergence radius of refinement, then by definition the 
refinement will produce the same result irrespective of the starting point 
(i.e. jiggled or not).  If the jiggling takes the structure outside the radius 
of convergence then the original structure will not be retrievable without 
manual rebuilding: I'm assuming that's not the goal here.

I suspect that the idea of jiggling may have come about because refinements 
have not always been carried through to convergence: clearly if you don't do a 
proper job of refinement then you must expect some of the original bias to 
remain.  Also to head off the suggestion that simulated annealing refinement 
would fix this I would suggest that any kind of SA refinement is only of value 
for initial MR models when there may be significant systematic error in the 
model; it's not generally advisable to perform it on final refined models 
(jiggled or not) when there is no such systematic error present.

Cheers

-- Ian

On 21 November 2014 18:56, Dale Tronrud <[email protected] 
<mailto:[email protected]>> wrote:

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    On 11/21/2014 12:35 AM, "F.Xavier Gomis-Rüth" wrote:
     > <snip...>
    >
    > As to the convenience of carrying over a test set to another
    > dataset, Eleanor made a suggestion to circumvent this necessity
    > some time ago: pass your coordinates through pdbset and add some
    > noise before refinement:
    >
    > pdbset xyzin xx.pdb xyzout yy.pdb <<eof
    > noise 0.4
    > eof
    >

        I've heard this "debiasing" procedure proposed before, but I've
    never seen a proper test showing that it works.  I'm concerned that
    this will not erase the influence of low resolution reflections that
    were in the old working set but are now in the new test set.  While
    adding 0.4 A gaussian noise to a model would cause large changes to
    the 2 A structure factors I doubt it would do much to those at 10 A.

        It seems to me that one would have to have random, but correlated,
    shifts in atomic parameters to affect the low resolution data - waves
    of displacements, sometimes to the left and other times to the right.
      You would need, of course, a superposition of such waves that span
    all the scales of resolution in the data set.

        Has anyone looked at the pdbset jiggling results and shown that the
    low resolution data are scrambled?

    Dale Tronrud

    > Xavier
    >
    > On 20/11/14 11:43 PM, Keller, Jacob wrote:
    >> Dear Crystallographers,
    >>
    >> I thought that for reliable values for Rfree, one needs only to
    >> satisfy counting statistics, and therefore using at most a couple
    >> thousand reflections should always be sufficient. Almost always,
    >> however, some seemingly-arbitrary percentage of reflections is
    >> used, say 5%. Is there any rationale for using a percentage
    >> rather than some absolute number like 1000?
    >>
    >> All the best,
    >>
    >> Jacob
    >>
    >> ******************************************* Jacob Pearson Keller,
    >> PhD Looger Lab/HHMI Janelia Research Campus 19700 Helix Dr,
    >> Ashburn, VA 20147 email:[email protected] 
<mailto:[email protected]>
    >> ******************************************* .
    >>
    >
    > --
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v2.0.22 (MingW32)

    iEYEARECAAYFAlRviu4ACgkQU5C0gGfAG12TMwCfTT0Q4yfCCOxJlRXtsCXmmp1n
    9lEAn2Ir57+Y16fh02VcsvDxwu6KYRGK
    =68gK
    -----END PGP SIGNATURE-----
