Dear All

I'd like to raise the question again of whether any of this 'jiggling'
(i.e. addition of random noise to the co-ordinates) is really necessary
anyway, notwithstanding Dale's valid point that even if it were necessary,
jiggling in its present incarnation is unlikely to work because it's
unlikely to erase the influence of low res. reflexions.

My claim is that jiggling is completely unnecessary, because I maintain
that refinement to convergence is all that is required to remove the bias
when an alternate test set is selected.  In fact I claim that it's the
refinement, not the jiggling, that's wholly responsible for removing the
bias.  I know we thrashed this out a while back and I recall that the
discussion ended with a challenge to me to prove my claim that the
refine-only Rfrees are indeed unbiased.  I couldn't see an easy way of
doing this which didn't involve rebuilding and re-refining the same
structure 20 times over, without introducing any observer bias.

The present discussion prompted me to think again about this and I believe
I can prove part of my claim quite easily, that jiggling has no effect on
the results.  Proving that the resulting Rfrees are unbiased is much
harder, since as we've seen there's no proof that jiggling actually removes
the bias as claimed by its proponents.  However given that said proponents
of jiggling+refinement have been happy to accept for many years that their
results are unbiased, then they must be equally happy now to accept that
the refinement-only results are also unbiased, provided I can demonstrate
that the difference between the results is insignificant.

The experimental proof rests on comparison between the Rfrees and RMSDs of
the jiggled+refined and the refined-only structures for the 19 possible
alternate test sets (assuming 5% test-set size).  If jiggling makes no
difference, as I claim, then for every alternate test set the two Rfrees
should not differ significantly and the RMSD between the two refined
structures should be insignificant.
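
The count of 19 follows from the 5% test-set size: the reflections split
into 20 disjoint sets, one of which is the original test set.  A minimal
sketch of such a partition, assuming a simple shuffled round-robin
assignment (the function name and seed are hypothetical, and this is not
how any particular program assigns its free-R flags):

```python
import random

def assign_free_flags(n_reflections, n_sets=20, seed=0):
    """Give each reflection a flag 0..n_sets-1, with near-equal set sizes."""
    flags = [i % n_sets for i in range(n_reflections)]
    random.Random(seed).shuffle(flags)  # randomise which reflection gets which flag
    return flags

# Total reflections from the crystal info: working set + test set.
flags = assign_free_flags(11563 + 611)
test_set_0 = sum(1 for f in flags if f == 0)   # size of the set flagged 0
alternates = sorted(set(flags) - {0})          # the remaining flags, 1..19
```

With this scheme the sets differ in size by at most one reflection, so set 0
here will not contain exactly 611 reflections as in the real data.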

However, first we must be careful to establish what is a suitable value for
the noise magnitude to add to the co-ordinates.  If it's too small it won't
remove the bias (again notwithstanding Dale's point that it's unlikely to
have any effect anyway on the low res. data); too large and you push it
beyond the convergence radius of the refinement and end up damaging the
structure irretrievably (at least unless you're prepared to do significant
rebuilding of the model).
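
To put rough numbers on the resolution dependence, here is a sketch under
stated assumptions: if every atom receives an independent isotropic Gaussian
shift, the average effect on a calculated structure factor is the same
attenuation as a Debye-Waller factor, exp(-2 pi^2 <u_x^2> / d^2), where
<u_x^2> is the mean-square shift along one axis.  The function name is
hypothetical and the noise value is assumed to be a 3D RMSD.

```python
import math

def mean_sf_attenuation(rmsd_3d, d_spacing):
    """Average attenuation of a structure factor at resolution d_spacing (A)
    caused by independent isotropic Gaussian shifts of 3D RMSD rmsd_3d (A).
    """
    u2_axis = rmsd_3d ** 2 / 3.0  # mean-square shift along a single axis
    return math.exp(-2.0 * math.pi ** 2 * u2_axis / d_spacing ** 2)

# Compare the high-resolution limit with low-resolution reflections.
for d in (1.58, 10.0, 40.0):
    print(f"d = {d:5.2f} A  attenuation = {mean_sf_attenuation(0.25, d):.4f}")
```

For 0.25 A noise this gives about 0.85 at 1.58 A but more than 0.99 at 10 A,
which is the sense in which uncorrelated noise barely touches the low res.
data.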

For the record here's the crystal info for the test data I selected:

Nres: 96   SG: P41212   Vm: 1.99   Solvent: 0.377
Resol: 40-1.58 A.
Working set size: 11563   Test set size: 611 (5%)   Test set: 0
Refinement program:     BUSTER.
Noise addition program: PDBSET.

It's wise to choose a small protein since you need to run lots of
refinements!  However feel free to try the same thing with your own data.

First I took care that the starting model was refined to convergence using
the original test set 0, and I performed 2 sequential runs of refinement
with BUSTER (the deviations are relative to the input co-ordinates in each
case):

Ncyc  Rwork  Rfree   RMSD  MaxDev
  82  0.181  0.230  0.005   0.072
  51  0.181  0.231  0.002   0.015

The advantage of using BUSTER is that it has its own convergence test; with
REFMAC you have to guess.
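
For anyone reproducing the RMSD and MaxDev columns above, a minimal sketch
of the two quantities follows.  It assumes both models list the same atoms
in the same order and are in the same crystal frame, so no superposition is
done; the function name is hypothetical and this is not BUSTER's convergence
test.

```python
import math

def rmsd_and_maxdev(coords_a, coords_b):
    """Return (RMSD, maximum deviation) between two matched coordinate lists.

    Each list holds (x, y, z) tuples in Angstrom, one per atom, same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    dists = [math.dist(a, b) for a, b in zip(coords_a, coords_b)]
    rmsd = math.sqrt(sum(d * d for d in dists) / len(dists))
    return rmsd, max(dists)
```

Small values of both numbers between the input and output of a further run
are what the table above is using as evidence of convergence.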

Then I tried a range of input noise values (0.20, 0.25, 0.30, 0.35, 0.40,
0.50 A) on the refined starting model.  Note that these are RMSDs, not
maximum shifts as claimed by the PDBSET documentation.  In each case I did
4 sequential runs of BUSTER on the jiggled co-ordinates and by looking at
the RMSDs and max. shifts I decided that 0.25 A RMSD was all the structure
could stand without risking permanent damage (note that the default noise
value in PDBSET is 0.2):

Initial RMSD: 0.248  MaxDev: 0.407

Ncyc  Rwork  Rfree   RMSD  MaxDev
 358  0.183  0.230  0.052   0.454
 126  0.181  0.232  0.041   0.383
  65  0.181  0.232  0.040   0.368
  50  0.181  0.232  0.040   0.360

The only purpose of the above refinements is to establish the most suitable
noise value; the resulting refined PDB files were not used.
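
The noise-addition step itself can be sketched as below.  This is not
PDBSET's algorithm; it assumes independent Gaussian shifts on x, y and z,
scaled so that the expected 3D RMSD equals the requested value (per-axis
sigma = RMSD / sqrt(3)).  The function names are hypothetical.

```python
import math
import random

def jiggle(coords, target_rmsd, seed=None):
    """Add Gaussian noise to (x, y, z) tuples with an expected 3D RMSD
    of target_rmsd (Angstrom)."""
    rng = random.Random(seed)
    sigma = target_rmsd / math.sqrt(3.0)  # per-axis standard deviation
    return [tuple(c + rng.gauss(0.0, sigma) for c in xyz) for xyz in coords]

def rmsd(coords_a, coords_b):
    """RMSD between two matched coordinate lists (no superposition)."""
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```

For a real model the achieved RMSD scatters slightly around the target, as
with the 0.248 A obtained above for a requested 0.25 A.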

So then I took the co-ordinates with 0.25 A noise added and for each test
set 1-19 did 2 sequential runs of BUSTER.

Finally I took the original refined starting model (i.e. without noise
addition) and again refined to convergence using all 19 alternate test sets.

The results are attached.  The correlation coefficient between the 2 sets
of Rfrees is 0.992 and the mean RMSD between the sets is 0.04 A, so the
difference between the 2 sets is indeed insignificant.
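
For those repeating the comparison with their own data, the correlation
coefficient between two lists of Rfrees (one value per alternate test set,
in the same order) can be computed as in this sketch.  The function name is
hypothetical and the attached results are not reproduced here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Usage would be pearson_r(rfree_jiggled, rfree_refined_only), with each list
holding the 19 final Rfrees for test sets 1-19.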

I don't find this result surprising at all: provided the jiggling keeps the
structure inside the convergence radius of refinement, then by definition
the refinement will produce the same result irrespective of the starting
point (i.e. jiggled or not).  If the jiggling takes the structure outside
the radius of convergence then the original structure will not be
retrievable without manual rebuilding: I'm assuming that's not the goal
here.

I suspect that the idea of jiggling may have come about because refinements
have not always been carried through to convergence: clearly if you don't
do a proper job of refinement then you must expect some of the original
bias to remain.  Also, to head off the suggestion that simulated annealing
refinement would fix this, I would suggest that any kind of SA refinement is
only of value for initial MR models, where there may be significant
systematic error in the model; it's not generally advisable to perform it
on final refined models (jiggled or not), where no such systematic error is
present.

Cheers

-- Ian


On 21 November 2014 18:56, Dale Tronrud <[email protected]> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
>
> On 11/21/2014 12:35 AM, "F.Xavier Gomis-Rüth" wrote:
> > <snip...>
> >
> > As to the convenience of carrying over a test set to another
> > dataset, Eleanor made a suggestion to circumvent this necessity
> > some time ago: pass your coordinates through pdbset and add some
> > noise before refinement:
> >
> > pdbset xyzin xx.pdb xyzout yy.pdb <<eof
> > noise 0.4
> > eof
> >
>
>    I've heard this "debiasing" procedure proposed before, but I've
> never seen a proper test showing that it works.  I'm concerned that
> this will not erase the influence of low resolution reflections that
> were in the old working set but are now in the new test set.  While
> adding 0.4 A gaussian noise to a model would cause large changes to
> the 2 A structure factors I doubt it would do much to those at 10 A.
>
>    It seems to me that one would have to have random, but correlated,
> shifts in atomic parameters to affect the low resolution data - waves
> of displacements, sometimes to the left and other times to the right.
>  You would need, of course, a superposition of such waves that span
> all the scales of resolution in the data set.
>
>    Has anyone looked at the pdbset jiggling results and shown that the
> low resolution data are scrambled?
>
> Dale Tronrud
>
> > Xavier
> >
> > On 20/11/14 11:43 PM, Keller, Jacob wrote:
> >> Dear Crystallographers,
> >>
> >> I thought that for reliable values for Rfree, one needs only to
> >> satisfy counting statistics, and therefore using at most a couple
> >> thousand reflections should always be sufficient. Almost always,
> >> however, some seemingly-arbitrary percentage of reflections is
> >> used, say 5%. Is there any rationale for using a percentage
> >> rather than some absolute number like 1000?
> >>
> >> All the best,
> >>
> >> Jacob
> >>
> >> ******************************************* Jacob Pearson Keller,
> >> PhD Looger Lab/HHMI Janelia Research Campus 19700 Helix Dr,
> >> Ashburn, VA 20147 email: [email protected]
> >> ******************************************* .
> >>
> >
> > --
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (MingW32)
>
> iEYEARECAAYFAlRviu4ACgkQU5C0gGfAG12TMwCfTT0Q4yfCCOxJlRXtsCXmmp1n
> 9lEAn2Ir57+Y16fh02VcsvDxwu6KYRGK
> =68gK
> -----END PGP SIGNATURE-----
>
