Thanks Ian, I think I understood you correctly and have worked out the change
in R-free for each refinement. The results are here:
https://www.ucl.ac.uk/~rmhajc0/rfreechange.pdf
I don't think it shows any sign of the bias we discussed.
On Friday, 7 June 2019, 14:37:03 BST, Ian Tickle <[email protected]> wrote:
Hi Jon
It's not valid to use model-selection metrics such as R factors to compare
different datasets: they are just not designed for that purpose.
Model-selection metrics are designed to select the best model from a set of
possible models all obtained using the _same_ data (or sample of data for CV),
but with different model parameters and restraints. The reason is very simple:
R factors will vary purely according to the data sample used (the 'sampling
error') and this variation can easily wipe out any variation due to NCS bias.
This is particularly true of CV metrics like Rfree because they use a
relatively small sample: if the test fraction is 5%, the test set will be
1/19th the size of the working set and its sampling error will be at least
sqrt(19) times as big. Also, the 20 working sets generated by e.g. FREERFLAG
have ~95% of their reflections in common, which reduces their variation,
whereas the 20 test sets have none in common. You can see that the Rfree
values vary much more than the Rwork values for this reason.
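To put rough numbers on this, here is a toy simulation (synthetic amplitudes
and invented noise levels, not real data) of drawing 20 different 5% test
sets and watching how much Rfree scatters compared with Rwork:

```python
import math
import random

random.seed(1)

# Synthetic data: "true" amplitudes, noisy observations, and a model that
# fits the truth imperfectly (all numbers invented for illustration).
n = 20000
f_true = [random.uniform(50.0, 500.0) for _ in range(n)]
f_obs = [f + random.gauss(0.0, 20.0) for f in f_true]
f_calc = [f + random.gauss(0.0, 30.0) for f in f_true]

def r_factor(idx):
    """Conventional unweighted R = sum|Fo - Fc| / sum|Fo| over a subset."""
    num = sum(abs(f_obs[i] - f_calc[i]) for i in idx)
    den = sum(abs(f_obs[i]) for i in idx)
    return num / den

r_free_vals, r_work_vals = [], []
for _ in range(20):                        # 20 different 5% test-set choices
    test = set(random.sample(range(n), n // 20))
    work = [i for i in range(n) if i not in test]
    r_free_vals.append(r_factor(sorted(test)))
    r_work_vals.append(r_factor(work))

def std(v):
    m = sum(v) / len(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))

# The test sets are ~1/19th the size of the working sets (and share no
# reflections), so Rfree scatters far more between choices than Rwork.
print(std(r_free_vals), std(r_work_vals))
```

The working sets overlap by ~95%, which suppresses their scatter even below
the 1/sqrt(N) expectation, so the observed ratio is well above sqrt(19).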
Having said that, there is one possible way around it. The sampling error
obviously doesn't change _during_ a single refinement run since it's the same
data, i.e. we could assume that the effect of the sampling error is the same at
the beginning and at the end of the run. Clearly if the Rfree for one data
sample at the beginning of a run is higher than the Rfree at the beginning of a
run for a different data sample then it's hardly fair to compare the Rfree
values at the end of the two runs. It's like giving some runners a 1 hour
start in the marathon and deciding who is the winner according to who crosses
the finishing line first!
So assuming that every run starts with the same model but different sample of
data, you may be able to reduce the effect of sampling errors by comparing the
_changes_ in the R factors from the beginning to the end of a refinement run.
It's like every marathon runner using their own timer so each measures their
finishing time _relative_ to their starting time, rather than using the same
timer for everyone.
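In toy numbers (all invented), the marathon-timer point looks like this: each
data sample carries its own fixed offset, which cancels in the start-to-end
difference but not in the final values:

```python
import random

random.seed(2)

# Hypothetical runs: each data sample carries its own sampling offset, which
# shifts both the starting and final Rfree of that run by the same amount.
# The true improvement produced by refinement is taken to be -0.03.
true_change = -0.03
runs = []
for _ in range(10):
    offset = random.gauss(0.0, 0.01)       # per-sample 'head start'
    r_start = 0.30 + offset
    r_end = r_start + true_change + random.gauss(0.0, 0.002)
    runs.append((r_start, r_end))

final_values = [end for _, end in runs]          # comparing finish times
changes = [end - start for start, end in runs]   # each runner's own timer

def spread(v):
    return max(v) - min(v)

# The per-run changes cluster tightly around the true improvement, while
# the final Rfree values still carry the per-sample offsets.
print(spread(changes), spread(final_values))
```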
Cheers
-- Ian
On Fri, 24 May 2019 at 23:27, Jonathan Cooper
<[email protected]> wrote:
Having been fond of the idea discussed above, i.e. that when NCS is present
one should have the R-free set chosen in shells, I did some simple tests. Many
others must have done the same, but here's how it went:
1) Choose a few familiar structures, both with and without NCS and get the data.
2) Since there was some difficulty in remembering if the original R-free sets
were in shells or not, I ditched any existing test set (shock, but see 3 below)
and generated new ones, both at random and in shells (using SFTOOLS and I
repeated some with an old copy of SHELXPRO). Some of the reflection files
lacked original R-free sets since they were deposited before the R-free was
invented.
3) Reduce the bias of each model to the reflections that are now in the new
test sets and tease out over-fitting by rattling the structures a bit, i.e. add
a random +/-0.1 Angstroms to x, y and z of each atom (0.17 Angstroms net shift)
and reset all the B-factors to 30 A^2.
4) Refine the rattled structures with the new R-free sets, i.e. random and in
shells (no NCS restraints).
5) If anyone is really interested, the results are here:
https://www.ucl.ac.uk/~rmhajc0/rfreetests.pdf
but to summarise, assuming the programs have picked the test sets in shells or
otherwise correctly (!), there seems to be no significant difference between
the R-free in shells and the normal one, whether NCS is present or not. If
anything, the R-free in shells tends to be a tiny bit lower than the normal
R-free when NCS is present, although this is probably by chance due to the
small number of tests done!
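The rattling in step 3 could be sketched like this (a hypothetical atom
record with x, y, z, b keys; the real work was of course done on PDB files):

```python
import math
import random

random.seed(0)

SHIFT = 0.1      # Angstroms, added with a random sign to each coordinate
B_RESET = 30.0   # A^2

def rattle(atoms):
    """Shift each atom by +/-0.1 A along x, y and z (random signs) and
    reset its B-factor, as in step 3 above.  `atoms` is a hypothetical
    list of dicts with 'x', 'y', 'z' and 'b' keys."""
    for a in atoms:
        for k in ("x", "y", "z"):
            a[k] += random.choice((-SHIFT, SHIFT))
        a["b"] = B_RESET
    return atoms

orig = {"x": 1.0, "y": 2.0, "z": 3.0, "b": 45.0}
atom = rattle([dict(orig)])[0]

# A +/-0.1 A shift on each axis moves the atom by sqrt(3)*0.1 ~ 0.17 A,
# the 'net shift' quoted in step 3.
net = math.sqrt(sum((atom[k] - orig[k]) ** 2 for k in ("x", "y", "z")))
print(net, atom["b"])
```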
I am sure this is a well-known fact, but I haven't had the chance to test it
till now!
On Sunday, 19 May 2019, 13:22:00 BST, Ian Tickle <[email protected]>
wrote:
Hi Ed
Yes, Rfree: my favourite topic, I'll take this one on! First off, we all need
to be ultra-careful and precise about the terminology here, for fear of
creating even more confusion. For example what on earth is meant by
"reflections ... are uncorrelated"? A reflection can be regarded as an object
that possesses a set of attributes (indices, d spacing, setting angles,
position on detector, LP correction, intensity, amplitude, phase, errors in
those, etc. etc.). An object as such is not associated with any kind of value
(it is rather an instance of a class of objects possessing the same set of
attributes but with different values for those attributes), so it's totally
meaningless to talk about the correlation, or lack thereof, of two sets of
objects (what's the correlation of a bag of apples and a bag of oranges?). You
can only talk about the correlation of the values of the objects' attributes
(e.g. the apples' and oranges' size or weight). Perhaps you'll say that it was
clear from the context that you meant the correlation of the reflection's
measured intensities (or amplitudes). If that is what you meant then you would
be wrong! The fact that it's not about NCS-related intensities or amplitudes
does rather throw a spanner in the works of those who claim that it's the
correlation of these quantities that obliges one to choose the test set in a
certain way.
Before I say why, I would also point out that R factors are not the quantities
minimised in refinement: for one thing the conventional Rwork and Rfree are
unweighted so all reflections whether poorly or well-measured contribute
equally, which makes no sense. In ML refinement it's the negative
log-likelihood gain (-LLG) that's minimised so that is the quantity you should
be using. This means that one cannot expect Rwork to be a minimum at
convergence since it's not directly related to LLGwork. In addition one has no
idea what is the confidence interval of an R factor so it's impossible to say
whether a given decrease in R is significant or not. So R factors are entirely
unsuited for any kind of quantitative analysis of model errors, and I despair
when I read papers that do just that. The R factor was devised in the 50's
before calculators or computers became readily available and crystallographic
computations were performed with pencil & paper! So the form of the R factor,
i.e. using an unweighted absolute value instead of a weighted square as would
have been appropriate for least squares refinement, was specifically designed
as a rough-and-ready guide of refinement progress, not a quantitative measure.
To see why it's not about intensities or amplitudes, it's important to
understand the purpose and operation of cross-validation (a.k.a. 'jack-knife
test') with a test set, set aside for this purpose, and using a statistic such as
LLGfree (or Rfree if you must), in order to quantify the agreement of the model
with the test set. In any scientific experiment the measuring apparatus is
never perfect so never reports the true values of the quantities being
measured: measurement errors are an inevitable fact of life. Cross-validation
flags up the impact of these errors on the model that is used to explain the
measurements by some process of best-fitting to them. Note that by 'model' I
mean the mathematical model, i.e. in this case the structure-factor equation
that relates the atomic model to the measurements. The adjustments in the
model's variable parameters (x, y, z, B etc.) during refinement may give a
closer fit between the true and calculated amplitudes in which case both
-LLGwork and -LLGfree will decrease (as indicated above, Rwork and Rfree
may go up or down unpredictably).
Unfortunately we have only the measured amplitudes, not the true ones, so in
this process of fitting one may go too far and fit to the measurement errors
('overfitting'), which will obviously introduce errors in the model. If one
only considers the refinement target function (LLG) or Rwork, it will always
appear that the model is improving even when it isn't (i.e. agreeing better
with the measured values but not necessarily with the true values due to the
errors in the measured values). This generally happens because in the attempt
to extract more detail in the model from the data one has set up a model with
more variables (or fewer/too loose restraints) than the data can support.
Since the changes in the model on overfitting will not be related to changes
required to obtain the true model values but to completely arbitrary random
numbers unrelated to the truth, and provided the measurement errors in the test
set are uncorrelated with those in the working set, the test-set statistic will
most likely go on its own sweet way (i.e. up) indicating overfitting. If for
any reason the measurement errors of working and test-set reflections are
correlated, then the test-set statistic will be biased towards the working-set
value and so will not be a reliable diagnostic of overfitting. Note that the
overfitting fate is decided at the point where we choose the starting set of
parameters and restraints, though it doesn't become apparent until after the
subsequent refinement run has completed. Then one should redesign the model
with fewer variables and/or more/tighter restraints, and repeat the last run,
rather than proceed further with the faulty model. If overfitting is diagnosed
by the cross-validation test, try something else!
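The logic of the paragraph above can be shown with a deliberately silly toy
model, nothing crystallographic: synthetic measurements of a linear truth,
fitted once with a model the data can support (a 2-parameter line) and once
with a model that simply memorises the working set:

```python
import math
import random

random.seed(3)

# Synthetic 'measurements': a linear truth plus measurement error.
xs = [i / 300.0 for i in range(900)]
ys = [2.0 + 3.0 * x + random.gauss(0.0, 0.5) for x in xs]
work = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 3 != 0]
test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 3 == 0]

# Model A: a 2-parameter least-squares line -- what the data can support.
n = len(work)
sx = sum(x for x, _ in work); sy = sum(y for _, y in work)
sxx = sum(x * x for x, _ in work); sxy = sum(x * y for x, y in work)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

def line(x):
    return intercept + slope * x

# Model B: memorise the working set (1-nearest-neighbour) -- in effect far
# more parameters than the data can support, so it fits the errors.
def nearest(x):
    return min(work, key=lambda p: abs(p[0] - x))[1]

def rms(model, data):
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in data) / len(data))

# The overfitted model 'improves' on the working set (zero residual there)
# but does worse on the held-out test set: the cross-validation signature.
print(rms(line, work), rms(nearest, work))
print(rms(line, test), rms(nearest, test))
```

Because the test-set errors are independent of the working-set errors, the
memorising model's test residual is roughly sqrt(2) times the noise level,
while the honest 2-parameter line stays near the noise level itself.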
So there you have it: what matters is that the _errors_ in the NCS-related
amplitudes are uncorrelated, or at least no more correlated than the errors in
the non-NCS-related amplitudes, NOT the amplitudes themselves. It's like the
distinction, when talking about the standard deviation of a quantity, between
the quantity itself (e.g. the electron density in the map) and the _error_ in
that quantity (the practice of calling the latter the 'standard deviation of
the error', or 'standard error', to avoid this confusion is to be commended).
Finally let's examine this: are the _errors_ in the NCS-related amplitudes
expected to be more correlated than the errors in non-NCS-related amplitudes,
biasing the test-set statistic if one member of an NCS-related pair is placed
in the working set and the other in the test set, as opposed to having both in
the same set?
Clearly counting errors are totally random and uncorrelated with anything so
they will contribute zero correlation to both NCS and non-NCS-related errors in
amplitudes. What other sources of measurement error are there? - most likely
errors in image scale factors, errors due to variability in the illuminated
volume of the crystal and errors due to radiation damage. Is there any reason
to believe that any of these effects could introduce more correlation of errors
of NCS-related intensities compared with non-NCS-related? I would suggest that
this could happen only by a complete fluke!
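The amplitudes-versus-errors distinction is easy to make concrete with a toy
simulation (invented numbers): NCS-related pairs share the same true
amplitude but are measured with independent counting errors.

```python
import math
import random

random.seed(4)

# NCS-related reflection pairs: the *true* amplitudes are equal, but each
# member of a pair is measured with its own independent counting error.
n = 5000
pairs = []
for _ in range(n):
    f_true = random.uniform(50.0, 500.0)
    e1, e2 = random.gauss(0.0, 10.0), random.gauss(0.0, 10.0)
    pairs.append((f_true + e1, f_true + e2, e1, e2))

def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

amp_corr = corr([p[0] for p in pairs], [p[1] for p in pairs])
err_corr = corr([p[2] for p in pairs], [p[3] for p in pairs])

# The measured amplitudes are strongly correlated (they share f_true), but
# the measurement errors are not -- and it is the latter that decides
# whether the test-set statistic is biased.
print(amp_corr, err_corr)
```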
Cheers
-- Ian
On Sun, 19 May 2019 at 04:34, Edward A. Berry <[email protected]> wrote:
Revisiting (and testing) an old question:
On 08/12/2003 02:38 PM, [email protected] wrote:
> *** For details on how to be removed from this list visit the ***
> *** CCP4 home page http://www.ccp4.ac.uk ***
> On 08/12/2003 06:43 AM, Dirk Kostrewa wrote:
>>
>> (1) you only need to take special care for choosing a test set if you _apply_
>> the NCS in your refinement, either as restraints or as constraints. If you
>> refine your NCS protomers without any NCS restraints/constraints, both your
>> protomers and your reflections will be independent, and thus no special care
>> for choosing a test set has to be taken
>
> If your space group is P6 with only one molecule in the asymmetric unit but
> you instead choose the subgroup P3 in which to refine it, and you now have
> two molecules per asymmetric unit related by "local" symmetry to one another,
> but you don't apply it, does that mean that reflections that are the same (by
> symmetry) in P6 are uncorrelated in P3 unless you apply the "NCS"?
===================================================
The experiment described below seems to show that Dirk's initial
statement was correct: even in the case where the "ncs" is actually
crystallographic, and the free set is chosen randomly, R-free is not
affected by how you pick the free set. A structure is refined with
artificially low symmetry, so that a 2-fold crystallographic operator
becomes "NCS". Free reflections are picked either randomly (in which
case the great majority of free reflections are related by the NCS to
working reflections), or taking the lattice symmetry into account so
that symm-related pairs are either both free or both working. The final
R-factors are not significantly different, even with repeating each mode
10 times with independently selected free sets. They are also not
significantly different from the values obtained refining in the correct
space group, where there is no ncs.
Maybe this is not really surprising. Since symmetry-related reflections
have the same resolution, picking free reflections this way is one way
of picking them in (very) thin shells, and this has been reported not to
avoid bias: See Table 2 of Kleywegt and Brunger Structure 1996, Vol 4,
897-904. Also the results of Chapman et al. (Acta Cryst. D62, 227–238). And see:
http://www.phenix-online.org/pipermail/phenixbb/2012-January/018259.html
But this is more significant: in cases of lattice symmetry like this,
the ncs takes working reflections directly onto free reflections. In the
case of true ncs the operator takes the reflection to a point between
neighboring reflections, which are closely coupled to that point by the
Rossmann G function. Some of these neighbors are outside the thin shell
(if the original reflection was inside; or vice versa), and thus defeat
the thin-shells strategy. In our case the symm-related free reflection
is directly coupled to the working reflection by the ncs operator, and
its neighbors are no closer than the neighbors of the original
reflection, so if there is bias due to NCS it should be principally
through the sym-related reflection and not through its neighbors. And so
most of the bias should be eliminated by picking the free set in thin
shells or by lattice symmetry.
Also, since the "ncs" is really crystallographic, we have the control of
refining in the correct space group where there is no ncs. The R-factors
were not significantly different when the structure was refined in the
correct space group. (Although it could be argued that that leads to a
better structure, and the only reason the R-factors were the same is
that bias in the lower symmetry refinement resulted in lowering Rfree
to the same level.)
Just one example, but it is the first I tried - no cherry-picking. I
would be interested to know if anyone has an example where taking
lattice symmetry into account did make a difference.
For me the lack of effect is most simply explained by saying that, while
of course ncs-related reflections are correlated in their Fo's and Fc's,
and perhaps in their |Fo-Fc|'s, I see no reason to expect that the
_changes_ in |Fo-Fc| produced by a step of refinement will be correlated
(I can expound on this). Therefore whatever refinement is doing to
improve the fit to working reflections is equally likely to improve or
worsen the fit to sym-related free reflections. In that case it is hard
to see how refinement against working reflections could bias their
symm-related free reflections. (Then how does R-free work? Why does
R-free come down at all when you refine? Because of coupling to
neighboring working reflections by the G-function?)
Summary of results (details below):
0. structure 2CHR, I422, as reported in PDB (with 2-Sigma cutoff)
R: 0.189 Rfree: 0.264 Nfree:442(5%) Nrefl: 9087
1. The deposited 2chr (I422) was refined in that space group with the
original free set. No Sigma cutoff, 10 macrocycles.
R: 0.1767 Rfree: 0.2403 Nfree:442(5%) Nrefl: 9087
2. The deposited structure was refined in I422 10 times, 50 macrocycles
each, with randomly picked 10% free reflections
R: 0.1725±0.0013 Rfree: 0.2507±0.0062 Nfree: 908.9±0.32 Nrefl: 9087
3. The structure was expanded to an I4 dimer related by the unused I422
crystallographic operator, matching the dimer of 1chr. This dimer was
refined against the original (I4) data of 1chr, picking free reflections
in symmetry related pairs. This was repeated 10 times with different
random seed for picking reflections.
R: 0.1666±0.0012 **Rfree:0.2523±0.0077 Nfree: 1601.4 Nrefl:16011
4. same as 3 but picking free reflections randomly without regard for
lattice symmetry.
On average 15 free reflections were in pairs, 212 were invariant under
the operator (no sym-mate) and 1374 (86%) were paired with working
reflections.
R: 0.1674±0.0017 **Rfree:0.2523±0.0050 Nfree: 1600.9 Nrefl:16011
(**-Average Rfree almost identical by coincidence- the individual
results were all different)
Detailed results from the individual refinement runs are available in
spreadsheet in dropbox:
https://www.dropbox.com/s/fwk6q90xbc5r8n1/NCSbias.xls?dl=0
Scripts used in running the tests are also there in NCSbias.tgz:
https://www.dropbox.com/s/sul7a6hzd5krppw/NCSbias.tgz?dl=0
========================================
Methods:
I would like an experiment where relatively complete data is available
in the lower symmetry. To get something that is available to everyone, I
choose from the PDB. A good example is 2CHR, in space group I422, which
was originally solved and the data deposited in I4 with two molecules in
the asymmetric unit(structure 1CHR).
2CHR statistics from the PDB:
R: 0.189  Rfree: 0.264  Completeness: 81.4%  Nfree=442 (4.86%)
(refined 8.0 to 3.0 A; as reported in PDB, with 2-Sigma cutoff)
Further refinement in phenix with same free set, no sigma cutoff:
10 macrocycles bss, indiv XYZ, indiv ADP refinement; phenix default
Resol 37.12 - 3.00 A 92.95% complete, Nrefl=9087 Nfree=442(4.86%)
Start: r_work = 0.2097 r_free = 0.2503 bonds = 0.008 angles = 1.428
Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284
(2chr_orig_001.pdb)
The number of free reflections is small, so the uncertainty
in Rfree is large (a good case for Rcomplete)
Instead, for better statistics, use a new 10% free set and repeat 10 times;
50 macrocycles, with different random seeds:
R: 0.1725±0.0013 Rfree: 0.2507±0.0062 bonds:0.010 Angles:1.192
Nfree: 908.9±0.32 Nrefl: 9087
For artificially low symmetry, expand the I422 structure (making what I
call 3chr for convenience although I'm sure that ID has been taken):
pdbset xyzin 2CHR.pdb xyzout 3chr.pdb <<eof
exclude header
spacegroup I4
cell 111.890 111.890 148.490 90.00 90.00 90.00
symgen X,Y,Z
symgen X,1-Y,1-Z
CHAIN SYMMETRY 2 A B
eof
Get the structure factors from 1CHR: 1chr-sf.cif
Run phenix.refine on 3chr.pdb with 1chr-sf.cif.
This file has no free set (deposited 1993) so tell phenix to generate
one. I don't want phenix to protect me from my own stupidity, so I use:
generate = True
use_lattice_symmetry = False
use_dataman_shells = False
(the .eff file with all non-default parameters is available as
3chr_rand_001.eff in the .tgz mentioned above)
For more significance, use the script multirefine.csh to repeat the refinement
10 times with different random seeds. After each run, grep the significant
results into a log file.
To check this gives free reflections related to working reflections, I
used mtz2various and a fortran prog (sortfree.f in .tgz) to separate the
data (3chr_rand_data.mtz) into two asymmetric units: h,k,l with h>k
(columns 4-5) and with h<k (col 6-7), listed the pairs, thusly:
mtz2various hklin 3chr_rand_data.mtz hklout temp.hkl <<eof
LABIN FP=F-obs DUM1=R-free-flags
OUTPUT USER '(3I4,2F10.5)'
eof
sortfree <<eof >sort3.hkl
sort3.hkl looks like:
______h>k______ ______h<k______
h k l F free F* free*
1 2 3 208.97 0.00 174.95 0.00
1 2 5 226.85 0.00 191.65 0.00
1 2 7 144.85 0.00 164.86 0.00
1 2 9 251.26 0.00 261.71 0.00
1 2 11 333.84 0.00 335.18 0.00
1 2 13 800.37 0.00 791.77 0.00
1 2 15 412.92 0.00 409.90 0.00
1 2 17 306.99 0.00 317.53 0.00
1 2 19 225.54 0.00 220.91 0.00
1 2 21 101.20 1.00* 104.84 0.00
1 2 23 156.27 0.00 156.49 0.00
1 2 25 202.97 0.00 202.23 0.00
1 2 27 216.10 0.00 219.28 0.00
1 2 29 106.76 0.00 100.93 0.00
1 2 31 157.32 0.00 154.37 1.00*
1 2 33 71.84 0.00 20.78 0.00
1 2 35 179.05 0.00 165.67 0.00
1 2 37 254.04 0.00 239.96 1.00*
1 2 39 69.56 0.00 30.61 0.00
1 2 41 56.20 0.00 51.02 0.00
Then I awked for 1's in the free columns. Out of 6922 pairs of reflections,
in one case:
674 in the first asu (h>k) are in the free set,
703 in the second asu (h<k) are in the free set
only 11 pairs have the reflections in both asu free.
out of 16011 refl in I4,
6922 pairs (=13844 refl), 1049 invariant (h=k or h=0), 1118 with absent mate.
out of 1601 free reflections:
On average 15 free reflections were in pairs, 212 were invariant under
the operator (no sym-mate) and 1374 (86%) were paired with working
reflections.
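The book-keeping done by sortfree.f can be sketched in Python. I assume, as
the h>k / h<k split above implies, that the lattice-symmetry mate of (h,k,l)
is (k,h,l); the `flags` dict and `classify` helper are hypothetical stand-ins
for the real mtz data:

```python
# flags: hypothetical dict mapping (h, k, l) -> True if that reflection is
# in the free set.  The mate of (h, k, l) under the unused operator is taken
# to be (k, h, l); reflections with h == k are invariant (no mate).
def classify(flags):
    both_free = both_work = mixed = invariant = 0
    for (h, k, l), free in flags.items():
        if h == k:
            invariant += 1
            continue
        if h < k:               # count each pair once, from the h > k side
            continue
        mate = flags.get((k, h, l))
        if mate is None:        # mate absent from the data
            continue
        if free and mate:
            both_free += 1
        elif not free and not mate:
            both_work += 1
        else:                   # one free, one working: the biasing case
            mixed += 1
    return both_free, both_work, mixed, invariant

# Tiny worked example:
demo = {(2, 1, 3): True, (1, 2, 3): False,    # mixed pair
        (3, 1, 5): True, (1, 3, 5): True,     # both free
        (2, 2, 4): True}                      # invariant (h == k)
print(classify(demo))
```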
Then do 10 more runs of 50 macrocycles with:
use_lattice_symmetry = False
collecting the same statistics
(also scripted in multirefine.csh)
Finally, use ref2chr.eff to refine (as previously mentioned) a monomer in I422
(2chr.pdb) 10 times with 10% free, 50 macrocycles
(also scripted in multirefine.csh)
########################################################################
To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1