Re: [ccp4bb] Does ncs bias R-free? And if so, can it be avoided by special selection of the free set?

Ian Tickle Wed, 12 Jun 2019 14:03:50 -0700

Hi James

Thanks, will do.


Cheers

-- Ian


On Wed, 12 Jun 2019 at 22:02, Holton, James M <[email protected]>
wrote:

> try 6nkq ?
>
> -James Holton
> MAD Scientist
>
> On 6/12/2019 11:46 AM, Ian Tickle wrote:
>
>
> Dear Jon & Randy
>
> I did a test of this using the 2FUQ data which is one of the problematic
> cases you mention where the NCS is nearly crystallographic (in this case an
> NCS 2-fold parallel to b in P212121):
>
> Transformation matrix:
>  -0.99992   0.01204   0.00354
>   0.01200   0.99989  -0.00918
>  -0.00365  -0.00914  -0.99995
>
> Eulerian rotation:          291.08   179.44   291.77
> Orthogonal translation:     72.125    0.021  100.886
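The claim that this is a near-crystallographic 2-fold parallel to b can be checked directly from the matrix. Below is a minimal sketch in Python/NumPy. It assumes the matrix acts on orthogonal coordinates with y along b, which is the usual setting for an orthorhombic cell. The rotation angle comes from the trace and the axis from the symmetric part of the matrix; the translation is ignored.

```python
import numpy as np

# Rotation part of the NCS operator quoted above (assumed: orthogonal frame, y || b).
R = np.array([[-0.99992,  0.01204,  0.00354],
              [ 0.01200,  0.99989, -0.00918],
              [-0.00365, -0.00914, -0.99995]])

def rotation_angle_and_axis(rot):
    """Angle kappa (degrees) from the trace; axis from the symmetric part.
    For a proper rotation, (R + R^T)/2 has eigenvalue 1 along the axis and
    cos(kappa) (twice) perpendicular to it."""
    kappa = np.degrees(np.arccos(np.clip((np.trace(rot) - 1.0) / 2.0, -1.0, 1.0)))
    vals, vecs = np.linalg.eigh((rot + rot.T) / 2.0)
    axis = vecs[:, np.argmax(vals)]          # eigenvector with eigenvalue nearest +1
    return kappa, axis

kappa, axis = rotation_angle_and_axis(R)
tilt_from_b = np.degrees(np.arccos(abs(axis[1])))   # angle between axis and y (= b)
print(f"kappa = {kappa:.2f} deg, tilt of axis from b = {tilt_from_b:.2f} deg")
```

The printed matrix is rounded to 5 decimal places, so it is not exactly orthogonal. The angle and tilt are therefore only approximate.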
>
> For the refinement I used BUSTER with its automated similarity restraint
> (autoncs) feature.  It makes no significant difference to the result
> whether I use FREERFLAG or SFTOOLS/RFREE/SHELL to create the Rfree flags.
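The two flagging schemes can be sketched as below. This is a hypothetical illustration of the idea, not the actual FREERFLAG or SFTOOLS code. `random_flags` deals each reflection an integer 0..19 at random, with set 0 as the test set (about 5%). `shell_flags` cuts 1/d² into many equal-width thin shells and deals whole shells round-robin to the flags. As a result, reflections of identical resolution (such as exact NCS mates) always share a flag. The function names and the `shells_per_set` parameter are assumptions.

```python
import numpy as np

def random_flags(n_refl, n_sets=20, seed=0):
    """Random scheme: each reflection gets a flag 0..n_sets-1; flag 0 = test set."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_sets, size=n_refl)

def shell_flags(d_spacing, n_sets=20, shells_per_set=10):
    """Thin-shell scheme (hypothetical): equal-width bins in 1/d^2, dealt
    round-robin to the flags, so every reflection in one shell shares a flag."""
    s2 = 1.0 / np.asarray(d_spacing, dtype=float) ** 2
    n_shells = n_sets * shells_per_set
    width = (s2.max() - s2.min()) / n_shells
    shell = np.minimum(((s2 - s2.min()) / width).astype(int), n_shells - 1)
    return shell % n_sets
```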
>
> For FREERFLAG:
>
> Starting Rwork/Rfree = 0.3002   0.3008
> Final Rwork/Rfree    = 0.2012   0.2245
>
> For SFTOOLS/RFREE/SHELL:
>
> Starting Rwork/Rfree = 0.3001   0.3014
> Final Rwork/Rfree    = 0.2012   0.2255
>
> This was after jiggling the co-ordinates and setting all B factors to the
> average.  In fact that's not necessary: to 3 decimal places you get the
> same result just using the deposited co-ordinates & B factors:
>
> For FREERFLAG:
>
> Starting Rwork/Rfree = 0.2702   0.2674
> Final Rwork/Rfree    = 0.2007   0.2236
>
> For SFTOOLS/RFREE/SHELL:
>
> Starting Rwork/Rfree = 0.2700   0.2707
> Final Rwork/Rfree    = 0.2007   0.2240
>
> For this to work the refinement must be run to convergence; it will then
> simply refine to the same structure with no 'memory' of the starting
> structure: BUSTER seems to do a good job in this respect (it runs about 400
> iterations).
>
> This is admittedly a single example: I haven't attempted the more
> extensive tests that Jon did, mainly because I don't have more examples of
> cases where the NCS is nearly crystallographic, which is where any effect
> would be most likely to show up.
>
> Anyway, my take on this from this one example is that neither NCS
> restraints nor Rfree flag selection nor jiggling makes any difference, even
> in that worst-case scenario.  I suspect it may be that Rfree is a global
> statistic that is just not sensitive enough to detect any such effect.
>
> Cheers
>
> -- Ian
>
>
>
>
> On Wed, 5 Jun 2019 at 15:08, Randy Read <[email protected]> wrote:
>
>> Dear Ian,
>>
>> I think the missing ingredient in your argument is an assumption that may
>> be implicit in what others have written: if you have NCS in your crystal,
>> you should be restraining that NCS in your model.  If you do that, then the
>> NCS-related Fcalcs will be similar (especially in the particularly
>> problematic case where the NCS is nearly crystallographic), and if the
>> working reflections are over-fit to match the Fobs values, then the free
>> reflections that are related by the same NCS will also be overfit.  So the
>> measurement errors don't have to be correlated, just the modelling errors.
>>
>> Best wishes,
>>
>> Randy
>>
>> On 5 Jun 2019, at 13:58, Ian Tickle <[email protected]> wrote:
>>
>>
>> Hi Jon
>>
>> Sorry, I didn't intend my response to be interpreted as saying that
>> anyone has suggested directly that the measurement errors of NCS-related
>> reflection amplitudes are correlated.  In fact the opposite is almost
>> certainly true since the only obvious way in practice that errors in Fobs
>> could be correlated is via errors in the batch scale factors which would
>> introduce correlations between errors in Fobs for reflections in the same
>> or adjacent images, but that has nothing to do with NCS.  That's the
>> 'elephant in the room': no-one has suggested that reflections on the same
>> or adjacent images should not be split between the working and test sets,
>> yet that's easily the biggest contributor to CV bias with or without NCS!
>> I think taking that effect into account would be much more productive than
>> worrying about NCS, but performing the test-set sampling in shells can't
>> possibly address that, since the images obviously cut across all shells.
>>
>> The point I was making was that correlation of errors in NCS-related Fobs
>> would appear to be the inevitable _implication_ of what certainly has been
>> claimed, namely that NCS can introduce bias into CV statistics if the
>> test-set sampling is not done correctly, i.e. by splitting NCS-related Fobs
>> between the working and test sets.  Unless there's something I've missed,
>> that's the only possible explanation for that claim.  This is because overfitting
>> results from fitting the model to the errors in Fobs, and the CV bias
>> arises from correlation of those errors if the NCS-related Fobs are split
>> up, thus causing the degree of overfitting to be underestimated and giving
>> a too-rosy picture of the structure quality.  Indeed you seem to be saying
>> that because the NCS-related Fobs are correlated (a patently true
>> statement), then it follows that the errors in those Fobs are also
>> correlated, or at least more correlated than for non-NCS-related Fobs,
>> but I just don't see how that can be true.
>>
>> Rfree is not unbiased: as a measure of the agreement it is biased upwards
>> by overfitting (otherwise how could it be used to detect overfitting?), by
>> failing to fit the uncorrelated errors in the test-set Fobs, just as
>> Rwork is biased downwards by fitting to the errors in the working-set
>> Fobs.  Overfitting becomes immediately apparent whenever you perform any
>> refinement, so the only point at which there is no overfitting is for the
>> initial model when Rwork and Rfree are equal, apart from a small
>> difference arising from random sampling of the test-set (that sampling
>> error could be reduced by performing refinements with all 20 working/test-set
>> combinations and averaging the R values).  From there on the 'gap'
>> between Rwork and Rfree is a measure of the degree of overfitting, so we
>> should really be taking some average of Rwork and Rfree as the true measure
>> of agreement (though the biases are not exactly equal and opposite so it's
>> not a simple arithmetic mean).  The goal of choosing the appropriate
>> refinement parameters, restraints and weights is to _minimise_ overfitting,
>> not eliminate it.  It is not possible to eliminate it completely: if it
>> were then Rwork and Rfree would become equal (apart from that small effect
>> from random sampling).
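The points about the Rwork/Rfree gap, and about averaging over all 20 working/test partitions, can be illustrated with a toy least-squares fit standing in for refinement. This is only an analogy under stated assumptions, not crystallographic refinement. The sketch assumes a linear model with `n_params` parameters, Gaussian "measurement errors" and random 20-way flags; all names are hypothetical.

```python
import numpy as np

def r_value(y_obs, y_calc):
    """R = sum|y_obs - y_calc| / sum|y_obs| (R-factor form, applied to toy data)."""
    return np.sum(np.abs(y_obs - y_calc)) / np.sum(np.abs(y_obs))

def cv_r_values(n_params, n_obs=400, n_sets=20, noise=1.0, seed=0):
    """Toy 'refinement': least-squares fit of n_params (>= 6) linear parameters
    to noisy data; returns (Rwork, Rfree) averaged over all n_sets partitions."""
    rng = np.random.default_rng(seed)
    design = np.column_stack([np.ones(n_obs), rng.normal(size=(n_obs, n_params - 1))])
    truth = 10.0 + design[:, 1:6] @ rng.normal(size=5)   # the 'true structure'
    y_obs = truth + noise * rng.normal(size=n_obs)        # add measurement errors
    flags = rng.integers(0, n_sets, size=n_obs)
    r_work, r_free = [], []
    for k in range(n_sets):                               # each set in turn is 'free'
        work, free = flags != k, flags == k
        coef, *_ = np.linalg.lstsq(design[work], y_obs[work], rcond=None)
        r_work.append(r_value(y_obs[work], design[work] @ coef))
        r_free.append(r_value(y_obs[free], design[free] @ coef))
    return float(np.mean(r_work)), float(np.mean(r_free))
```

With 10 parameters Rwork and Rfree stay close. With 300 parameters for roughly 380 working observations, Rwork falls while Rfree rises. With `noise=0.0` both go to zero, in line with the remark that without errors there would be no overfitting.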
>>
>> I don't follow your argument about correlation of Fobs from NCS.
>> Overfitting, and therefore CV bias, arises from the _errors_ in the Fobs
>> not from the Fobs themselves, and there's no reason to believe that the
>> Fobs should be correlated with their errors.  You say "any correlation
>> between the test-set and the working-set F's due to NCS would be expected
>> to reduce R-free".  If the working and test sets are correlated by NCS that
>> would mean that Rwork is correlated with Rfree so they would be reduced
>> equally!  There are two components of the Fobs - Fcalc difference: Fcalc -
>> Ftrue (the model error) and Fobs - Ftrue (the data error).  The former is
>> completely correlated between the working and test sets (obviously since
>> it's the same model) so what you do to one you must do to the other.  The
>> latter can only be correlated by NCS if NCS has an effect on errors in the
>> Fobs, which it doesn't, or by some other effect such as errors in batch
>> scales that are unrelated to NCS.
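Written out with the same symbols, the decomposition of the difference is:

```latex
F_{\mathrm{obs}} - F_{\mathrm{calc}}
  = \underbrace{(F_{\mathrm{obs}} - F_{\mathrm{true}})}_{\text{data error}}
  - \underbrace{(F_{\mathrm{calc}} - F_{\mathrm{true}})}_{\text{model error}}
```

The model-error term comes from a single model and so is shared by the working and test sets. Only the data-error term can differ between them.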
>>
>> Overfitting is related to the data/parameter ratio so you don't observe
>> the effects of overfitting until you change the model, the parameter set or
>> the restraints.  If there were no errors there would be no overfitting and
>> no CV bias (actually there would be no need for cross-validation!).
>>
>> Of course as you say, your tests suggest that there is no CV bias from
>> NCS, in which case there's absolutely nothing to explain!
>>
>> Cheers
>>
>> -- Ian
>>
>>
>> On Tue, 4 Jun 2019 at 21:33, Jonathan Cooper <
>> [email protected]> wrote:
>>
>>> Ian, statistics is not my forte, but I don't think anyone is suggesting
>>> that the measurement errors of NCS-related reflection amplitudes are
>>> correlated. In simple terms, since NCS-related F's should be correlated,
>>> the working-set reflection amplitudes could be correlated with those in the
>>> test-set, if the latter is chosen randomly, rather than in shells. Am I
>>> right in saying that R-free not just indicates over-fitting but, also, acts
>>> as an unbiased measure of the agreement between Fo and Fc? During a
>>> well-behaved refinement run, in the cycles before any over-fitting becomes
>>> apparent, the decrease in R-free value will indicate that the changes being
>>> made to the model are making it more consistent with Fo's. In these stages,
>>> any correlation between the test-set and the working-set F's due to NCS
>>> would be expected to affect the R-free (cross-validation bias), making it
>>> lower than it would be if the test set had been chosen in resolution
>>> shells? However, you are always right and, as you know, I failed to detect
>>> any such effect in my limited tests. Thanks to you and others for replying.
>>>
>>>
>>> On Tuesday, 4 June 2019, 02:07:10 BST, Edward A. Berry <
>>> [email protected]> wrote:
>>>
>>>
>>> On 05/19/2019 08:21 AM, Ian Tickle wrote:
>>> ~~~
>>> >> So there you have it: what matters is that the _errors_ in the
>>> NCS-related amplitudes are uncorrelated, or at least no more correlated
>>> than the errors in the non-NCS-related amplitudes, NOT the amplitudes
>>> themselves.
>>>
>>> Thanks, Ian!
>>>
>>> I would like to think that it is the errors in Fobs that matter (as may
>>> be the case), because then:
>>> 1. ncs would not bias R-free even if you _do_ use ncs
>>> constraints/restraints. (changes in Fcalc due to a step of refinement would
>>> be positively correlated between sym-mates, but if the sign of (Fo-Fc) is
>>> opposite at the sym-mate, what improves the working reflection would worsen
>>> the free)
>>> 2. There would be no need to use the same free set when you refine the
>>> structure against a new dataset (as for ligand studies) since the random
>>> errors of measurement in Fobs in the two sets would be unrelated.
>>>
>>> However when I suggested that in a previous post, I was reminded that
>>> errors in Fobs account for only a small part of the difference (Fo-Fc). The
>>> remainder must be due to inability of our simple atomic models to represent
>>> the actual electron density, or its diffraction; and for a symmetric
>>> structure and a symmetric model, that difference is likely to be
>>> symmetric.  Whether that difference represents "noise" that we want to
>>> avoid fitting is another question, but it is likely that (Fo-Fc) will be
>>> correlated with sym-mates. So I settled for convincing myself that the
>>> changes in Fc brought about by refinement would be uncorrelated, and thus
>>> the _changes_ in (Fo-Fc) at each step would be uncorrelated.
>>>
>>> Below are some of the ideas I come up with in trying to think about
>>> this, and about bias in general. (Not very well organized and not the best
>>> of prose, but if one is a glutton for punishment, or just wants to see how
>>> the mind of a madman works . . .)
>>>
>>> Warning- some of this is contrary to current consensus opinion and the
>>> conclusions may be, in the words of a popular autobuilding program, partly
>>> WRONG!  In particular, the idea that coupling by the G-function does not
>>> bias R-free, but rather is the only reason that R-free works at all!
>>> - - - - - - - - - -
>>>
>>> The differences (Fo-Fc) can be divided between (1) errors in measurement
>>> of reflection intensities and (2) failure of the model to represent the
>>> true structure. The first can be considered "noise" and we would expect
>>> it to be random, with no correlation between sym-mates.
>>> However most of the difference between Fc and Fobs is not due to random
>>> noise in the data, but to failures of our model to accurately represent
>>> the real thing. These differences are likely to be ncs-symmetric.
>>> Leaving aside the question of whether or not we want to fit this kind of
>>> "noise" (bringing the model closer to the real structure?), we conclude
>>> that (Fo-Fc) is likely to be correlated between ncs-mates.
>>>
>>> But for refinement against the working set to bias the contribution of
>>> sym-related free-set reflections to R-free would require that _changes_
>>> in |Fo-Fc| from a step of refinement would be ncs-correlated. If on the
>>> contrary they are not correlated, i.e. if a change that decreases
>>> |Fo-Fc| for a working reflection is equally likely to decrease or
>>> increase |Fo-Fc| for its sym mate (which may be) in the free set, then
>>> it is hard to see how refinement against the working reflection would
>>> bias R-free.
>>>
>>> Under what conditions would the changes in |Fo-Fc| for symmetry-related
>>> reflections be correlated? This would be the case if the changes in Fc
>>> correlate AND the signs of (Fo-Fc) correlate. Again, if the difference
>>> were due only to random error in Fobs, then the sign of (Fo-Fc) for the
>>> symmetry-related reflection would be as likely to be the opposite of that
>>> for the original reflection as the same, so even if the changes in Fc are
>>> correlated, what improves the fit to the original reflection would be as
>>> likely to worsen the fit to its mate. But we concluded above that (Fo-Fc)
>>> is likely to be correlated by symmetry, since the shortcomings of our
>>> model are likely to be symmetric. So we ask whether the changes in Fc are
>>> correlated.
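The sign argument can be made concrete with a first-order model. Assume a small step, so that the change in |Fo-Fc| is -sign(Fo-Fc)·ΔFc. The correlation of the changes between two mates is then (2p-1)·corr(ΔFc, ΔFc'), where p is the probability that the two residuals have the same sign. A minimal Monte Carlo sketch with synthetic numbers follows; all parameters are hypothetical.

```python
import numpy as np

def mate_change_correlation(p_same_sign, dfc_corr=0.95, n=200_000, seed=0):
    """Correlation between first-order changes in |Fo-Fc| at a reflection and its
    mate, given strongly correlated changes in Fc (dfc_corr) and a probability
    p_same_sign that the two (Fo-Fc) residuals have the same sign."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, dfc_corr], [dfc_corr, 1.0]]
    d_fc, d_fc_mate = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    sign = rng.choice([-1.0, 1.0], size=n)                  # sign of (Fo - Fc)
    sign_mate = np.where(rng.random(n) < p_same_sign, sign, -sign)
    d_abs = -sign * d_fc                                    # change in |Fo - Fc|
    d_abs_mate = -sign_mate * d_fc_mate
    return np.corrcoef(d_abs, d_abs_mate)[0, 1]
```

With uncorrelated signs (`p_same_sign=0.5`) the changes in |Fo-Fc| are uncorrelated, even though the changes in Fc are 95% correlated. With identical signs the full 0.95 comes through.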
>>>
>>> So why should a structural change result in correlated changes of
>>> symm-related Fc's?
>>> The Fc is the amplitude of the best-fit sine wave (of the specified
>>> frequency) to the projection of the density of the crystal onto the
>>> specified scattering vector. The refinement program can increase Fcalc
>>> by moving an atom so that its projection on the scattering vector moves
>>> toward a peak of that sine wave, or decrease it by moving it away from a
>>> peak.
>>> If the projection of an atom on the scattering vector moves toward a
>>> peak, the density becomes more peaked and the amplitude increases; if it
>>> moves toward a trough, it tends to take density away from the peak or
>>> fill in the trough, and the density becomes flatter.
>>>
>>> But the scattering vector of a sym-related reflection is at a different
>>> angle, anywhere from almost 0 to 90 degrees from its mate (actually up to
>>> 180°, but then the angle to the Friedel mate is close to zero; it's a
>>> question of how parallel they are, irrespective of direction). The atom
>>> we are changing will fall at a different position along the rotated
>>> scattering vector, and its movement may be toward a peak or trough of the
>>> projected density on that scattering vector.
>>>
>>> If the two reflections are close in reciprocal space, their scattering
>>> vectors will be nearly collinear, the projection of density onto them
>>> will be similar, and the projection of the atom being moved onto them
>>> will come at a similar position in these projections. In that case
>>> moving density so that its projection on one scattering vector moves
>>> toward or away from a peak of its best-fit sine wave will have a similar
>>> effect for the adjacent reflection, and their changes will be correlated.
>>>
>>> But if the reflections are not close in reciprocal space, their
>>> scattering vectors are at different angles, the projection of the
>>> density on them looks quite different, and the projection of the atom
>>> being moved comes at a different position. In this case it is impossible
>>> to predict how changes in the two reflections' amplitudes due to
>>> movement of an atom will correlate without knowing the details of the
>>> density.
>>>
>>> For symmetry-related reflections, the projection of density of the
>>> rotated protomer on the scattering vector of the rotated reflection will
>>> be the same as the projection of the density of the original protomer on
>>> the original reflection (hence the correlation of Fc). (If the
>>> symmetry is actually crystallographic, as in our case, then the
>>> projection of the entire crystal on the rotated scattering vector will
>>> be the same as its projection on the original reflection's scattering
>>> vector). But the change we are making is only in the original protomer,
>>> not in its symm mate, and so its projection will fall at a different
>>> point along the rotated scattering vector, so whether it moves density
>>> toward a peak or trough is somewhat random.
>>>
>>> If ncs is restrained or constrained, the changes will
>>> also follow ncs-symmetry and so changes in Fc would be expected to be
>>> symmetric.
>>>
>>> I have done extensive experiments, again with the same 2CHR structure
>>> refined in I4, showing that when you introduce a change in the structure
>>> by random shaking or molecular dynamics, the correlation between the
>>> changes in Fc for "ncs"-related reflections is close to zero, and
>>> occasionally negative. The slight positive average correlation may be
>>> attributed to sym-pairs that are close in reciprocal space (like 1,0,30
>>> and -1,0,30 if there were a 2-fold along 0,0,l), so that they are coupled
>>> not by ncs but by the G-function. Granted, changes due to shaking might
>>> not be the same as changes due to refinement, but these were shaken
>>> starting from the refined position, and I assume that if they were
>>> refined from this randomly shaken position they would go back to the
>>> original refined position, in which case the Fc changes due to
>>> refinement would be equally uncorrelated.
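The shaking experiment can be mimicked with a toy model; this is a hypothetical sketch, not the scripts used. Protomer A is 200 random unit point atoms in a P1 cell, and protomer B is its copy under an exact 2-fold about b. In the unconstrained case only A is shaken. In the constrained case A is shaken and B is rebuilt from it. The function returns the correlation of the changes in |Fc| over reflection pairs (h,k,l)/(-h,k,-l).

```python
import numpy as np

TWOFOLD_B = np.array([-1.0, 1.0, -1.0])   # (x,y,z) -> (-x,y,-z); also (h,k,l) -> (-h,k,-l)

def amplitudes(hkl, xyz):
    """|F(h)| for unit point atoms at fractional coordinates xyz."""
    return np.abs(np.exp(2j * np.pi * hkl @ xyz.T).sum(axis=1))

def mate_change_corr(n_atoms=200, shake=0.002, ncs_constrained=False, seed=0):
    rng = np.random.default_rng(seed)
    prot_a = rng.random((n_atoms, 3))                 # hypothetical protomer A
    # h, k >= 1 avoids the h0l zone (where the mate is the Friedel mate) and 0k0
    hkl = np.array([(h, k, l) for h in range(1, 9) for k in range(1, 9)
                    for l in range(-8, 9)], dtype=float)
    mates = hkl * TWOFOLD_B

    def both(a, b):
        xyz = np.vstack([a, b])
        return amplitudes(hkl, xyz), amplitudes(mates, xyz)

    f0, f0_mate = both(prot_a, prot_a * TWOFOLD_B)    # exactly symmetric start
    moved_a = prot_a + rng.normal(scale=shake, size=prot_a.shape)
    moved_b = (moved_a if ncs_constrained else prot_a) * TWOFOLD_B
    f1, f1_mate = both(moved_a, moved_b)
    return np.corrcoef(f1 - f0, f1_mate - f0_mate)[0, 1]
```

In the unconstrained case the correlation should come out near zero. The constrained case keeps the model exactly symmetric, so the two changes are identical and the correlation is 1. This is the point made above about restrained or constrained ncs.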
>>>
>>> ----------
>>>
>>> Coupling between reflections by the G function:
>>> Without saying exactly what is meant by couplings, reflections can be
>>> coupled in two ways. First, reflections are coupled to other reflections
>>> near them in reciprocal space. This is because the molecular transform is
>>> relatively smooth (because it is oversampled, the asymmetric unit being
>>> larger than the structure it contains?), so the amplitude and phase of a
>>> reflection cannot differ too widely from those of neighboring reflections.
>>> Or, put another way, because the scattering vectors of neighboring
>>> reflections are nearly parallel and similar in frequency, so the
>>> projections of the density on them integrate similarly.
>>> (The second is ncs-coupling.)
>>>
>>> In general, coupling of neighboring reflections is a good thing for
>>> crystallography. No one reflection is indispensable, because its
>>> information is much the same as that of the 26 reflections in the
>>> surrounding cube. This allows us to solve structures when the data are
>>> only 80-90% complete, provided the missing reflections are randomly
>>> scattered among the present ones. It supports the "fill-in" FFT map
>>> procedure, where FcΦc is used for missing reflections (the structure based
>>> on the surrounding reflections will be good enough to give a good estimate
>>> of the missing structure factor). It makes possible resolution extension
>>> during density modification or by the "free lunch" procedures of Dodson
>>> and Sheldrick.
>>>
>>> And I would argue that this coupling is what makes cross-validation
>>> (free-R) work. We say that refining against the working reflections
>>> improves the structure, making it more like the true structure, and thus
>>> the free Fc approach their Fobs. But not because the good fairy looks at
>>> the structure and says "OK, it's improved now, we can lower the R-free".
>>> How does it work mathematically? If the reflections were completely
>>> independent, i.e. if free and working reflections were not coupled through
>>> being samples of the same molecular transform, then changes which improve
>>> the fit to the working reflections would have no effect on the values of
>>> the free reflections. It has to go through the structure: changes due to
>>> refining against the working reflections affect the free reflections,
>>> which we can call "coupling", and we know that this is described by the
>>> G-function. If free reflections were not coupled to working reflections,
>>> Rfree would never change and thus would be useless.
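For reference, the usual spherical-envelope form of the G function (Rossmann & Blow) is G = 3(sin x - x cos x)/x³, with x = 2πHr. Here r is the envelope radius and H is the reciprocal-space distance. A minimal sketch follows, using a hypothetical 20 Å radius:

```python
import numpy as np

def g_function(dist, radius):
    """Interference function for a spherical envelope of radius `radius` (Å),
    at reciprocal-space distance `dist` (1/Å); normalized so G(0) = 1."""
    x = 2.0 * np.pi * np.atleast_1d(np.asarray(dist, dtype=float)) * radius
    g = np.ones_like(x)
    nz = x > 1e-6                       # use the limit G -> 1 at the origin
    g[nz] = 3.0 * (np.sin(x[nz]) - x[nz] * np.cos(x[nz])) / x[nz] ** 3
    return g
```

Consider a hypothetical 20 Å radius molecule in a 60 Å cubic cell. Reflections one reciprocal-lattice unit apart have G of about 0.62, and reflections two units apart about 0.05. This is the sense in which a reflection is coupled mainly to its immediate neighbors.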
>>>
>>> For an example, suppose we refine the position of an atom, choosing
>>> working reflections only in the plane l=0, and free reflections along the l
>>> axis (assuming an orthorhombic system). The working reflections are only
>>> sensitive to position in the x and y directions, so the z position would be
>>> unchanged by the refinement. But the free reflections are only sensitive to
>>> position along the z axis, so R-free would be unchanged. Presumably the
>>> structure would be improved (if that one atom was slightly misplaced and
>>> all other atoms correctly placed), but the Rfree would not improve. I would
>>> say this is the direction Chapman and co. were heading with their thin
>>> shells of free reflections isolated by thick shells of unused guard
>>> reflections. If they really succeed in eliminating the "bias", then Rfree
>>> will be unresponsive to refinement and so useless.
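The orthorhombic thought experiment is easy to verify numerically, since F(00l) depends only on z and F(hk0) only on x and y. A minimal sketch with hypothetical random point atoms:

```python
import numpy as np

def amplitudes(hkl, xyz):
    """|F(h)| for unit point atoms at fractional coordinates xyz."""
    return np.abs(np.exp(2j * np.pi * np.asarray(hkl, float) @ xyz.T).sum(axis=1))

rng = np.random.default_rng(7)
atoms = rng.random((30, 3))                                          # hypothetical structure
working = [(h, k, 0) for h in range(-5, 6) for k in range(1, 6)]    # the l = 0 plane
free = [(0, 0, l) for l in range(1, 11)]                            # along the l axis

shifted = atoms.copy()
shifted[0, :2] += [0.01, -0.02]     # an x/y-only change, as refinement against hk0 would make

print(np.allclose(amplitudes(free, atoms), amplitudes(free, shifted), rtol=0, atol=1e-12))  # True
print(np.allclose(amplitudes(working, atoms), amplitudes(working, shifted)))                # False
```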
>>>
>>> Chapman et al. considered two kinds of coupling: that due to ncs and
>>> direct coupling via Rossmann's G function. They found that choosing the
>>> free set in thin shells had little effect; in fact very thick shells, with
>>> the test reflections centered in the middle of the shell, were required to
>>> significantly reduce the "bias". Now the reciprocal-space equivalents of
>>> ncs operators are pure rotational operators, so they relate points in
>>> reciprocal space with precisely the same resolution. Selecting free
>>> reflections in thin shells should thus be sufficient to ensure that
>>> ncs-related reflections have the same free-R flag, and so avoid bias. For
>>> my case, where the ncs is really crystallographic, the shells could be
>>> infinitely thin, since the symm-related reflections have precisely the
>>> same resolution. For real ncs the operator takes a reflection to a
>>> non-Bragg position which is closely surrounded by reflections, coupled
>>> to it by the G function, so somewhat thicker shells would be required.
>>> But the need for very thick guard zones around the free reflections
>>> implies that it is the G-function they are fighting, as they somewhat
>>> implicitly acknowledged by discussing the thickness of the shells in
>>> terms of the radius of the central maximum of the G function. In that
>>> case I wonder whether ncs-coupling, which still has to go through
>>> G-function coupling to bias a free reflection, contributes significantly
>>> compared to the coupling of every reflection to its direct neighbors.
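The reciprocal-space action of an ncs rotation can be sketched as follows. The sketch assumes an orthorhombic cell, so that s = (h/a, k/b, l/c), and the convention that F_B(s) equals F_A(Rᵀs) times a phase factor. The cell and the 0.5° tilt of the 2-fold from b are hypothetical. They are chosen only to contrast an exact crystallographic 2-fold with a slightly imperfect one.

```python
import numpy as np

def axis_angle_matrix(axis, angle_deg):
    """Rotation matrix from an axis and angle (Rodrigues formula)."""
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    t = np.radians(angle_deg)
    k = np.array([[0.0, -n[2], n[1]], [n[2], 0.0, -n[0]], [-n[1], n[0], 0.0]])
    return np.eye(3) + np.sin(t) * k + (1.0 - np.cos(t)) * k @ k

def ncs_mate_index(hkl, rot, cell):
    """Fractional index of the ncs-related point of reflection hkl
    (orthorhombic cell assumed, so s = hkl / cell component-wise)."""
    s = np.asarray(hkl, dtype=float) / cell     # reciprocal vector, 1/Å
    return (rot.T @ s) * cell                   # rotate, then back to indices

cell = np.array([50.0, 60.0, 70.0])                          # hypothetical cell (Å)
exact = np.diag([-1.0, 1.0, -1.0])                           # crystallographic 2-fold || b
tilt = np.radians(0.5)
tilted = axis_angle_matrix([np.sin(tilt), np.cos(tilt), 0.0], 180.0)
h = np.array([7, 12, -9])
print(ncs_mate_index(h, exact, cell), ncs_mate_index(h, tilted, cell))
```

The resolution is preserved in both cases, but only the exact operator lands on a lattice point. The tilted operator lands between Bragg positions, where the mate is coupled to the neighboring reflections through the G function.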
>>>
>>> By using thick guard zones of unused reflections, they end up refining
>>> with very incomplete data which would be expected to affect the refinement
>>> and raise the R-free just because the structure is less correct. They
>>> control for this by refining with another set in which the same number of
>>> reflections are deleted randomly. But this is not a satisfactory control,
>>> because it is generally agreed that reflections missing from an empty
>>> zone in reciprocal space are more deleterious than missing reflections
>>> that are randomly scattered.
>>> Ironically this same "redundancy due to oversampling" that Chapman and
>>> co. discuss in their introduction allows neighboring reflections to impart
>>> most of the information of an isolated absent reflection. When the missing
>>> reflections are clustered together in a thick shell or wedge, a lot of
>>> information is not available and the structure will suffer. And in
>>> particular the structural details that determine structure factors in the
>>> center of the excluded zone will be poorly determined, since information
>>> pertaining to them is being excluded. So of course the R-factor calculated
>>> from these reflections will be higher than with randomly absent data.
>>> Furthermore, if G-function is the vehicle by which R-free follows R, R-free
>>> will follow less closely and hence under-report what improvement is being
>>> made.
>>>
>>>
>>> >
>>> > On Sun, 19 May 2019 at 04:34, Edward A. Berry <[email protected]> wrote:
>>> >
>>> >    Revisiting (and testing) an old question:
>>> >
>>> >    On 08/12/2003 02:38 PM, [email protected] wrote:
>>> >
>>> >      > On 08/12/2003 06:43 AM, Dirk Kostrewa wrote:
>>> >      >>
>>> >      >> (1) you only need to take special care for choosing a test set if you _apply_
>>> >      >> the NCS in your refinement, either as restraints or as constraints. If you
>>> >      >> refine your NCS protomers without any NCS restraints/constraints, both your
>>> >      >> protomers and your reflections will be independent, and thus no special care
>>> >      >> for choosing a test set has to be taken
>>> >      >
>>> >      > If your space group is P6 with only one molecule in the asymmetric
>>> >      > unit but you instead choose the subgroup P3 in which to refine it, and
>>> >      > you now have two molecules per asymmetric unit related by "local"
>>> >      > symmetry to one another, but you don't apply it, does that mean that
>>> >      > reflections that are the same (by symmetry) in P6 are uncorrelated in
>>> >      > P3 unless you apply the "NCS"?
>>> >
>>> >    ===================================================
>>> >    The experiment described below seems to show that Dirk's initial
>>> >    statement was correct: even in the case where the "ncs" is actually
>>> >    crystallographic, and the free set is chosen randomly, R-free is not
>>> >    affected by how you pick the free set.  A structure is refined with
>>> >    artificially low symmetry, so that a 2-fold crystallographic operator
>>> >    becomes "NCS". Free reflections are picked either randomly (in which
>>> >    case the great majority of free reflections are related by the NCS to
>>> >    working reflections), or taking the lattice symmetry into account so
>>> >    that symm-related pairs are either both free or both working. The final
>>> >    R-factors are not significantly different, even when each mode is
>>> >    repeated 10 times with independently selected free sets. They are also
>>> >    not significantly different from the values obtained refining in the
>>> >    correct space group, where there is no ncs.
>>> >
>>> >    Maybe this is not really surprising. Since symmetry-related reflections
>>> >    have the same resolution, picking free reflections this way is one way
>>> >    of picking them in (very) thin shells, and this has been reported not to
>>> >    avoid bias: see Table 2 of Kleywegt and Brunger, Structure 1996, Vol 4,
>>> >    897-904. Also the results of Chapman et al. (Acta Cryst. D62, 227–238).
>>> >    And see:
>>> >    http://www.phenix-online.org/pipermail/phenixbb/2012-January/018259.html
>>> >
>>> >
>>> >    But this is more significant: in cases of lattice symmetry like this,
>>> >    the ncs takes working reflections directly onto free reflections. In the
>>> >    case of true ncs the operator takes the reflection to a point between
>>> >    neighboring reflections, which are closely coupled to that point by the
>>> >    Rossmann G function. Some of these neighbors are outside the thin shell
>>> >    (if the original reflection was inside, or vice versa), and thus defeat
>>> >    the thin-shells strategy.  In our case the symm-related free reflection
>>> >    is directly coupled to the working reflection by the ncs operator, and
>>> >    its neighbors are no closer than the neighbors of the original
>>> >    reflection, so if there is bias due to NCS it should be principally
>>> >    through the sym-related reflection and not through its neighbors. And so
>>> >    most of the bias should be eliminated by picking the free set in thin
>>> >    shells or by lattice symmetry.
>>> >
>>> >    Also, since the "ncs" is really crystallographic, we have the control of
>>> >    refining in the correct space group, where there is no ncs. The R-factors
>>> >    were not significantly different when the structure was refined in the
>>> >    correct space group. (Although it could be argued that that leads to a
>>> >    better structure, and the only reason the R-factors were the same is
>>> >    that bias in the lower-symmetry refinement resulted in lowering Rfree
>>> >    to the same level.)
>>> >
>>> >    Just one example, but it is the first I tried- no cherry-picking. I
>>> >    would be interested to know if anyone has an example where taking
>>> >    lattice symmetry into account did make a difference.
>>> >
>>> >    For me the lack of effect is most simply explained by saying that, while
>>> >    of course ncs-related reflections are correlated in their Fo's and Fc's,
>>> >    and perhaps in their |Fo-Fc|'s, I see no reason to expect that the
>>> >    _changes_ in |Fo-Fc| produced by a step of refinement will be correlated
>>> >    (I can expound on this). Therefore whatever refinement is doing to
>>> >    improve the fit to working reflections is equally likely to improve or
>>> >    worsen the fit to sym-related free reflections. In that case it is hard
>>> >    to see how refinement against working reflections could bias their
>>> >    symm-related free reflections.  (Then how does R-free work? Why does
>>> >    R-free come down at all when you refine? Because of coupling to
>>> >    neighboring working reflections by the G-function?)
>>> >
>>> >    Summary of results (details below):
>>> >    0. structure 2CHR, I422, as reported in PDB, with 2-Sigma cutoff)
>>> >        R: 0.189          Rfree: 0.264  Nfree:442(5%)  Nrefl: 9087
>>> >
>>> >    1. The deposited 2chr (I422) was refined in that space group with
>>> the
>>> >    original free set. No Sigma cutoff, 10 macrocycles.
>>> >        R: 0.1767        Rfree: 0.2403  Nfree:442(5%)  Nrefl: 9087
>>> >
>>> >    2. The deposited structure was refined in I422 10 times, 50
>>> macrocycles
>>> >    each, with randomly picked 10% free reflections
>>> >        R: 0.1725±0.0013  Rfree: 0.2507±0.0062  Nfree: 908.9±  Nrefl:
>>> 9087
>>> >
>>> >    3. The structure was expanded to an I4 dimer related by the unused I422
>>> >    crystallographic operator, matching the dimer of 1chr. This dimer was
>>> >    refined against the original (I4) data of 1chr, picking free reflections
>>> >    in symmetry-related pairs. This was repeated 10 times, each with a
>>> >    different random seed for picking reflections.
>>> >        R: 0.1666±0.0012  **Rfree: 0.2523±0.0077  Nfree: 1601.4  Nrefl: 16011
>>> >
>>> >    4. Same as 3, but picking free reflections randomly without regard for
>>> >    lattice symmetry.
>>> >    On average, 15 free reflections were in pairs, 212 were invariant under
>>> >    the operator (no sym-mate), and 1374 (86%) were paired with working
>>> >    reflections.
>>> >        R: 0.1674±0.0017  **Rfree: 0.2523±0.0050  Nfree: 1600.9  Nrefl: 16011
>>> >
>>> >    (** Average Rfree almost identical by coincidence; the individual
>>> >    results were all different.)
>>> >
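A quick way to judge whether the averaged Rfree values in the summary differ is Welch's t statistic. This is a minimal sketch, not part of the original analysis; it assumes the ± values are sample standard deviations over n = 10 runs (the post does not say whether they are SDs or standard errors).

```python
import math


def welch_t(mean_a: float, sd_a: float, n_a: int,
            mean_b: float, sd_b: float, n_b: int) -> float:
    """Welch's t for two means given as mean ± sample SD over n runs."""
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)  # SE of the difference
    return (mean_b - mean_a) / se_diff


# Assumption: the "±" values in the summary are sample SDs over 10 runs each.
print("test 2 vs test 3:", round(welch_t(0.2507, 0.0062, 10, 0.2523, 0.0077, 10), 2))
print("test 3 vs test 4:", round(welch_t(0.2523, 0.0077, 10, 0.2523, 0.0050, 10), 2))
```

Under those assumptions, tests 2 and 3 give |t| of about 0.5 and tests 3 and 4 give t = 0, so the differences sit well within the run-to-run scatter.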
>>> >    Detailed results from the individual refinement runs are available in a
>>> >    spreadsheet on Dropbox:
>>> >    https://www.dropbox.com/s/fwk6q90xbc5r8n1/NCSbias.xls?dl=0
>>> >
>>> >    Scripts used in running the tests are also there in NCSbias.tgz:
>>> >    https://www.dropbox.com/s/sul7a6hzd5krppw/NCSbias.tgz?dl=0
>>> >
>>> >
>>> >    ========================================
>>> >
>>> >    Methods:
>>> >    I wanted an experiment where relatively complete data are available in
>>> >    the lower symmetry. To get something available to everyone, I chose
>>> >    from the PDB. A good example is 2CHR, in space group I422, which was
>>> >    originally solved, and its data deposited, in I4 with two molecules in
>>> >    the asymmetric unit (structure 1CHR).
>>> >
>>> >    2CHR statistics from the PDB (refined 8.0 to 3.0 A, with 2-sigma cutoff):
>>> >              R      R-free  complete (%)  Nfree
>>> >              0.189  0.264   81.4          442 (4.86%)
>>> >    Further refinement in phenix with the same free set, no sigma cutoff:
>>> >        10 macrocycles of bss, individual XYZ and individual ADP refinement;
>>> >        otherwise phenix defaults
>>> >        Resol 37.12 - 3.00 A, 92.95% complete, Nrefl=9087, Nfree=442 (4.86%)
>>> >        Start: r_work = 0.2097 r_free = 0.2503 bonds = 0.008 angles = 1.428
>>> >        Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284
>>> >        (2chr_orig_001.pdb)
>>> >
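For reference, R and Rfree are the same residual computed over the working and free subsets. A minimal sketch, assuming amplitudes are already on a common scale (a fixed scale factor k rather than a refined scale with a bulk-solvent model) and that a flag of 1 marks a free reflection, as in the free columns of the sort3.hkl listing:

```python
from typing import Sequence, Tuple


def r_factors(f_obs: Sequence[float], f_calc: Sequence[float],
              free_flags: Sequence[int], k: float = 1.0) -> Tuple[float, float]:
    """Return (R_work, R_free) with R = sum|Fo - k*Fc| / sum Fo over each subset.

    Assumes free_flags[i] == 1 marks a free (test) reflection, 0 a working one.
    """
    num = [0.0, 0.0]  # index 0: working set, index 1: free set
    den = [0.0, 0.0]
    for fo, fc, flag in zip(f_obs, f_calc, free_flags):
        subset = 1 if flag == 1 else 0
        num[subset] += abs(fo - k * fc)
        den[subset] += fo
    return num[0] / den[0], num[1] / den[1]


# Working: (10 + 0) / (100 + 150) = 0.04; free: 10 / 200 = 0.05.
print(r_factors([100.0, 200.0, 150.0], [90.0, 210.0, 150.0], [0, 1, 0]))
```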
>>> >    The number of free reflections is small, so the uncertainty
>>> >    in Rfree is large (a good case for Rcomplete).
>>> >    Instead, for better statistics, use a new 10% free set and repeat 10
>>> >    times (50 macrocycles each) with different random seeds:
>>> >        R: 0.1725±0.0013  Rfree: 0.2507±0.0062 bonds:0.010 Angles:1.192
>>> >        Nfree: 908.9±0.32  Nrefl: 9087
>>> >
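The size of that uncertainty can be estimated with a commonly quoted rule of thumb, σ(Rfree) ≈ Rfree/√Nfree. The sketch below applies it to the two free-set sizes used here; it is an approximation added for illustration, not a calculation from the post.

```python
import math


def sigma_rfree(r_free: float, n_free: int) -> float:
    """Rule-of-thumb standard uncertainty of R-free: R_free / sqrt(N_free)."""
    return r_free / math.sqrt(n_free)


# Original 442-reflection free set vs. the new ~909-reflection (10%) free sets.
for r_free, n_free in [(0.2403, 442), (0.2507, 909)]:
    print(f"Rfree = {r_free:.4f}  Nfree = {n_free:4d}  "
          f"sigma(Rfree) ~ {sigma_rfree(r_free, n_free):.4f}")
```

This gives roughly 0.011 for the 442-reflection set and 0.008 for the 10% sets.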
>>> >    To get artificially low symmetry, expand the I422 structure (making what I
>>> >    call 3chr for convenience, although I'm sure that ID has been taken):
>>> >
>>> >    pdbset xyzin 2CHR.pdb xyzout 3chr.pdb <<eof
>>> >    exclude header
>>> >    spacegroup I4
>>> >    cell 111.890  111.890  148.490  90.00  90.00  90.00
>>> >    symgen  X,Y,Z
>>> >    symgen X,1-Y,1-Z
>>> >    CHAIN SYMMETRY 2 A B
>>> >    eof
>>> >
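What the pdbset step does to each atom can be sketched in a few lines: convert to fractional coordinates, apply the symgen operator X,1-Y,1-Z, and convert back. This assumes the orthogonal frame is aligned with the cell axes, which holds for this tetragonal cell (all angles 90°); the function names are hypothetical, and the sketch ignores everything else pdbset does (chain relabelling, header handling).

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

# Cell from the pdbset input (a, b, c in Angstrom; all angles 90 degrees).
CELL: Vec3 = (111.890, 111.890, 148.490)


def orth_to_frac(xyz: Vec3, cell: Vec3 = CELL) -> Vec3:
    """Orthogonal -> fractional; valid only when alpha = beta = gamma = 90."""
    return (xyz[0] / cell[0], xyz[1] / cell[1], xyz[2] / cell[2])


def frac_to_orth(frac: Vec3, cell: Vec3 = CELL) -> Vec3:
    """Fractional -> orthogonal for the same orthogonal cell."""
    return (frac[0] * cell[0], frac[1] * cell[1], frac[2] * cell[2])


def op_x_1my_1mz(frac: Vec3) -> Vec3:
    """Apply the symgen operator X, 1-Y, 1-Z to fractional coordinates."""
    x, y, z = frac
    return (x, 1.0 - y, 1.0 - z)


def make_second_copy(chain_a: List[Vec3]) -> List[Vec3]:
    """Orthogonal coordinates of the copy generated from chain A."""
    return [frac_to_orth(op_x_1my_1mz(orth_to_frac(p))) for p in chain_a]


# An atom at fractional (0.1, 0.2, 0.1) maps to fractional (0.1, 0.8, 0.9).
print(make_second_copy([(11.189, 22.378, 14.849)]))
```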
>>> >    Get the structure factors from 1CHR: 1chr-sf.cif
>>> >    Run phenix.refine on 3chr.pdb with 1chr-sf.cif.
>>> >    This file has no free set (deposited in 1993), so tell phenix to generate
>>> >    one. I don't want phenix to protect me from my own stupidity, so I use:
>>> >              generate = True
>>> >              use_lattice_symmetry = False
>>> >              use_dataman_shells = False
>>> >          (the .eff file with all non-default parameters is available as
>>> >    3chr_rand_001.eff in the .tgz mentioned above)
>>> >
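Tests 3 and 4 differ only in how the free set is chosen. The toy selector below contrasts the two choices. It is a hypothetical sketch, not phenix's implementation of `use_lattice_symmetry`; it assumes the sym-mate of (h, k, l) in I4 is (k, h, l), as in the h>k / h<k pairing used for sort3.hkl.

```python
import random
from typing import Dict, Iterable, Tuple

HKL = Tuple[int, int, int]


def mate(hkl: HKL) -> HKL:
    """Assumed sym-mate under the extra I422 operator: (h, k, l) -> (k, h, l)."""
    h, k, l = hkl
    return (k, h, l)


def pick_free(reflections: Iterable[HKL], fraction: float, seed: int,
              keep_mates_together: bool) -> Dict[HKL, int]:
    """Return {hkl: 1 if free else 0}; a toy selector, not phenix's code."""
    rng = random.Random(seed)
    refl_set = set(reflections)
    flags: Dict[HKL, int] = {}
    for hkl in sorted(refl_set):  # sorted so a given seed is reproducible
        if hkl in flags:
            continue  # already flagged together with its mate
        flag = 1 if rng.random() < fraction else 0
        flags[hkl] = flag
        if keep_mates_together and mate(hkl) in refl_set:
            flags[mate(hkl)] = flag  # mate gets the same flag (test 3 style)
    return flags


def free_with_working_mate(flags: Dict[HKL, int]) -> int:
    """Count free reflections whose sym-mate is present and in the working set."""
    return sum(1 for hkl, flag in flags.items()
               if flag == 1 and mate(hkl) != hkl and flags.get(mate(hkl)) == 0)


if __name__ == "__main__":
    toy = [(h, k, l) for h in range(1, 8) for k in range(1, 8) for l in range(6)]
    for together in (True, False):
        flags = pick_free(toy, 0.10, seed=1, keep_mates_together=together)
        print(together, sum(flags.values()), free_with_working_mate(flags))
```

With mates kept together no free reflection has a working mate; with independent picks most free reflections do, which is the situation in test 4.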
>>> >    For more significance, use the script multirefine.csh to repeat the
>>> >    refinement 10 times with different random seeds. After each run, grep
>>> >    the significant results into a log file.
>>> >
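multirefine.csh is only in the .tgz. As a hedged sketch of the "grep, then average" step, assuming each run's log contains a line of the `Final: r_work = ... r_free = ...` form quoted earlier, the per-run values can be extracted and summarized as mean ± sample SD (the helper names are hypothetical):

```python
import re
import statistics
from typing import Iterable, List, Tuple

# Matches e.g. "Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 ..."
FINAL_RE = re.compile(r"Final:\s*r_work\s*=\s*([0-9.]+)\s+r_free\s*=\s*([0-9.]+)")


def final_r_values(log_texts: Iterable[str]) -> List[Tuple[float, float]]:
    """(r_work, r_free) from the last 'Final:' line of each log; others skipped."""
    results = []
    for text in log_texts:
        matches = FINAL_RE.findall(text)
        if matches:
            r_work, r_free = matches[-1]
            results.append((float(r_work), float(r_free)))
    return results


def summarize(values: List[float]) -> str:
    """Mean ± sample SD (n - 1 denominator) over repeated runs."""
    return f"{statistics.mean(values):.4f}±{statistics.stdev(values):.4f}"


# Hypothetical usage: one log text per run, e.g. read from files.
# runs = final_r_values(open(p).read() for p in log_paths)
# print("R", summarize([w for w, _ in runs]), "Rfree", summarize([f for _, f in runs]))
```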
>>> >
>>> >    To check that this gives free reflections related to working reflections,
>>> >    I used mtz2various and a Fortran program (sortfree.f in the .tgz) to
>>> >    separate the data (3chr_rand_data.mtz) into two asymmetric units, h,k,l
>>> >    with h>k (columns 4-5) and with h<k (columns 6-7), and listed the pairs,
>>> >    thus:
>>> >
>>> >    mtz2various hklin 3chr_rand_data.mtz hklout temp.hkl <<eof
>>> >        LABIN FP=F-obs DUM1=R-free-flags
>>> >        OUTPUT USER '(3I4,2F10.5)'
>>> >    eof
>>> >    sortfree <<eof >sort3.hkl
>>> >
>>> >    sort3.hkl looks like:
>>> >                     ______h>k______   ______h<k______
>>> >         h   k   l         F   free         F*   free*
>>> >         1   2   3    208.97   0.00     174.95   0.00
>>> >         1   2   5    226.85   0.00     191.65   0.00
>>> >         1   2   7    144.85   0.00     164.86   0.00
>>> >         1   2   9    251.26   0.00     261.71   0.00
>>> >         1   2  11    333.84   0.00     335.18   0.00
>>> >         1   2  13    800.37   0.00     791.77   0.00
>>> >         1   2  15    412.92   0.00     409.90   0.00
>>> >         1   2  17    306.99   0.00     317.53   0.00
>>> >         1   2  19    225.54   0.00     220.91   0.00
>>> >         1   2  21    101.20   1.00*    104.84   0.00
>>> >         1   2  23    156.27   0.00     156.49   0.00
>>> >         1   2  25    202.97   0.00     202.23   0.00
>>> >         1   2  27    216.10   0.00     219.28   0.00
>>> >         1   2  29    106.76   0.00     100.93   0.00
>>> >         1   2  31    157.32   0.00     154.37   1.00*
>>> >         1   2  33     71.84   0.00      20.78   0.00
>>> >         1   2  35    179.05   0.00     165.67   0.00
>>> >         1   2  37    254.04   0.00     239.96   1.00*
>>> >         1   2  39     69.56   0.00      30.61   0.00
>>> >         1   2  41     56.20   0.00      51.02   0.00
>>> >
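sortfree.f itself is only in the .tgz. A hedged Python equivalent of the pairing step, assuming temp.hkl holds one `(3I4,2F10.5)` record (h, k, l, F, free flag) per reflection and that the mate of (h, k, l) is (k, h, l). Rows are keyed by the h<k index, matching the indices printed in the listing; the function names are hypothetical:

```python
from typing import Dict, List, Tuple

HKL = Tuple[int, int, int]
Row = Tuple[HKL, float, float, float, float]  # (hkl, F, free, F*, free*)


def parse_3i4_2f10(line: str) -> Tuple[HKL, float, float]:
    """Parse one record written with the Fortran format (3I4,2F10.5)."""
    h, k, l = int(line[0:4]), int(line[4:8]), int(line[8:12])
    f_obs = float(line[12:22])
    free = float(line[22:32])
    return (h, k, l), f_obs, free


def pair_rows(lines: List[str]) -> List[Row]:
    """Pair each h>k reflection with its (k, h, l) mate; unpaired ones are dropped."""
    data: Dict[HKL, Tuple[float, float]] = {}
    for line in lines:
        if line.strip():
            hkl, f_obs, free = parse_3i4_2f10(line)
            data[hkl] = (f_obs, free)
    rows: List[Row] = []
    for (h, k, l), (f_hi, free_hi) in data.items():
        if h > k and (k, h, l) in data:
            f_lo, free_lo = data[(k, h, l)]
            # h>k data in columns 4-5, h<k data in columns 6-7.
            rows.append(((k, h, l), f_hi, free_hi, f_lo, free_lo))
    rows.sort()
    return rows
```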
>>> >    I then used awk to look for 1 in the free columns. Out of 6922 pairs of
>>> >    reflections, in one run:
>>> >    674 in the first asu (h>k) were in the free set,
>>> >    703 in the second asu (h<k) were in the free set, and
>>> >    only 11 pairs had the reflections in both asu free.
>>> >
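The awk count amounts to the following over the paired rows (hkl, F, free, F*, free*). A hypothetical sketch, assuming free flags are exactly 1.0 or 0.0 as in the listing; applied to the 20 rows shown above it gives 1 free reflection in the first asu, 2 in the second, and no pair with both free.

```python
from typing import Iterable, Tuple

HKL = Tuple[int, int, int]
Row = Tuple[HKL, float, float, float, float]  # (hkl, F, free, F*, free*)


def count_free(rows: Iterable[Row]) -> Tuple[int, int, int]:
    """Return (free in first asu, free in second asu, pairs with both free)."""
    first = second = both = 0
    for _hkl, _f, free, _f_star, free_star in rows:
        in_first = free == 1.0
        in_second = free_star == 1.0
        first += int(in_first)
        second += int(in_second)
        both += int(in_first and in_second)
    return first, second, both
```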
>>> >    Out of 16011 reflections in I4:
>>> >    6922 pairs (= 13844 reflections), 1049 invariant (h=k or h=0), and 1118
>>> >    with an absent mate.
>>> >
>>> >    Out of 1601 free reflections, on average:
>>> >    15 were in pairs, 212 were invariant under the operator (no sym-mate),
>>> >    and 1374 (86%) were paired with working reflections.
>>> >
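The bookkeeping behind those totals can be sketched as a classification of every reflection by its relation to its sym-mate. This is a minimal sketch under stated assumptions: the mate of (h, k, l) is (k, h, l); a reflection is invariant when h = k or h = 0, as stated above; and k = 0 occurs only together with h = 0 in the asymmetric unit used. Invariant and absent-mate reflections are kept as separate categories here; sum them to compare with a single "no sym-mate" count.

```python
from collections import Counter
from typing import Dict, Tuple

HKL = Tuple[int, int, int]


def mate(hkl: HKL) -> HKL:
    """Assumed sym-mate under the extra I422 operator."""
    h, k, l = hkl
    return (k, h, l)


def is_invariant(hkl: HKL) -> bool:
    """Invariant under the operator when h == k or h == 0 (rule quoted above)."""
    h, k, _ = hkl
    return h == k or h == 0


def classify(flags: Dict[HKL, int]) -> Tuple[Counter, Counter]:
    """Tally all reflections and free reflections by relation to their sym-mate.

    flags maps (h, k, l) -> 1 (free) or 0 (working).
    """
    all_refl: Counter = Counter()
    free_refl: Counter = Counter()
    for hkl, flag in flags.items():
        if is_invariant(hkl):
            kind = "invariant"
        elif mate(hkl) not in flags:
            kind = "absent mate"
        elif flags[mate(hkl)] == 1:
            kind = "mate free"
        else:
            kind = "mate working"
        # For the all-reflection tally, both "mate ..." kinds count as paired.
        all_refl["paired" if kind.startswith("mate") else kind] += 1
        if flag == 1:
            free_refl[kind] += 1
    return all_refl, free_refl
```

The "paired" tally counts reflections, so the number of pairs is half of it (6922 pairs = 13844 reflections above).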
>>> >    Then do 10 more runs of 50 macrocycles, collecting the same statistics,
>>> >    with:
>>> >          use_lattice_symmetry = False
>>> >    (also scripted in multirefine.csh)
>>> >
>>> >    Finally, use ref2chr.eff to refine (as previously mentioned) a monomer in
>>> >    I422 (2chr.pdb) 10 times with 10% free, 50 macrocycles each
>>> >    (also scripted in multirefine.csh).
>>> >
>>> >
>>> ########################################################################
>>> >
>>> >    To unsubscribe from the CCP4BB list, click the following link:
>>> >    https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1
>>
>>
>> ------
>> Randy J. Read
>> Department of Haematology, University of Cambridge
>> Cambridge Institute for Medical Research     Tel: + 44 1223 336500
>> The Keith Peters Building                               Fax: + 44 1223 336827
>> Hills Road                                                       E-mail: [email protected]
>> Cambridge CB2 0XY, U.K.
>> www-structmed.cimr.cam.ac.uk
>>
>>