Hi James,

Thanks, will do.
Cheers -- Ian On Wed, 12 Jun 2019 at 22:02, Holton, James M <[email protected]> wrote: > try 6nkq ? > > -James Holton > MAD Scientist > > On 6/12/2019 11:46 AM, Ian Tickle wrote: > > > Dear Jon & Randy > > I did a test of this using the 2FUQ data which is one of the problematic > cases you mention where the NCS is nearly crystallographic (in this case an > NCS 2-fold parallel to b in P212121): > > Transformation matrix: > -0.99992 0.01204 0.00354 > 0.01200 0.99989 -0.00918 > -0.00365 -0.00914 -0.99995 > > Eulerian rotation: 291.08 179.44 291.77 > Orthogonal translation: 72.125 0.021 100.886 > > For the refinement I used BUSTER with its automated similarity restraint > (autoncs) feature. It makes no significant difference to the result > whether I use FREERFLAG or SFTOOLS/RFREE/SHELL to create the Rfree flags. > > For FREERFLAG: > > Starting Rwork/Rfree = 0.3002 0.3008 > Final Rwork/Rfree = 0.2012 0.2245 > > For SFTOOLS/RFREE/SHELL: > > Starting Rwork/Rfree = 0.3001 0.3014 > Final Rwork/Rfree = 0.2012 0.2255 > > This was after jiggling the co-ordinates and setting all B factors to the > average. In fact that's not necessary: to 3 d.p.s you get the same result > just using the deposited co-ordinates & B factors: > > For FREERFLAG: > > Starting Rwork/Rfree = 0.2702 0.2674 > Final Rwork/Rfree = 0.2007 0.2236 > > For SFTOOLS/RFREE/SHELL: > > Starting Rwork/Rfree = 0.2700 0.2707 > Final Rwork/Rfree = 0.2007 0.2240 > > For this to work the refinement must be run until convergence, then it > will simply refine to the same structure with no 'memory' of the starting > structure: BUSTER seems to do a good job in this respect (it runs about 400 > iterations). > > This is admittedly a single example: I haven't attempted the more > extensive tests that Jon did mainly because I don't have more examples of > cases where the NCS is nearly crystallographic and where if there is any > effect it would be most likely to show up. 
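The statement that the 2FUQ NCS operator is nearly a crystallographic 2-fold parallel to b can be checked from the transformation matrix quoted above. A minimal Python sketch, assuming the matrix is a proper rotation expressed in an orthogonal frame whose axes coincide with a, b, c (reasonable for P212121); the function name `rotation_angle_axis` is made up for illustration:

```python
import math

# Rotation matrix as quoted in the message above (2FUQ NCS operator).
R = [
    [-0.99992, 0.01204, 0.00354],
    [0.01200, 0.99989, -0.00918],
    [-0.00365, -0.00914, -0.99995],
]

def rotation_angle_axis(r):
    """Return (angle in degrees, unit axis) of a rotation matrix.

    Uses the symmetric part of r, which stays well conditioned for
    rotations close to 180 degrees. The sign of the axis is arbitrary,
    and the function is not meant for the identity (angle = 0).
    """
    cos_t = max(-1.0, min(1.0, (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0))
    angle = math.degrees(math.acos(cos_t))
    # (r + r^T)/2 - cos_t * I  equals  (1 - cos_t) * n n^T
    sym = [[(r[i][j] + r[j][i]) / 2.0 - (cos_t if i == j else 0.0)
            for j in range(3)] for i in range(3)]
    k = max(range(3), key=lambda i: sym[i][i])  # column with largest diagonal
    col = [sym[i][k] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in col))
    return angle, [c / norm for c in col]

angle, axis = rotation_angle_axis(R)
print(f"angle = {angle:.2f} deg, axis = {[round(a, 4) for a in axis]}")
```

With the five-decimal values quoted, this gives a rotation of roughly 179.7 degrees about an axis within about half a degree of b, so the operator is indeed very close to a crystallographic 2-fold.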
> > Anyway my take on this from this one example is that neither NCS > restraints nor Rfree flag selection nor jiggling makes any difference, even > in that worst case scenario. I suspect it may be that Rfree is a global > statistic that is just not sensitive enough to detect that. > > Cheers > > -- Ian > > > > > On Wed, 5 Jun 2019 at 15:08, Randy Read <[email protected]> wrote: > >> Dear Ian, >> >> I think the missing ingredient in your argument is an assumption that may >> be implicit in what others have written: if you have NCS in your crystal, >> you should be restraining that NCS in your model. If you do that, then the >> NCS-related Fcalcs will be similar (especially in the particularly >> problematic case where the NCS is nearly crystallographic), and if the >> working reflections are over-fit to match the Fobs values, then the free >> reflections that are related by the same NCS will also be overfit. So the >> measurement errors don't have to be correlated, just the modelling errors. >> >> Best wishes, >> >> Randy >> >> On 5 Jun 2019, at 13:58, Ian Tickle <[email protected]> wrote: >> >> >> Hi Jon >> >> Sorry I didn't intend for my response to be interpreted as saying that >> anyone has suggested directly that the measurement errors of NCS-related >> reflection amplitudes are correlated. In fact the opposite is almost >> certainly true since the only obvious way in practice that errors in Fobs >> could be correlated is via errors in the batch scale factors which would >> introduce correlations between errors in Fobs for reflections in the same >> or adjacent images, but that has nothing to do with NCS. That's the >> 'elephant in the room': no-one has suggested that reflections on the same >> or adjacent images should not be split between the working and test sets, >> yet that's easily the biggest contributor to CV bias with or without NCS! 
>> I think taking that effect into account would be much more productive than >> worrying about NCS, but performing the test-set sampling in shells can't >> possibly address that, since the images obviously cut across all shells. >> >> The point I was making was that correlation of errors in NCS-related Fobs >> would appear to be the inevitable _implication_ of what certainly has been >> claimed, namely that NCS can introduce bias into CV statistics if the >> test-set sampling is not done correctly, i.e. by splitting NCS-related Fobs >> between the working and test sets. Unless there's something I've missed >> that's >> the only possible explanation for that claim. This is because overfitting >> results from fitting the model to the errors in Fobs, and the CV bias >> arises from correlation of those errors if the NCS-related Fobs are split >> up, thus causing the degree of overfitting to be underestimated and giving >> a too-rosy picture of the structure quality. Indeed you seem to be saying >> that because the NCS-related Fobs are correlated (a patently true >> statement), then it follows that the errors in those Fobs are also >> correlated, or at least no more correlated than for non-NCS-related Fobs, >> but I just don't see how that can be true. >> >> Rfree is not unbiased: as a measure of the agreement it is biased upwards >> by overfitting (otherwise how could it be used to detect overfitting?), by >> failing to fit with the uncorrelated errors in the test-set Fobs, just as >> Rwork is biased downwards by fitting to the errors in the working-set >> Fobs. Overfitting becomes immediately apparent whenever you perform any >> refinement, so the only point at which there is no overfitting is for the >> initial model when Rwork and Rfree are equal, apart from a small >> difference arising from random sampling of the test-set (that sampling >> error could be reduced by performing refinements with all 20 working/test >> sets combinations and averaging the R values). 
From there on the 'gap' >> between Rwork and Rfree is a measure of the degree of overfitting, so we >> should really be taking some average of Rwork and Rfree as the true measure >> of agreement (though the biases are not exactly equal and opposite so it's >> not a simple arithmetic mean). The goal of choosing the appropriate >> refinement parameters, restraints and weights is to _minimise_ overfitting, >> not eliminate it. It is not possible to eliminate it completely: if it >> were then Rwork and Rfree would become equal (apart from that small effect >> from random sampling). >> >> I don't follow your argument about correlation of Fobs from NCS. >> Overfitting, and therefore CV bias, arises from the _errors_ in the Fobs >> not from the Fobs themselves, and there's no reason to believe that the >> Fobs should be correlated with their errors. You say "any correlation >> between the test-set and the working-set F's due to NCS would be expected >> to reduce R-free". If the working and test sets are correlated by NCS that >> would mean that Rwork is correlated with Rfree so they would be reduced >> equally! There are two components of the Fobs - Fcalc difference: Fcalc - >> Ftrue (the model error) and Fobs - Ftrue (the data error). The former is >> completely correlated between the working and test sets (obviously since >> it's the same model) so what you do to one you must do to the other. The >> latter can only be correlated by NCS if NCS has an effect on errors in the >> Fobs, which it doesn't, or by some other effect such as errors in batch >> scales that are unrelated to NCS. >> >> Overfitting is related to the data/parameter ratio so you don't observe >> the effects of overfitting until you change the model, the parameter set or >> the restraints. If there were no errors there would be no overfitting and >> no CV bias (actually there would be no need for cross-validation!). 
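The argument above (the gap between Rwork and Rfree comes from fitting the errors in the working-set Fobs, and vanishes if there are no errors) can be illustrated with a deliberately crude toy. This is only a sketch and not a crystallographic refinement: each "parameter" is simply fitted as the mean of `n_work` noisy working observations, and all names (`work_free_rms`, `n_bins`, `n_work`, `n_free`) are invented for the illustration.

```python
import random
import statistics

def work_free_rms(n_bins=5000, n_work=4, n_free=1, sigma=1.0, seed=0):
    """Toy model: every bin has one true value observed with Gaussian error.

    The 'refinement' fits one parameter per bin, the mean of that bin's
    working observations. Returns (rms working residual, rms free residual).
    """
    rng = random.Random(seed)
    work_sq, free_sq = [], []
    for _ in range(n_bins):
        f_true = rng.uniform(10.0, 100.0)
        work = [f_true + rng.gauss(0.0, sigma) for _ in range(n_work)]
        free = [f_true + rng.gauss(0.0, sigma) for _ in range(n_free)]
        f_calc = statistics.fmean(work)  # fitted to the working data only
        work_sq += [(obs - f_calc) ** 2 for obs in work]
        free_sq += [(obs - f_calc) ** 2 for obs in free]
    return statistics.fmean(work_sq) ** 0.5, statistics.fmean(free_sq) ** 0.5

print(work_free_rms(sigma=1.0))  # working residual below sigma, free above
print(work_free_rms(sigma=0.0))  # no errors: no gap
```

In this toy the expected rms residuals are sigma*sqrt(1 - 1/n_work) for the working set and sigma*sqrt(1 + 1/n_work) for the free set, so the gap shrinks as the data/parameter ratio grows and disappears when sigma is zero.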
>> >> Of course as you say, your tests suggest that there is no CV bias from >> NCS, in which case there's absolutely nothing to explain! >> >> Cheers >> >> -- Ian >> >> >> On Tue, 4 Jun 2019 at 21:33, Jonathan Cooper < >> [email protected]> wrote: >> >>> Ian, statistics is not my forte, but I don't think anyone is suggesting >>> that the measurement errors of NCS-related reflection amplitudes are >>> correlated. In simple terms, since NCS-related F's should be correlated, >>> the working-set reflection amplitudes could be correlated with those in the >>> test-set, if the latter is chosen randomly, rather than in shells. Am I >>> right in saying that R-free not just indicates over-fitting but, also, acts >>> as an unbiased measure of the agreement between Fo and Fc? During a >>> well-behaved refinement run, in the cycles before any over-fitting becomes >>> apparent, the decrease in R-free value will indicate that the changes being >>> made to the model are making it more consistent with Fo's. In these stages, >>> any correlation between the test-set and the working-set F's due to NCS >>> would be expected to affect the R-free (cross-validation bias), making it >>> lower than it would be if the test set had been chosen in resolution >>> shells? However, you are always right and, as you know, I failed to detect >>> any such effect in my limited tests. Thanks to you and others for replying. >>> >>> >>> On Tuesday, 4 June 2019, 02:07:10 BST, Edward A. Berry < >>> [email protected]> wrote: >>> >>> >>> On 05/19/2019 08:21 AM, Ian Tickle wrote: >>> ~~~ >>> >> So there you have it: what matters is that the _errors_ in the >>> NCS-related amplitudes are uncorrelated, or at least no more correlated >>> than the errors in the non-NCS-related amplitudes, NOT the amplitudes >>> themselves. >>> >>> Thanks, Ian! >>> >>> I would like to think that it is the errors in Fobs that matter (as may >>> be the case), because then: >>> 1. 
ncs would not bias R-free even if you _do_ use ncs
>>> constraints/restraints. (changes in Fcalc due to a step of refinement would
>>> be positively correlated between sym-mates, but if the sign of (Fo-Fc) is
>>> opposite at the sym-mate, what improves the working reflection would worsen
>>> the free)
>>> 2. There would be no need to use the same free set when you refine the
>>> structure against a new dataset (as for ligand studies) since the random
>>> errors of measurement in Fobs in the two sets would be unrelated.
>>>
>>> However when I suggested that in a previous post, I was reminded that
>>> errors in Fobs account for only a small part of the difference (Fo-Fc). The
>>> remainder must be due to inability of our simple atomic models to represent
>>> the actual electron density, or its diffraction; and for a symmetric
>>> structure and a symmetric model, that difference is likely to be
>>> symmetric. Whether that difference represents "noise" that we want to
>>> avoid fitting is another question, but it is likely that (Fo-Fc) will be
>>> correlated with sym-mates. So I settled for convincing myself that the
>>> changes in Fc brought about by refinement would be uncorrelated, and thus
>>> the _changes_ in (Fo-Fc) at each step would be uncorrelated.
>>>
>>> Below are some of the ideas I came up with in trying to think about
>>> this, and about bias in general. (Not very well organized and not the best
>>> of prose, but if one is a glutton for punishment, or just wants to see how
>>> the mind of a madman works . . .)
>>>
>>> Warning: some of this is contrary to current consensus opinion and the
>>> conclusions may be, in the words of a popular autobuilding program, partly
>>> WRONG! In particular, the idea that coupling by the G-function does not
>>> bias R-free, but rather is the only reason that R-free works at all!
>>> - - - - - - - - - -
>>>
>>> The differences (Fo-Fc) can be divided between (1) errors in measurement
>>> of reflection intensities and (2) failure of the model to represent the
>>> true structure. The first can be considered "noise" and we would expect
>>> it to be random, with no correlation between symm mates.
>>> However most of the difference between Fc and Fobs is not due to random
>>> noise in the data, but to failures of our model to accurately represent
>>> the real thing. These differences are likely to be ncs-symmetric.
>>> Leaving aside the question of whether or not we want to fit this kind of
>>> "noise" (bringing the model closer to the real structure?), we conclude
>>> that (Fo-Fc) is likely to be correlated between ncs-mates.
>>>
>>> But for refinement against the working set to bias the contribution of
>>> sym-related free-set reflections to R-free would require that _changes_
>>> in |Fo-Fc| from a step of refinement would be ncs-correlated. If on the
>>> contrary they are not correlated, i.e. if a change that decreases
>>> |Fo-Fc| for a working reflection is equally likely to decrease or
>>> increase |Fo-Fc| for its sym mate (which may be) in the free set, then
>>> it is hard to see how refinement against the working reflection would
>>> bias R-free.
>>>
>>> Under what conditions would changes in |Fo-Fc| for symmetry related
>>> reflections be correlated? This would be the case if change in Fc
>>> correlates AND the sign of (Fo-Fc) correlates. Again, if the difference
>>> were only due to random error in Fobs, then the sign of Fo-Fc of a
>>> symmetry related reflection
>>> would be as likely to be the opposite as the same (as the original
>>> reflection) so even if changes in Fc are correlated, what improves the
>>> fit to the original reflection would be as likely to worsen the fit to
>>> its mate. But we concluded above that Fo-Fc is likely to be correlated
>>> by symmetry, since the shortcomings of our model are likely to be
>>> symmetric.
>>> So we ask if changes in Fc are correlated.
>>>
>>> So why should a structural change result in correlated changes of
>>> symm-related Fc's?
>>> The Fc is the amplitude of the best-fit sine wave (of the specified
>>> frequency) to the projection of the density of the crystal onto the
>>> specified scattering vector. The refinement program can increase Fcalc
>>> by moving an atom so that its projection on the scattering vector moves
>>> toward a peak of that sine wave, or decrease it by moving away from a
>>> peak.
>>> If the projection of an atom on the scattering vector moves toward a
>>> peak, the density becomes more peaked and the amplitude increases; if it
>>> moves toward a trough it tends to take density away from the peak or
>>> fill in the trough and the density becomes flatter.
>>>
>>> But the scattering vector of a sym-related reflection is at a different
>>> angle, anywhere from almost 0 to 90 degrees from its mate (actually to
>>> 180 degrees, but then the Friedel mate is close to zero; it's a question
>>> of how parallel they are, irrespective of direction). The atom we are
>>> changing will fall at a different position along the rotated scattering
>>> vector, and its movement may be toward a peak or trough of the projected
>>> density on that scattering vector.
>>>
>>> If the two reflections are close in reciprocal space, their scattering
>>> vectors will be nearly collinear, the projection of density onto them
>>> will be similar, and the projection of the atom being moved onto them
>>> will come at a similar position in these projections. In that case
>>> moving density so that its projection on one scattering vector moves
>>> toward or away from a peak of its best-fit sine wave will have a similar
>>> effect for the adjacent reflection, and their changes will be correlated.
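The "toward a peak or toward a trough" picture can be written compactly. As a sketch only, assuming point-like atoms with scattering factors f_j at fractional positions x_j and ignoring B factors (these symbols are introduced here for illustration and are not from the original post):

```latex
F_c(\mathbf{h}) = \sum_j f_j \exp\!\left(2\pi i\,\mathbf{h}\cdot\mathbf{x}_j\right)
               = |F_c(\mathbf{h})|\,e^{i\varphi_c(\mathbf{h})}

% effect of a small shift \delta\mathbf{x}_k of atom k, to first order:
\Delta|F_c(\mathbf{h})| \approx
  -2\pi\,(\mathbf{h}\cdot\delta\mathbf{x}_k)\,f_k\,
  \sin\!\left(2\pi\,\mathbf{h}\cdot\mathbf{x}_k - \varphi_c(\mathbf{h})\right)
```

The sign of the change depends on the component of the shift along h and on where the atom's projection h·x_k sits relative to the phase of that reflection. Two reflections change in a correlated way only if both factors are similar for them, which is what one expects for near-collinear scattering vectors and not, in general, for a rotated one.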
>>> >>> But if the reflections are not close in reciprocal space, their >>> scattering vectors are at different angles, the projection of the >>> density on them looks quite different, and the projection of the atom >>> being moved comes at a different position. In this case it is impossible >>> to predict how changes in the two reflections' amplitudes due to >>> movement of an atom will correlate without knowing the details of the >>> density. >>> >>> For symmetry-related reflections, the projection of density of the >>> rotated protomer on the scattering vector of the rotated reflection will >>> be the same as the projection of the density of the original protomer on >>> the original reflection (hence the correlation of Fc). (in case the >>> symmetry is actually crystallographic, as in our case, then the >>> projection of the entire crystal on the rotated scattering vector will >>> be the same as its projection on the original reflection's scattering >>> vector). But the change we are making is only in the original protomer, >>> not in its symm mate, and so its projection will fall at a different >>> point along the rotated scattering vector, so whether it moves density >>> toward a peak or trough is somewhat random. >>> >>> If ncs is restrained or constrained, the changes will >>> also follow ncs-symmetry and so changes in Fc would be expected to be >>> symmetric. >>> >>> I have extensive experiments, again with the same 2CHR structure >>> refining with I4 symmetry, showing that when you introduce a change in >>> the structure by random shaking or molecular dynamics, the correlation >>> between changes in Fc for "ncs" symmetry related atoms is close to zero, >>> and occasionally negative. The slight positive average correlation may be >>> attributed to sym-pairs that are close in reciprocal space (like 1,0,30 >>> and -1,0,30 if there were a 2-fold along 0,0,l) so that they are coupled >>> not by ncs but by the G-function. 
>>> Granted, changes due to shaking might
>>> not be the same as changes due to refinement, but these were shaken
>>> starting from the refined position, and I assume that if they were
>>> refined
>>> from this randomly shaken position they would go back to the original
>>> refined position, in which case the Fc changes due to refinement would
>>> be equally uncorrelated.
>>>
>>> ----------
>>>
>>> Coupling between reflections by the G function:
>>> Without saying exactly what is meant by coupling, reflections can be
>>> coupled in two ways. One, reflections are coupled to other reflections
>>> near
>>> them in reciprocal space. This is due to the fact that the molecular
>>> transform of the molecule is relatively smooth (due to the molecular
>>> transform being oversampled due to the asymmetric unit being larger than
>>> the structure contained?), so values of amplitude and
>>> phase for a reflection cannot differ too widely from those of neighboring
>>> reflections. Or because the scattering vectors of neighboring
>>> reflections are nearly parallel and similar in frequency so the projection
>>> of the density on them integrates similarly.
>>> (The second is ncs-coupling.)
>>>
>>> In general coupling of neighboring reflections is a good thing for
>>> crystallography. No one reflection is indispensable, because its
>>> information is much the same as the other reflections in a cube of 26
>>> surrounding reflections. This allows us to solve structures when the data
>>> is only 80-90% complete, provided the missing reflections are randomly
>>> scattered among the present reflections. It supports the "fill-in" fft map
>>> procedure where FcΦc is used for missing reflections (the structure based
>>> on surrounding reflections will be good enough to give a good estimate of
>>> the missing structure factor). It makes possible resolution extension
>>> during density modification or by the "free lunch" procedures of Dodson and
>>> Sheldrick.
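For a rough feel for the range of this neighbour coupling, here is a sketch of the interference (G) function for the simplest possible envelope, a sphere of radius R, for which G = 3(sin x - x cos x)/x^3 with x = 2*pi*R*|delta s|. The spherical envelope, the 20 A radius in the usage lines and the function names are assumptions made purely for illustration.

```python
import math

def g_sphere(delta_s, radius):
    """Interference function of a spherical envelope, normalised so G(0) = 1.

    delta_s: distance in reciprocal space (1/Angstrom); radius in Angstrom.
    """
    x = 2.0 * math.pi * radius * delta_s
    if abs(x) < 1e-4:
        return 1.0 - x * x / 10.0  # series expansion avoids 0/0 at the origin
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3

def first_zero(radius):
    """Reciprocal-space distance where G first reaches zero (root of tan x = x)."""
    lo, hi = 4.0, 5.0  # sin x - x cos x is positive at 4 and negative at 5
    for _ in range(60):  # plain bisection on x
        mid = 0.5 * (lo + hi)
        if math.sin(mid) - mid * math.cos(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / (2.0 * math.pi * radius)

print(first_zero(20.0))       # extent of the central maximum, in 1/Angstrom
print(g_sphere(0.01, 20.0))   # coupling at a distance of 0.01 1/Angstrom
```

For R = 20 A the central maximum extends to about 0.036 1/A, so with a reciprocal lattice spacing of around 0.01 1/A it would span a few lattice points in each direction. Real molecular envelopes are not spheres, so this is only an order-of-magnitude guide.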
>>>
>>> And I would argue that this coupling is what makes cross-validation
>>> (free-R) work. We say
>>> that refining against the working reflections improves the structure,
>>> making it more like the true structure, and thus the free Fc approach their
>>> Fobs. But not because the good fairy looks at the structure and says "OK,
>>> it's improved now, we can lower the R-free".
>>> How does it work mathematically? If the reflections were completely
>>> independent, if free and working reflections were not coupled through being
>>> samples of the same molecular transform, then changes which improve the fit
>>> to the working reflections would have no effect on the values of the free
>>> reflections. It has to go through the structure: changes due to refining
>>> against the working reflections affect the free reflections, which we can
>>> call "coupling", and we know that is described by the G-function. If free
>>> reflections were not coupled to working reflections, Rfree would never
>>> change and thus would be useless.
>>>
>>> For an example, suppose we refine the position of an atom, choosing
>>> working reflections only in the plane l=0, and free reflections along the l
>>> axis (assuming an orthorhombic system). The working reflections are only
>>> sensitive to position in the x and y directions, so the z position would be
>>> unchanged by the refinement. But the free reflections are only sensitive to
>>> position along the z axis, so R-free would be unchanged. Presumably the
>>> structure would be improved (if that one atom was slightly misplaced and
>>> all other atoms correctly placed), but the Rfree would not improve. I would
>>> say this is the direction Chapman and co. were heading with their thin
>>> shells of free reflections isolated by thick shells of unused guard
>>> reflections. If they really succeed in eliminating the "bias", then Rfree
>>> will be unresponsive to refinement and so useless.
>>>
>>> Chapman et al. considered two kinds of coupling: that due to ncs and
>>> direct coupling via Rossmann's G function. They found that choosing the free
>>> set
>>> in thin shells had little effect; in fact very thick shells with the
>>> test reflections centered in the middle of the shell were required to
>>> significantly reduce the "bias". Now the reciprocal space equivalents of
>>> ncs operators are pure rotational operators, so they relate points in
>>> reciprocal space with precisely the same resolution. Selecting free
>>> reflections in thin shells should thus be sufficient to ensure that
>>> ncs-related reflections have the same free-R flag and avoid bias. For
>>> my case where ncs is really crystallographic, the shells could be
>>> infinitely thin since the symm-related reflections have precisely the
>>> same resolution. For real ncs the operator takes a reflection to a
>>> non-Bragg position which is closely surrounded by reflections, coupled
>>> to them by the G function.
>>> In that case somewhat thicker shells would be required. But using very
>>> thick guard zones around the free reflections implies it is the
>>> G-function they are fighting, as they somewhat implicitly acknowledged
>>> by the
>>> discussion of thickness of shells in terms of the radius of the central
>>> maximum
>>> of the G function. In that case I wonder if ncs-coupling, which still has
>>> to go through G-function coupling to bias a free reflection,
>>> contributes significantly compared to the coupling of every reflection to
>>> its direct neighbors.
>>>
>>> By using thick guard zones of unused reflections, they end up refining
>>> with very incomplete data which would be expected to affect the refinement
>>> and raise the R-free just because the structure is less correct. They
>>> control for this by refining with another set in which the same number of
>>> reflections are deleted randomly.
>>> But this is not a satisfactory control,
>>> because it is generally agreed that missing reflections due to an empty
>>> zone in reciprocal space are more deleterious than missing reflections that
>>> are randomly scattered.
>>> Ironically this same "redundancy due to oversampling" that Chapman and
>>> co. discuss in their introduction allows neighboring reflections to impart
>>> most of the information of an isolated absent reflection. When the missing
>>> reflections are clustered together in a thick shell or wedge, a lot of
>>> information is not available and the structure will suffer. And in
>>> particular the structural details that determine structure factors in the
>>> center of the excluded zone will be poorly determined, since information
>>> pertaining to them is being excluded. So of course the R-factor calculated
>>> from these reflections will be higher than with randomly absent data.
>>> Furthermore, if the G-function is the vehicle by which R-free follows R,
>>> R-free will follow less closely and hence under-report what improvement is
>>> being made.
>>>
>>> >
>>> > On Sun, 19 May 2019 at 04:34, Edward A. Berry <[email protected]> wrote:
>>> >
>>> > Revisiting (and testing) an old question:
>>> >
>>> > On 08/12/2003 02:38 PM, [email protected] wrote:
>>> > > *** For details on how to be removed from this list visit the ***
>>> > > *** CCP4 home page http://www.ccp4.ac.uk ***
>>> >
>>> > > On 08/12/2003 06:43 AM, Dirk Kostrewa wrote:
>>> > >>
>>> > >> (1) you only need to take special care for choosing a test set if you _apply_
>>> > >> the NCS in your refinement, either as restraints or as constraints.
If you >>> > >> refine your NCS protomers without any NCS >>> restraints/constraints, both your >>> > >> protomers and your reflections will be independent, and thus >>> no special care >>> > >> for choosing a test set has to be taken >>> > > >>> > > If your space group is P6 with only one molecule in the >>> asymmetric unit but you instead choose the subgroup P3 in which to refine >>> it, and you now have two molecules per asymmetric unit related by "local" >>> symmetry to one another, but you don't apply it, does that mean that >>> reflections that are the same (by symmetry) in P6 are uncorrelated in P3 >>> unless you apply the "NCS"? >>> > >>> > =================================================== >>> > The experiment described below seems to show that Dirk's initial >>> > statement was correct: even in the case where the "ncs" is actually >>> > crystallographic, and the free set is chosen randomly, R-free is not >>> > affected by how you pick the free set. A structure is refined with >>> > artificially low symmetry, so that a 2-fold crystallographic >>> operator >>> > becomes "NCS". Free reflections are picked either randomly (in which >>> > case the great majority of free reflections are related by the NCS >>> to >>> > working reflections), or taking the lattice symmetry into account so >>> > that symm-related pairs are either both free or both working. The >>> final >>> > R-factors are not significantly different, even with repeating each >>> mode >>> > 10 times with independently selected free sets. They are also not >>> > significantly different from the values obtained refining in the >>> correct >>> > space group, where there is no ncs. >>> > >>> > Maybe this is not really surprising. 
>>> > Since symmetry-related reflections
>>> > have the same resolution, picking free reflections this way is one way
>>> > of picking them in (very) thin shells, and this has been reported not to
>>> > avoid bias: See Table 2 of Kleywegt and Brunger, Structure 1996, Vol 4,
>>> > 897-904. Also results of Chapman et al. (Acta Cryst. D62, 227–238).
>>> > And see:
>>> >
>>> > http://www.phenix-online.org/pipermail/phenixbb/2012-January/018259.html
>>> >
>>> > But this is more significant: in cases of lattice symmetry like this,
>>> > the ncs takes working reflections directly onto free reflections. In the
>>> > case of true ncs the operator takes the reflection to a point between
>>> > neighboring reflections, which are closely coupled to that point by the
>>> > Rossmann G function. Some of these neighbors are outside the thin shell
>>> > (if the original reflection was inside; or vice versa), and thus defeat
>>> > the thin-shells strategy. In our case the symm-related free reflection
>>> > is directly coupled to the working reflection by the ncs operator, and
>>> > its neighbors are no closer than the neighbors of the original
>>> > reflection, so if there is bias due to NCS it should be principally
>>> > through the sym-related reflection and not through its neighbors. And so
>>> > most of the bias should be eliminated by picking the free set in thin
>>> > shells or by lattice symmetry.
>>> >
>>> > Also, since the "ncs" is really crystallographic, we have the control of
>>> > refining in the correct space group where there is no ncs.
>>> > The R-factors
>>> > were not significantly different when the structure was refined in the
>>> > correct space group. (Although it could be argued that that leads to a
>>> > better structure, and the only reason the R-factors were the same is
>>> > that bias in the lower symmetry refinement resulted in lowering Rfree
>>> > to the same level.)
>>> >
>>> > Just one example, but it is the first I tried: no cherry-picking. I
>>> > would be interested to know if anyone has an example where taking
>>> > lattice symmetry into account did make a difference.
>>> >
>>> > For me the lack of effect is most simply explained by saying that, while
>>> > of course ncs-related reflections are correlated in their Fo's and Fc's,
>>> > and perhaps in their |Fo-Fc|'s, I see no reason to expect that the
>>> > _changes_ in |Fo-Fc| produced by a step of refinement will be correlated
>>> > (I can expound on this). Therefore whatever refinement is doing to
>>> > improve the fit to working reflections is equally likely to improve or
>>> > worsen the fit to sym-related free reflections. In that case it is hard
>>> > to see how refinement against working reflections could bias their
>>> > symm-related free reflections. (Then how does R-free work? Why does
>>> > R-free come down at all when you refine? Because of coupling to
>>> > neighboring working reflections by the G-function?)
>>> >
>>> > Summary of results (details below):
>>> > 0. Structure 2CHR, I422 (as reported in PDB, with 2-Sigma cutoff)
>>> > R: 0.189 Rfree: 0.264 Nfree:442(5%) Nrefl: 9087
>>> >
>>> > 1. The deposited 2chr (I422) was refined in that space group with the
>>> > original free set. No Sigma cutoff, 10 macrocycles.
>>> > R: 0.1767 Rfree: 0.2403 Nfree:442(5%) Nrefl: 9087
>>> >
>>> > 2. The deposited structure was refined in I422 10 times, 50 macrocycles
>>> > each, with randomly picked 10% free reflections
>>> > R: 0.1725±0.0013 Rfree: 0.2507±0.0062 Nfree: 908.9±0.32 Nrefl: 9087
>>> >
>>> > 3. The structure was expanded to an I4 dimer related by the unused I422
>>> > crystallographic operator, matching the dimer of 1chr. This dimer was
>>> > refined against the original (I4) data of 1chr, picking free reflections
>>> > in symmetry related pairs. This was repeated 10 times with different
>>> > random seed for picking reflections.
>>> > R: 0.1666±0.0012 **Rfree:0.2523±0.0077 Nfree: 1601.4 Nrefl:16011
>>> >
>>> > 4. Same as 3 but picking free reflections randomly without regard for
>>> > lattice symmetry.
>>> > On average 15 free reflections were in pairs, 212 were invariant under
>>> > the operator (no sym-mate) and 1374 (86%) were paired with working
>>> > reflections.
>>> > R: 0.1674±0.0017 **Rfree:0.2523±0.0050 Nfree: 1600.9 Nrefl:16011
>>> >
>>> > (** Average Rfree almost identical by coincidence; the individual
>>> > results were all different)
>>> >
>>> > Detailed results from the individual refinement runs are available in
>>> > spreadsheet in dropbox:
>>> > https://www.dropbox.com/s/fwk6q90xbc5r8n1/NCSbias.xls?dl=0
>>> >
>>> > Scripts used in running the tests are also there in NCSbias.tgz:
>>> > https://www.dropbox.com/s/sul7a6hzd5krppw/NCSbias.tgz?dl=0
>>> >
>>> > ========================================
>>> >
>>> > Methods:
>>> > I would like an experiment where relatively complete data is available
>>> > in the lower symmetry. To get something that is available to everyone, I
>>> > choose from the PDB. A good example is 2CHR, in space group I422, which
>>> > was originally solved and the data deposited in I4 with two molecules in
>>> > the asymmetric unit (structure 1CHR).
>>> >
>>> > 2CHR statistics from the PDB:
>>> >   R      R-free  complete  (Refined 8.0 to 3.0 A
>>> >   0.189  0.264   81.4       reported in PDB, with 2-Sig cutoff)
>>> >   Nfree=442 (4.86%)
>>> > Further refinement in phenix with same free set, no sigma cutoff:
>>> > 10 macrocycles bss, indiv XYZ, indiv ADP refinement; phenix default
>>> > Resol 37.12 - 3.00 A 92.95% complete, Nrefl=9087 Nfree=442(4.86%)
>>> > Start: r_work = 0.2097 r_free = 0.2503 bonds = 0.008 angles = 1.428
>>> > Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284
>>> > (2chr_orig_001.pdb,
>>> >
>>> > The number of free reflections is small, so the uncertainty
>>> > in Rfree is large (a good case for Rcomplete)
>>> > Instead for better statistics, use new 10% free set and repeat 10 times;
>>> > 50 macrocycles, with different random seeds:
>>> > R: 0.1725±0.0013 Rfree: 0.2507±0.0062 bonds:0.010 Angles:1.192
>>> > Nfree: 908.9±0.32 Nrefl: 9087
>>> >
>>> > For artificially low symmetry, expand the I422 structure (making what I
>>> > call 3chr for convenience although I'm sure that ID has been taken):
>>> >
>>> > pdbset xyzin 2CHR.pdb xyzout 3chr.pdb <<eof
>>> > exclude header
>>> > spacegroup I4
>>> > cell 111.890 111.890 148.490 90.00 90.00 90.00
symgen X,Y,Z >>> > symgen X,1-Y,1-Z >>> > CHAIN SYMMETRY 2 A B >>> > eof >>> > >>> > Get the structure factors from 1CHR: 1chr-sf.cif >>> > Run phenix.refine on 3chr.pdb with 1chr-sf.cif. >>> > This file has no free set (deposited 1993) so tell phenix to >>> generate >>> > one. I don't want phenix to protect me from my own stupidity, so I >>> use: >>> > generate = True >>> > use_lattice_symmetry = False >>> > use_dataman_shells = False >>> > (the .eff file with all non-default parameters is available as >>> > 3chr_rand_001.eff in the .tgz mentioned above) >>> > >>> > For more significance, use the script multirefine.csh to repeat the >>> refinement 10 times with different random seed.After each run, grep >>> significant results into a log file. >>> > >>> > >>> > To check this gives free reflections related to working >>> reflections, I >>> > used mtz2various and a fortran prog (sortfree.f in .tgz) to >>> separate the >>> > data (3chr_rand_data.mtz) into two asymmetric units: h,k,l with h>k >>> > (columns 4-5) and with h<k (col 6-7), listed the pairs, thusly: >>> > >>> > mtz2various hklin 3chr_rand_data.mtz hklout temp.hkl <<eof >>> > LABIN FP=F-obs DUM1=R-free-flags >>> > OUTPUT USER '(3I4,2F10.5)' >>> > eof >>> > sortfree <<eof >sort3.hkl >>> > >>> > sort3.hkl looks like: >>> > ______h>k______ ______h<k______ >>> > h k l F free F* free* >>> > 1 2 3 208.97 0.00 174.95 0.00 >>> > 1 2 5 226.85 0.00 191.65 0.00 >>> > 1 2 7 144.85 0.00 164.86 0.00 >>> > 1 2 9 251.26 0.00 261.71 0.00 >>> > 1 2 11 333.84 0.00 335.18 0.00 >>> > 1 2 13 800.37 0.00 791.77 0.00 >>> > 1 2 15 412.92 0.00 409.90 0.00 >>> > 1 2 17 306.99 0.00 317.53 0.00 >>> > 1 2 19 225.54 0.00 220.91 0.00 >>> > 1 2 21 101.20 1.00* 104.84 0.00 >>> > 1 2 23 156.27 0.00 156.49 0.00 >>> > 1 2 25 202.97 0.00 202.23 0.00 >>> > 1 2 27 216.10 0.00 219.28 0.00 >>> > 1 2 29 106.76 0.00 100.93 0.00 >>> > 1 2 31 157.32 0.00 154.37 1.00* >>> > 1 2 33 71.84 0.00 20.78 0.00 >>> > 1 2 35 179.05 0.00 165.67 0.00 >>> > 1 2 37 
254.04 0.00 239.96 1.00* >>> > 1 2 39 69.56 0.00 30.61 0.00 >>> > 1 2 41 56.20 0.00 51.02 0.00 >>> > >>> > , and awked for 1 in the free columns. Out of 6922 pairs of >>> reflections, >>> > in one case: >>> > 674 in the first asu (h>k) are in the free set, >>> > 703 in the second asu (h<k) are in the free set >>> > only 11 pairs have the reflections in both asu free. >>> > >>> > out of 16011 refl in I4, >>> > 6922 pairs (=13844 refl), 1049 invariant (h=k or h=0), 1118 with >>> absent mate. >>> > >>> > out of 1601 free reflections: >>> > On average 15 free reflections were in pairs, 212 were invariant >>> under >>> > the operator (no sym-mate) and 1374 (86%) were paired with working >>> > reflections. >>> > >>> > Then do 10 more runs of 50 macrocycles with: >>> > use_lattice_symmetry = False >>> > collecting the same statistics >>> > (also scripted in multirefine.csh) >>> > >>> > Finally, use ref2chr.eff to refine (as previously mentined) a >>> monomer in I422 (2chr.pdb) 10 times with 10% free, 50 macrocycles >>> > (also scripted in multirefine.csh) >>> > >>> > >>> ######################################################################## >>> > >>> > To unsubscribe from the CCP4BB list, click the following link: >>> > https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1 < >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.jiscmail.ac.uk_cgi-2Dbin_webadmin-3FSUBED1-3DCCP4BB-26A-3D1&d=DwMFaQ&c=ogn2iPkgF7TkVSicOVBfKg&r=cFgyH4s-peZ6Pfyh0zB379rxK2XG5oHu7VblrALfYPA&m=uwuIv6NVV7k7QShQJJLcd9XuIrcFh0UeMnnQ59IfsQE&s=wkNovlvAi1Ya9VZcTQk8mRnytM2fWnisElnTux6p5Kk&e= >>> > >>> > >>> > >>> > >>> 
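The free-set check in the quoted Methods above (sortfree plus awk) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the sortfree.f program from the .tgz: the function name classify_free, the dict layout and the toy flags are hypothetical, and the lattice-symmetry mate of (h,k,l) is taken to be (k,h,l), with h=k or h=0 treated as invariant as in the quoted counts.

```python
from collections import Counter


def classify_free(flags):
    """Classify each free reflection by the status of its lattice-symmetry mate.

    flags maps (h, k, l) -> True if the reflection is free, False if working.
    Assumption: the mate of (h, k, l) is (k, h, l); reflections with h == k
    or h == 0 are their own mate (invariant under the operator).
    """
    counts = Counter()
    for (h, k, l), is_free in flags.items():
        if not is_free:
            continue  # only free reflections are classified
        if h == k or h == 0:
            counts["invariant"] += 1
            continue
        mate = (k, h, l)
        if mate not in flags:
            counts["absent_mate"] += 1    # mate was not measured
        elif flags[mate]:
            counts["free_mate"] += 1      # both members of the pair are free
        else:
            counts["working_mate"] += 1   # free reflection paired with working
    return counts


# Toy flags, made up for illustration (not taken from 3chr_rand_data.mtz).
toy_flags = {
    (1, 2, 3): True, (2, 1, 3): False,   # free, mate in working set
    (1, 2, 5): True, (2, 1, 5): True,    # both free
    (3, 3, 4): True,                     # invariant (h == k)
    (0, 2, 6): True,                     # invariant (h == 0)
    (1, 4, 7): True,                     # mate absent
    (2, 5, 1): False, (5, 2, 1): False,  # working pair, ignored
}
print(classify_free(toy_flags))

# Bookkeeping of the counts quoted in the Methods.
assert 2 * 6922 + 1049 + 1118 == 16011  # pairs + invariant + absent mate
assert 15 + 212 + 1374 == 1601          # free: in pairs + no mate + working mate
```

Run on the real flags, the "working_mate" count divided by the number of free reflections would correspond to the 86% figure quoted above, provided the index convention matches the one assumed here.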
>> ------
>> Randy J. Read
>> Department of Haematology, University of Cambridge
>> Cambridge Institute for Medical Research    Tel: + 44 1223 336500
>> The Keith Peters Building                   Fax: + 44 1223 336827
>> Hills Road                                  E-mail: [email protected]
>> Cambridge CB2 0XY, U.K.                     www-structmed.cimr.cam.ac.uk
