Re: [ccp4bb] what would be the best metric to asses the quality of a mtz file?

James Holton Fri, 29 Oct 2021 07:50:29 -0700

Well, of all the possible metrics you could use to assess data quality, Rfree is probably the worst one. This is because it is a cross-validation metric, and cross-validation doesn't work if you use it as an optimization target. You can try, and might even make a little headway, but then your free set is burnt. If you have a third set of observations, as suggested for Rsleep (doi:10.1107/S0907444907033458), then you have a chance at another round of cross-validation. Crystallographers don't usually do this, but it has become standard practice in machine learning (training=Rwork, validation=Rfree and testing=Rsleep).

So, unless you have an Rsleep set, any time you contemplate doing a bunch of random things and picking the best Rfree ... don't. Just don't. There madness lies.

What happens after doing this is you will be initially happy about your lower Rfree, but everything you do after that will make it go up more than it would have had you not performed your Rfree optimization. This is because the changes in the data that made Rfree randomly better were actually noise, and as the structure becomes more correct it will move away from that noise. It's always better to optimize on something else, and then check your Rfree as infrequently as possible. Remember it is the control for your experiment. Never mix your positive control with your sample.
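To make the selection effect concrete, here is a toy simulation (not a crystallographic calculation). It assumes every processing variant has the same true R, and that the free-set and sleep-set estimates each carry independent Gaussian noise. The function name, the true R of 0.250 and the noise of 0.005 are all made up for illustration.

```python
import random

def simulate_selection(n_variants=50, true_r=0.250, noise=0.005, seed=1):
    """Pick the best-looking Rfree among variants that are all equally good.

    Assumption: each variant has the same true R; its Rfree and Rsleep
    estimates differ from it only by independent random noise.
    Returns (Rfree, Rsleep) of the variant with the lowest Rfree.
    """
    rng = random.Random(seed)
    rfree = [true_r + rng.gauss(0, noise) for _ in range(n_variants)]
    rsleep = [true_r + rng.gauss(0, noise) for _ in range(n_variants)]
    best = min(range(n_variants), key=lambda i: rfree[i])
    return rfree[best], rsleep[best]

# Repeat the "try 50 things, keep the best Rfree" experiment 200 times.
trials = [simulate_selection(seed=s) for s in range(200)]
mean_rfree = sum(t[0] for t in trials) / len(trials)
mean_rsleep = sum(t[1] for t in trials) / len(trials)
print(f"mean Rfree of the selected variant : {mean_rfree:.4f}")
print(f"mean Rsleep of the selected variant: {mean_rsleep:.4f}")
```

The selected Rfree comes out well below the true value even though no variant is actually better, while the untouched Rsleep stays near the true value. That gap is the noise you just optimized on.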

As for the best metric to assess data quality? Well, what are you doing with the data? There are always compromises in data processing and reduction that favor one application over another. If this is a "I just want the structure" project, then score on the resolution where CC1/2 hits your favorite value. For some that is 0.5, others 0.3. I tend to use 0.0 so I can cut it later without re-processing. Whatever you do, just make it consistent.
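A minimal sketch of scoring on "the resolution where CC1/2 hits your favorite value", assuming you have already pulled the per-shell CC1/2 values out of your processing logs. The function name, the input layout and the shell numbers below are hypothetical.

```python
def resolution_cutoff(shells, cc_half_threshold=0.3):
    """Return d_min (in Angstrom) of the last shell, scanning from low to
    high resolution, before CC1/2 first drops below the threshold.

    shells: list of (d_min, cc_half) tuples, one per resolution shell.
    Returns None if even the lowest-resolution shell is below the threshold.
    """
    # Largest d_min first, i.e. lowest resolution first.
    ordered = sorted(shells, key=lambda s: s[0], reverse=True)
    cutoff = None
    for d_min, cc_half in ordered:
        if cc_half < cc_half_threshold:
            break
        cutoff = d_min
    return cutoff

# Made-up shell statistics for one processing run.
shells = [(4.0, 0.998), (3.0, 0.99), (2.5, 0.95), (2.2, 0.80),
          (2.0, 0.45), (1.9, 0.25), (1.8, 0.05)]

for threshold in (0.5, 0.3, 0.0):
    print(threshold, resolution_cutoff(shells, threshold))
```

To compare processing runs, apply the same threshold to each and prefer the run with the smallest cutoff (highest resolution). The point about consistency is that the threshold must not change between runs.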

If it's for anomalous, score on CCanom or, if that's too noisy, the Imean/sigma in the lowest-angle resolution or highest-intensity bin. This is because for anomalous you want to minimize relative error. The end-all-be-all of anomalous signal strength is the phased anomalous difference Fourier. You need phases to do one, but if you have a structure just omit an anomalous scatterer of interest, refine to convergence, and then measure the peak height at the position of the omitted anomalous atom. Instructions for doing anomalous refinement in refmac5 are here:

https://www2.mrc-lmb.cam.ac.uk/groups/murshudov/content/refmac/refmac_keywords.html
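For the last step, measuring the peak height, here is a rough sketch of expressing the map value at the omitted atom in sigma units. It assumes the anomalous difference map has already been read into a 3D nested list and that you know the grid index nearest the atom. Reading the map file and converting coordinates to grid indices are not shown, and the function name and toy map are made up.

```python
import statistics

def peak_height_sigma(density, position):
    """Map value at `position`, in standard deviations above the map mean.

    density:  3D nested list of map values (hypothetical, already loaded).
    position: (i, j, k) grid index nearest the omitted anomalous atom.
    """
    flat = [v for plane in density for row in plane for v in row]
    mean = statistics.fmean(flat)
    sigma = statistics.pstdev(flat)
    i, j, k = position
    return (density[i][j][k] - mean) / sigma

# Toy 4x4x4 map: flat background with a single strong voxel.
toy_map = [[[0.0 for _ in range(4)] for _ in range(4)] for _ in range(4)]
toy_map[1][2][3] = 8.0

print(f"peak height: {peak_height_sigma(toy_map, (1, 2, 3)):.2f} sigma")
```

In practice you would probably take the maximum over a few grid points around the atom rather than a single voxel, and score each data set by that number.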

If you're looking for a ligand you probably want isomorphism, and in that case refining with a reference structure looking for low Rwork is not a bad strategy. This will tend to select for crystals containing a molecule that looks like the one you are refining. But be careful! If it is an apo structure your ligand-bound crystals will have higher Rwork due to the very difference density you are looking for.

But if it's the same data just being processed in different ways, first make a choice about what you are interested in, and then optimize on that. Just don't optimize on Rfree!


-James Holton
MAD Scientist


On 10/27/2021 8:44 AM, Murpholino Peligro wrote:

Let's say I ran autoproc with different combinations of options for a specific dataset, producing dozens of different (but not so different) mtz files... Then I ran phenix.refine with the same options for the same structure but with all my mtz zoo.
What would be the best metric to say "hey this combo works the best!"?
R-free?
Thanks

M. Peligro

------------------------------------------------------------------------

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1


This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/
