Well, of all the possible metrics you could use to assess data quality, Rfree is probably the worst one. This is because it is a cross-validation metric, and cross-validation doesn't work if you use it as an optimization target. You can try, and might even make a little headway, but then your free set is burnt. If you have a third set of observations, as suggested for Rsleep (doi:10.1107/S0907444907033458), then you have a chance at another round of cross-validation. Crystallographers don't usually do this, but it has become standard practice in machine learning (training = Rwork, validation = Rfree, and testing = Rsleep).
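
To make the three-way split concrete, here is a minimal Python sketch (an illustration, not how any particular program does it) that randomly labels each reflection as work, free or sleep, and computes a conventional R = sum|Fo - Fc| / sum Fo over any one subset. The function names and the 5% fractions are assumptions. Real programs often choose free reflections in thin resolution shells or keep twin/NCS-related reflections together, and they apply a scale factor to Fc, which is taken as already done here.

```python
import random

def assign_flags(hkls, free_frac=0.05, sleep_frac=0.05, seed=0):
    """Randomly label each reflection 'work', 'free' or 'sleep'.

    Assumption: plain per-reflection random assignment; real programs may
    use thin resolution shells or keep symmetry-related reflections together.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    flags = {}
    for hkl in hkls:
        u = rng.random()
        if u < free_frac:
            flags[hkl] = "free"
        elif u < free_frac + sleep_frac:
            flags[hkl] = "sleep"
        else:
            flags[hkl] = "work"
    return flags

def r_factor(fobs, fcalc, flags, subset):
    """Conventional R = sum|Fo - Fc| / sum Fo over one subset.

    Assumes fcalc is already scaled to fobs (no scale factor applied here).
    """
    members = [h for h in fobs if flags.get(h) == subset]
    num = sum(abs(fobs[h] - fcalc[h]) for h in members)
    den = sum(fobs[h] for h in members)
    return num / den
```

In this scheme refinement only ever sees the "work" reflections; you look at "free" now and then, and at "sleep" once, at the very end.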

So, unless you have an Rsleep set, any time you contemplate doing a bunch of random things and picking the best Rfree ... don't.  Just don't.  There madness lies.

What happens after doing this is that you will initially be happy about your lower Rfree, but everything you do after that will make it go up more than it would have had you not performed your Rfree optimization. This is because the changes in the data that made Rfree randomly better were actually noise, and as the structure becomes more correct it will move away from that noise. It's always better to optimize on something else, and then check your Rfree as infrequently as possible. Remember, it is the control for your experiment. Never mix your positive control with your sample.
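
A toy simulation shows why picking the best of many Rfree values fools you. Assume, purely for illustration, that every processing run is equally good, that each run's Rfree and Rsleep scatter around the same true value with independent Gaussian noise, and that you keep the run with the lowest Rfree. The function name and the numbers (30 runs, noise of 0.005, true R of 0.25) are made up for the sketch.

```python
import random
import statistics

def simulate_selection(n_runs=30, true_r=0.25, noise=0.005, trials=2000, seed=0):
    """Toy model: pick the run with the lowest 'free' score, then report what
    that same run scores on an independent 'sleep' set, averaged over trials.

    Assumption: all runs are equally good; scores differ only by Gaussian noise.
    """
    rng = random.Random(seed)
    picked_free, picked_sleep = [], []
    for _ in range(trials):
        free = [true_r + rng.gauss(0, noise) for _ in range(n_runs)]
        sleep = [true_r + rng.gauss(0, noise) for _ in range(n_runs)]
        best = min(range(n_runs), key=free.__getitem__)  # "optimize on Rfree"
        picked_free.append(free[best])
        picked_sleep.append(sleep[best])
    return statistics.mean(picked_free), statistics.mean(picked_sleep)
```

Calling `simulate_selection()` returns two averages: the Rfree of the chosen run comes out noticeably below `true_r`, while the same run's Rsleep stays close to `true_r`. The "improvement" was selected noise, which is exactly what regresses away later.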

As for the best metric to assess data quality? Well, what are you doing with the data? There are always compromises in data processing and reduction that favor one application over another. If this is an "I just want the structure" project, then score on the resolution where CC1/2 drops to your favorite value. For some that is 0.5, for others 0.3. I tend to use 0.0 so I can cut it later without re-processing. Whatever you do, just make it consistent.
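
If you score on a CC1/2 threshold, a small helper like the hypothetical `cc_half_cutoff` below can turn a per-shell table (transcribed from the merging statistics) into one number per processing run. Interpolating linearly in s = 1/d² is an assumed convention; other interpolation or curve-fitting choices give slightly different cutoffs, which is fine as long as every run is scored the same way.

```python
import math

def cc_half_cutoff(shells, threshold=0.3):
    """Resolution (d_min, in Å) at which CC1/2 drops below `threshold`.

    shells: list of (d_min, cc_half) tuples ordered from low to high
    resolution. Interpolates linearly in s = 1/d^2 between the two shells
    that bracket the threshold (an assumed convention). Returns the first
    shell's d_min if even that shell is below the threshold, and the last
    shell's d_min if CC1/2 never drops below it.
    """
    if shells[0][1] < threshold:
        return shells[0][0]
    for (d_prev, cc_prev), (d, cc) in zip(shells, shells[1:]):
        if cc < threshold <= cc_prev:
            # Fraction of the way from the previous shell to this one.
            frac = (cc_prev - threshold) / (cc_prev - cc)
            s_prev, s = 1.0 / d_prev**2, 1.0 / d**2
            s_cut = s_prev + frac * (s - s_prev)
            return 1.0 / math.sqrt(s_cut)
    return shells[-1][0]
```

A smaller returned d_min means that run reached your chosen threshold at higher resolution.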

If it's for anomalous, score on CCanom or, if that's too noisy, on Imean/sigma in the lowest-angle resolution bin or the highest-intensity bin. This is because for anomalous you want to minimize relative error. The end-all-be-all of anomalous signal strength is the peak height in a phased anomalous difference Fourier. You need phases to do one, but if you have a structure, just omit an anomalous scatterer of interest, refine to convergence, and then measure the peak height at the position of the omitted anomalous atom. Instructions for doing anomalous refinement in refmac5 are here:
https://www2.mrc-lmb.cam.ac.uk/groups/murshudov/content/refmac/refmac_keywords.html
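
For the anomalous case, the sketch below spells out what CCanom measures: the correlation between the anomalous differences I(+) - I(-) measured in two random halves of the data. It also shows the usual way of quoting a difference-map peak, as its height in units of the map rms ("sigma"). The dictionary layout and function names are hypothetical; merging programs compute CCanom per resolution shell with their own half-set splitting, and reading map values from a file is not shown.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cc_anom(half1, half2):
    """half1, half2: dicts hkl -> (I_plus, I_minus) from two random half-datasets.

    Assumes only acentric reflections with both Bijvoet mates measured are
    passed in. CCanom = correlation of I(+) - I(-) between the two halves.
    """
    common = sorted(set(half1) & set(half2))
    d1 = [half1[h][0] - half1[h][1] for h in common]
    d2 = [half2[h][0] - half2[h][1] for h in common]
    return pearson(d1, d2)

def peak_height_in_sigma(map_values, peak_value):
    """Peak height in units of the map's rms deviation from its mean."""
    return (peak_value - statistics.mean(map_values)) / statistics.pstdev(map_values)
```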

If you're looking for a ligand you probably want isomorphism, and in that case refining against a reference structure and looking for low Rwork is not a bad strategy. This will tend to select for crystals containing a molecule that looks like the one you are refining. But be careful! If your reference is an apo structure, your ligand-bound crystals will have higher Rwork due to the very difference density you are looking for.
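
Ranking by Rwork against a reference model might look like the sketch below. It assumes, hypothetically, that you have already refined the same reference model against each dataset, that Fcalc is on the same scale as Fobs, and that every dataset uses the same free-flag assignment so the working sets are comparable. Only working-set reflections enter the score, so nothing is being optimized on the free set.

```python
def r_work(fobs, fcalc, free_flags):
    """R over the working set only: sum|Fo - Fc| / sum Fo.

    free_flags maps hkl -> True for free-set reflections; those are skipped.
    Assumes fcalc is already on the same scale as fobs.
    """
    work = [h for h in fobs if h in fcalc and not free_flags.get(h, False)]
    num = sum(abs(fobs[h] - fcalc[h]) for h in work)
    den = sum(fobs[h] for h in work)
    return num / den

def rank_by_rwork(results, free_flags):
    """results: dataset name -> (fobs, fcalc) after refining the same
    reference model against that dataset. Returns [(name, Rwork), ...],
    lowest Rwork first."""
    scores = {name: r_work(fobs, fcalc, free_flags)
              for name, (fobs, fcalc) in results.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```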

But if it's the same data just being processed in different ways, first decide what you are interested in, and then optimize on that. Just don't optimize on Rfree!

-James Holton
MAD Scientist


On 10/27/2021 8:44 AM, Murpholino Peligro wrote:
Let's say I ran autoproc with different combinations of options for a specific dataset, producing dozens of different (but not so different) mtz files... Then I ran phenix.refine with the same options for the same structure but with all my mtz zoo.
What would be the best metric to say "hey this combo works the best!"?
R-free?
Thanks

M. Peligro

------------------------------------------------------------------------

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1



This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/
