Re: [ccp4bb] Fo simulators - summary

Alastair Fyfe Mon, 09 Sep 2013 12:08:46 -0700

Thanks for the reference to the script and the additional discussion. I've looked through the archives a bit but couldn't find an answer to a question that's been on my mind for a while, so my apologies if this revisits well-trod ground. One of the potential sources of disagreement contributing to the gap could be poor modeling of scattering by the region between "ordered" and "bulk" solvent. That is, the abrupt transition from point scatterers to bulk may not adequately model regions with a greater incidence of transiently occupied scattering sites. Are there any pointers to citations or software that investigate modeling a layer of semi-structured solvent, for example as a function of distance from a molecular surface "colored" by its hydrogen-bonding potential?

Looking at the magnitude of residual density as a function of distance from the molecular surface (Afonine, Urzhumtsev, Adams '12, Matthews '09) seems to point to a possible misfit in that region, and some calculations I've been doing using real-space correlation give similar results. Deletion of waters with poor model metrics (correlation, number of neighbors, etc.) can improve Rwork while increasing Rfree, suggesting that the extra scattering is contributing meaningfully, even if poorly modeled. Softening the ordered/bulk boundary with a differentiable transition (Fenn, Schnieders, Brunger '10) doesn't address this, though their concluding discussion seems to suggest it's worth investigating. The question has been examined in the SAXS literature (Virtanen, Makowski, Sosnick, Freed '11) but I haven't found equivalent experiments among refinement software.
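To make the question concrete, here is a minimal sketch of three solvent density profiles as a function of signed distance d from the molecular surface: a hard step, a differentiable switch, and the same switch with an extra "semi-structured" shell term. This is purely illustrative and is not taken from any of the papers cited above. The cubic switch, the Gaussian shell, and all parameter values (width, shell_pos, shell_sigma, shell_excess) are assumptions chosen for the example; only the bulk value of roughly 0.33 e/A^3 for water is standard.

```python
import math

RHO_BULK = 0.33  # e/A^3, approximate electron density of bulk water


def hard_step(d):
    """Abrupt boundary: no solvent inside the surface (d < 0), bulk outside."""
    return RHO_BULK if d >= 0.0 else 0.0


def smooth_switch(d, width=1.0):
    """Differentiable 0 -> bulk transition over `width` angstroms centred on d = 0.

    Uses a cubic smoothstep; the functional form is an assumption for
    illustration, not the one used by any particular refinement program.
    """
    t = min(1.0, max(0.0, d / width + 0.5))
    return RHO_BULK * t * t * (3.0 - 2.0 * t)


def with_shell(d, width=1.0, shell_pos=1.5, shell_sigma=0.5, shell_excess=0.03):
    """Smooth switch plus a hypothetical excess-density shell near the surface.

    The Gaussian bump stands in for transiently occupied solvent sites;
    its position, width and height are made-up numbers.
    """
    bump = shell_excess * math.exp(-0.5 * ((d - shell_pos) / shell_sigma) ** 2)
    return smooth_switch(d, width) + bump


# Tabulate the three profiles at a few distances from the surface
for d in (-1.0, 0.0, 0.5, 1.5, 3.0, 6.0):
    print(f"{d:5.1f}  {hard_step(d):.3f}  {smooth_switch(d):.3f}  {with_shell(d):.3f}")
```

The first two profiles differ only within the transition width, whereas the shell term changes the density a little further out. In a real test the shell parameters would presumably have to be refined, or tied to surface properties such as hydrogen-bonding potential.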


On 09/07/2013 04:54 AM, James Holton wrote:

I feel like I should point out that there is about a 20% difference between "Fcalc" and something I would call a "simulated Fobs". Fcalc is something that refinement programs compute many times every second as they apply 100 years' worth of brilliant ideas to make your model (Fcalc) match your data (Fobs) as best we know how. Despite all this, one of the great mysteries of macromolecular structure determination is just how awful the "final" match is: R/Rfree in the 20%s or high teens at best. Small molecule structures don't have this problem. In fact, they only recently started depositing "Fobs" into the CSD because for most small molecule structures "Fcalc" is more accurate than "Fobs" anyway.
This has been hashed over on this BB a number of times, so I refer the interested reader to the archives. But there are two major considerations in turning a "pdb file" into a "simulated Fobs":
1) the solvent
SFALL (part of the CCP4 suite) is a convenient tool for turning coordinates into maps, or structure factors, but it doesn't "do" bulk solvent unless you trick it. I wrote a jiffy for doing this here:
http://bl831.als.lbl.gov/~jamesh/mlfsom/ano_sfall.com
Download the script, make it executable, and run it with no arguments to see instructions for how to use it. What is fascinating about this very crude bulk solvent implementation I did is that refinement programs with much more sophisticated bulk solvent implementations have a heck of a time trying to "match" it. If you want exactly the bulk solvent you would get from phenix, use phenix.fmodel, but this will not be exactly the same as the bulk solvent you get from REFMAC. Which one is right? Probably none of them.
2) The R-factor Gap
One can try to simulate the R-factor gap (between Rmeas and Rfree) by adding random numbers to "Fcalc" so that it becomes 20% different from Fobs, but this is hardly a physically reasonable source of error. If you do this enough times for the same PDB file and then "average over different crystals" you'll still end up with a dataset that will refine to R/Rfree ~ 0/0.
This is the fundamental problem with making "simulated Fobs": we actually have no good way of "modelling" whatever is causing this R-factor Gap, and therefore no good way of simulating it. If we could simulate it, then some refinement program would quickly implement a way to model the effect, and give you R/Rfree of 0% again. There are about as many ideas for the cause of the R-factor Gap as there are crystallographers out there, but to this day nobody has come up with a "systematic error" that, when accounted for in refinement, gives you a small-molecule-style R/Rfree for pretty much anything in the PDB. Not even lysozyme.
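The averaging argument can be checked with a toy calculation. The sketch below is not part of any of the tools mentioned here. It uses made-up amplitudes in place of a real Fcalc, assumes multiplicative Gaussian noise scaled to give R of about 20% per "crystal", and then averages N such noisy copies. The function names (r_factor, noisy_copy, averaged_r) are hypothetical.

```python
import math
import random


def r_factor(f_obs, f_calc):
    """Conventional R = sum|Fobs - Fcalc| / sum(Fobs)."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def noisy_copy(f_calc, target_r, rng):
    """Return Fcalc with random relative errors giving roughly R = target_r.

    For Gaussian noise E|x| = sigma * sqrt(2/pi), so a relative sigma of
    target_r * sqrt(pi/2) gives an expected R close to target_r.
    """
    rel_sigma = target_r * math.sqrt(math.pi / 2.0)
    return [f * (1.0 + rng.gauss(0.0, rel_sigma)) for f in f_calc]


def averaged_r(f_calc, n_crystals, target_r=0.20, seed=0):
    """R between Fcalc and the mean of n_crystals independently noisy copies."""
    rng = random.Random(seed)
    sums = [0.0] * len(f_calc)
    for _ in range(n_crystals):
        for i, f in enumerate(noisy_copy(f_calc, target_r, rng)):
            sums[i] += f
    merged = [s / n_crystals for s in sums]
    return r_factor(merged, f_calc)


# Made-up amplitudes standing in for a real Fcalc list (an assumption)
amp_rng = random.Random(1)
f_calc = [amp_rng.uniform(10.0, 1000.0) for _ in range(5000)]

for n in (1, 4, 16, 64):
    print(n, round(averaged_r(f_calc, n), 3))
```

With independent random errors the residual R falls roughly as 1/sqrt(N), so averaging enough copies takes the "gap" toward zero. That is the sense in which random noise is not a reasonable stand-in for whatever systematic effect causes the real gap.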
-James Holton
MAD Scientist


On 9/5/2013 9:35 AM, Alastair Fyfe wrote:
Below are some links to tools for simulating Fobs data:
phenix.fake_f_obs: http://cci.lbl.gov/cctbx_sources/mmtbx/command_line/fake_f_obs.py
phenix.fmodel: http://cci.lbl.gov/cctbx_sources/mmtbx/command_line/fmodel.py
sftools (calc keyword): http://www.ccp4.ac.uk/html/sftools.html

diffraction image simulators from James Holton
mlfsom: http://bl831.als.lbl.gov/~jamesh/mlfsom/
nearBragg: http://bl831.als.lbl.gov/~jamesh/nearBragg/
fastBragg: http://bl831.als.lbl.gov/~jamesh/fastBragg/

many thanks for the replies.
Alastair
