On general scientific principles the reasons for archiving "raw data"
all boil down to one thing: there was a systematic error, and you hope
to one day account for it. After all, a "systematic error" is just
something you haven't modelled yet. Is it worth modelling? That depends...
There are two main kinds of systematic error in MX:
1) Fobs vs Fcalc
Given that the reproducibility of Fobs is typically < 3%, but
typical R/Rfree values are in the 20%s, it is safe to say that this is a
rather whopping systematic error. What causes it? Dunno. Would
structural biologists benefit from being able to model it? Oh yes!
Imagine being able to reliably see a ligand that has an occupancy of
only 0.05, or to be able to unambiguously distinguish between two
proposed reaction mechanisms and back up your claims with hard-core
statistics (derived from SIGF). Perhaps even teasing apart all the
different minor conformers occupied by the molecule in its functional
cycle? I think this is the main reason why we all decided to archive
Fobs: 20% error is a lot.
2) scale factors
We throw a lot of things into "scale factors", including sample
absorption, shutter timing errors, radiation damage, flicker in the
incident beam, vibrating crystals, phosphor thickness, point-spread
variations, and many other phenomena. Do we understand the physics
behind them? Yes (mostly). Is there "new biology" to be had by
modelling them more accurately? No. Unless, of course, you count all
the structures we have not solved yet.
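For reference, the ~20% discrepancy in point 1 is just the conventional crystallographic R-factor. A minimal sketch, using toy amplitudes invented for illustration (only the formula R = sum|Fobs - k*Fcalc| / sum Fobs and the least-squares scale k are standard):

```python
# Conventional crystallographic R-factor with a simple linear scale factor k.
# The numbers fed to it below are toy values, not from any real dataset.

def r_factor(fobs, fcalc):
    """R = sum |Fobs - k*Fcalc| / sum Fobs, with least-squares scale k."""
    k = sum(fo * fc for fo, fc in zip(fobs, fcalc)) / sum(fc * fc for fc in fcalc)
    numerator = sum(abs(fo - k * fc) for fo, fc in zip(fobs, fcalc))
    return numerator / sum(fobs)
```

With Fobs and Fcalc agreeing to within measurement error this would sit near the ~3% reproducibility floor; real refinements sit closer to 0.20.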
Wouldn't it be nice if phasing from sulfur, phosphorus, chloride and
other "native" elements actually worked? You wouldn't have to grow
SeMet protein anymore, and you could go after systems that don't express
well in E. coli. Perhaps even going to the native source! I think
there is plenty of "new biology" to be had there. Wouldn't it be nice
if you could do S-SAD even though your spots were all smeary and
overlapped and mosaic and radiation damaged?
Why don't we do this now? Simple: it doesn't work. Why doesn't it
work? Because we don't know all the "scale factors" accurately enough.
In most cases, the "% error" from all the scale factors adds up
to ~3% (aka Rmerge, Rpim etc.), but the change in spot intensities due
to native element anomalous scattering is usually less than 1%.
Currently, the world record for smallest Bijvoet ratio is ~0.5% (Wang et
al. 2006), but if photon-counting were the only source of error, we
should be able to get Rmerge of ~0.1% or less, particularly in the
low-angle resolution bins. If we can do that, then there will be little
need for SeMet anymore.
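A back-of-envelope way to see where the ~0.1% figure comes from, assuming pure Poisson counting statistics (the sqrt(2/pi) factor converts an r.m.s. fractional error into the mean absolute deviation that Rmerge roughly measures; multiplicity and all the "scale factor" errors above are deliberately ignored):

```python
import math

# Poisson-limited Rmerge: with N photons in a spot, sigma(I)/I = 1/sqrt(N).
# The mean absolute deviation of a Gaussian is sqrt(2/pi)*sigma, so a
# photon-counting-limited Rmerge is roughly sqrt(2/pi)/sqrt(N).
# Crude estimate only: multiplicity and instrumental errors are ignored.

def poisson_rmerge(n_photons):
    """Approximate Rmerge if photon counting were the only error source."""
    return math.sqrt(2.0 / math.pi) / math.sqrt(n_photons)

def photons_for_rmerge(target):
    """Photons per reflection needed to reach a target Rmerge."""
    return (math.sqrt(2.0 / math.pi) / target) ** 2
```

By this estimate, reaching Rmerge of ~0.1% takes on the order of 6x10^5 recorded photons per reflection, which is not unreasonable for strong low-angle spots.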
But we need the "raw" images if we are to have any hope of figuring out
how to get the errors down to the 0.1% level. There is no one magic
dataset that will tell us how to do this; we need to "average over" lots
of them. Yes, this is further "upstream" of the "new biology" than
deposited Fs, and yes the cost of archiving images is higher, but I
think the potential benefit to the structural biology community, if we
can crack the 0.1% S-SAD barrier, would be nothing short of revolutionary.
-James Holton
MAD Scientist
On 11/1/2011 8:32 AM, Anastassis Perrakis wrote:
Dear Gerard
Isolating your main points:
but there would have been no PDB-REDO because the data for running it
would simply not have been available! ;-) . Or do you think the
parallel does not apply?
...
have thought, some value. From the perspective of your message, then,
why are the benefits of PDB-REDO so unique that PDB-REPROCESS would
have no chance of measuring up to them?
I was thinking of the inconsistency while sending my previous email
... ;-)
Basically, the parallel does apply. PDB-REPROCESS in a few years would
be really fantastic - speaking as a crystallographer and methods
developer.
Speaking as a structural biologist, though, I did think long and hard
about the usefulness of PDB_REDO. I obviously decided it's useful, since
I am now heavily involved in it, for a few reasons: uniformity of final
model treatment, improving refinement software, better statistics on
structure quality metrics, and of course seeing if the new models will
change our understanding of the biology of the system.
An experiment that I would like to do as a structural biologist - is
the following:
What about adding an "increasing noise" model to the Fobs's of a few
datasets and re-refining? How much would that noise change the final
model and its quality metrics, in absolute terms?
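One way the noise-inflation step of that experiment could be sketched (the fobs/sigf arrays and the noise scale are hypothetical; the re-refinement itself would be done with an external refinement program, not shown here):

```python
import random

# Sketch of the proposed experiment: inflate the noise on Fobs and then
# watch how refinement statistics respond after re-refining externally.
# Noise is Gaussian with standard deviation scale*SIGF for each reflection.

def add_noise(fobs, sigf, scale, seed=0):
    """Return Fobs with extra Gaussian noise of s.d. scale*SIGF added."""
    rng = random.Random(seed)  # fixed seed for a reproducible "dataset"
    return [max(0.0, fo + rng.gauss(0.0, scale * sf))
            for fo, sf in zip(fobs, sigf)]
```

Running this for scale = 0.0, 0.5, 1.0, 2.0, ... and re-refining each perturbed dataset would trace out how R/Rfree and geometry metrics degrade with measurement error.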
(for the changes that PDB_RE(BUILD) makes, have a preview at
http://www.ncbi.nlm.nih.gov/pubmed/22034521
... I tried to avoid the shamelessly self-promoting plug, but couldn't
resist in the end!)
That experiment - or a better-designed variant of it - might tell us
whether we should be advocating the archiving of all images. And once
scientifically convinced of its importance beyond methods development,
we could all argue a strong case to the funding and hosting agencies.
Tassos
PS Of course, that does not negate the all-important argument that,
when struggling with marginal data, better processing software is
essential. There is a clear need for better software to process images,
especially for low-resolution and low signal/noise cases.
Since that depends on having test data, I am all for supporting an
initiative to collect such data, and I would gladly spend a day digging
through our archives to contribute.