Re: [ccp4bb] refining against weak data and Table I stats

Edward A. Berry Thu, 13 Dec 2012 14:54:14 -0800

Good question.
In the structure mentioned earlier, cutting the resolution from 1.6 to 2 A
didn't make a significant difference:

"The model did not change significantly with extensive refinement at the lower resolution, with the all-atom RMSD between the structures being 0.065 Å and the maximum deviation 0.59 Å for a water molecule."


However going the other way, if we originally refined at 2 A, I don't know
if we would have converged on the same structure.

But that is addressing the question of using weak data-
If you mean will having a higher resolution crystal provide biological insight-
the atoms are pretty well located already at 1.6 A (except for disordered bits).
They will be more precisely located at 1.45 A, but that probably won't change the
conclusions about what is H-bonding what or whether that serine could be
serving as a catalytic base. I would prefer to have the higher resolution,
but I wouldn't apply for an NIH grant to grow better crystals of a structure
that is already available at 1.6 A.

As to your question about adding waters to reduce the R-factor-
I assume you are referring to the practice of adding a water at
every peak in a difference map, whether due to water or Fourier
truncation artifacts or partially ordered bits of detergent and lipids,
in order to match Fo to Fc and reduce the R-factor-
No, that is different, because it can actually make the model worse,
and it used to be severely criticized. You don't hear much about this
recently though, perhaps because of reliance on R-free and because
that practice may not reduce R-free?
Water-picking programs impose distance restraints on picked waters,
and you are encouraged to go through and examine each water for
reasonableness before accepting it.
eab
-------------------
Dry humor in science-
PNAS September 11, 2012 vol. 109 no. 37 14754-14760:

Under a scenario of increasing population size and extreme aridity (with little or no decomposition of corpses) a simple demographic model shows that dead individuals may have become a significant part of the landscape.




Theresa Hsu wrote:

Being a beginner crystallographer, may I ask a basic question? On how many 
occasions does it make a *biological* difference between having a structure at 
1.42 and 1.6 A? I think this question also extends to adding in water molecules 
just to make statistics look good.

Thank you.

Theresa


On Thu, 13 Dec 2012 10:07:56 -0500, Douglas Theobald<[email protected]>  
wrote:

On Dec 13, 2012, at 1:52 AM, James Holton<[email protected]>  wrote:

[snip]

So, what I would advise is to refine your model with data out to the resolution limit 
defined by CC*, but declare the "resolution of the structure" to be where the 
merged I/sigma(I) falls to 2. You might even want to calculate your Rmerge, Rcryst, Rfree 
and all the other R values to this resolution as well, since including a lot of zeroes 
does nothing but artificially drive up estimates of relative error.
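
The two limits described above can be sketched in a few lines. This is only an illustration, not anyone's actual processing pipeline: the per-shell numbers are made up, the function names are hypothetical, and a fixed CC1/2 threshold of 0.1 stands in for a proper significance test. The CC* formula is the one from Karplus and Diederichs (2012), CC* = sqrt(2*CC1/2 / (1 + CC1/2)).

```python
import math

def cc_star(cc_half):
    """CC* = sqrt(2*CC1/2 / (1 + CC1/2)); returns 0 for non-positive CC1/2."""
    if cc_half <= 0:
        return 0.0
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))

def last_shell_passing(shells, column, cutoff):
    """Walk shells from low to high resolution and return d_min of the last
    shell before the chosen statistic first drops below the cutoff."""
    limit = None
    for shell in shells:
        if shell[column] < cutoff:
            break
        limit = shell[0]
    return limit

# Hypothetical shells: (d_min in A, mean I/sigma(I), CC1/2). Not real data.
shells = [
    (3.0, 25.0, 0.999),
    (2.4, 12.0, 0.990),
    (2.0, 4.1, 0.930),
    (1.8, 2.2, 0.750),
    (1.7, 1.1, 0.450),
    (1.6, 0.5, 0.180),
    (1.5, 0.2, 0.030),
]

# Assumed threshold: treat CC1/2 >= 0.1 as "still significant".
refinement_limit = last_shell_passing(shells, 2, 0.1)
# Conventional definition: last shell with mean I/sigma(I) >= 2.
nominal_resolution = last_shell_passing(shells, 1, 2.0)

print("refine to:", refinement_limit, "A; report resolution as:", nominal_resolution, "A")
for d_min, i_sig, cc_half in shells:
    print(f"{d_min:4.1f}  I/sig={i_sig:5.1f}  CC1/2={cc_half:.3f}  CC*={cc_star(cc_half):.3f}")
```

With these made-up shells the data would be refined to 1.6 A while the structure is reported as 1.8 A.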


So James --- it appears that you basically agree with my proposal?  I.e.,

(1) include all of the data in refinement (at least up to where CC1/2 or CC* is still 
"significant")

(2) keep the definition of resolution to what is more-or-less the de facto 
standard (res bin where I/sigI=2),

(3) report Table I where everything is calculated up to this resolution (where 
I/sigI=2), and

(4) maybe include in Supp Mat an additional table that reports statistics for 
all the data (I'm leaning towards a table with stats for each res bin)

As you argued, and as I argued, this seems to be a good compromise, one that 
modifies current practice to include weak data, but nevertheless does not 
change the def of resolution or the Table I stats, so that we can still compare 
with legacy structures/stats.

Perhaps we should even take a lesson from our "small molecule" friends and start 
reporting "R1", where the R factor is computed only for hkls where I/sigma(I) is above 3?
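
As a rough sketch of what such an "R1"-style statistic would look like next to the conventional R factor: the code below computes R = sum(||Fo| - |Fc||) / sum(|Fo|), optionally restricted to reflections above an I/sigma(I) cutoff. The function name and the four reflections are hypothetical, Fc is assumed to be already on the same scale as Fo, and the cutoff of 3 is simply the value suggested in the message above.

```python
def r_factor(reflections, i_sig_cut=None):
    """reflections: iterable of (F_obs, F_calc, I/sigma(I)).
    If i_sig_cut is given, only reflections with I/sigma(I) above it are used."""
    numerator = 0.0
    denominator = 0.0
    for f_obs, f_calc, i_sig in reflections:
        if i_sig_cut is not None and i_sig <= i_sig_cut:
            continue  # skip weak reflections for the R1-style value
        numerator += abs(abs(f_obs) - abs(f_calc))
        denominator += abs(f_obs)
    if denominator == 0.0:
        raise ValueError("no reflections left after applying the cutoff")
    return numerator / denominator

# Hypothetical reflections: (F_obs, F_calc, I/sigma(I)). Not real data.
reflections = [
    (100.0, 95.0, 30.0),
    (50.0, 52.0, 12.0),
    (10.0, 14.0, 2.5),
    (5.0, 9.0, 0.8),
]

r_all = r_factor(reflections)                # every reflection
r1 = r_factor(reflections, i_sig_cut=3.0)    # only I/sigma(I) > 3
print(f"R(all) = {r_all:.3f}, R1 = {r1:.3f}")
```

The weak reflections contribute little to the denominator but a lot to the numerator, which is why the all-data value comes out higher than the R1-style value in this toy case.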

-James Holton
MAD Scientist

On 12/8/2012 4:04 AM, Miller, Mitchell D. wrote:

I too like the idea of reporting the table 1 stats vs resolution
rather than just the overall values and highest resolution shell.

I also wanted to point out an earlier thread from April about the
limitations of the PDB's defining the resolution as being that of
the highest resolution reflection (even if data is incomplete or weak).
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204&L=ccp4bb&D=0&1=ccp4bb&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4&P=376289
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204&L=ccp4bb&D=0&1=ccp4bb&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4&P=377673

What we have done in the past for cases of low completeness
in the outer shell is to define the nominal resolution à la Bart
Hazes' method of same number of reflections as a complete data set and
use this in the PDB title and describe it in the remark 3 other
refinement remarks.
   There is also the possibility of adding a comment to the PDB
remark 2 which we have not used.
http://www.wwpdb.org/documentation/format33/remarks1.html#REMARK%202
This should help convince reviewers that you are not trying
to mis-represent the resolution of the structure.


Regards,
Mitch

-----Original Message-----
From: CCP4 bulletin board [mailto:[email protected]] On Behalf Of Edward A. 
Berry
Sent: Friday, December 07, 2012 8:43 AM
To: [email protected]
Subject: Re: [ccp4bb] refining against weak data and Table I stats

Yes, well, actually I'm only a middle author on that paper for a good
reason, but I did encourage Rebecca and Stephan to use all the data.
But on a later, much more modest submission, where the outer shell
was not only weak but very incomplete (edges of the detector),
the reviewers found it difficult to evaluate the quality
of the data (we had also excluded a zone with bad ice-ring
problems). So we provided a second table, cutting off above
the ice ring in the good strong data, which convinced them
that at least it is a decent 2 A structure. In the PDB it is
a 1.6 A structure, but there was a lot of good data between
the ice ring and 1.6 A.

Bart Hazes (I think) suggested a statistic called "effective
resolution" which is the resolution to which a complete dataset
would have the number of reflections in your dataset, and we
reported this, which came out to something like 1.75.
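
A minimal sketch of the effective resolution as paraphrased above, under stated assumptions: the number of reflections in a complete dataset is taken to scale as 1/d^3, any low-resolution cutoff is ignored, and the function names are hypothetical. With overall completeness C to a nominal limit d_min, this gives d_eff = d_min * C^(-1/3). The completeness of 0.76 used below is not from the thread; it is just the value that this formula implies for 1.6 A and roughly 1.75 A.

```python
def effective_resolution(d_min, completeness):
    """Resolution at which a complete dataset would contain as many
    reflections as an incomplete dataset extending to d_min.
    Assumes reflection count scales as 1/d**3 (no low-resolution cutoff)."""
    if not 0.0 < completeness <= 1.0:
        raise ValueError("completeness must be in (0, 1]")
    return d_min * completeness ** (-1.0 / 3.0)

def implied_completeness(d_min, d_eff):
    """Inverse relation: overall completeness implied by a given d_eff."""
    return (d_min / d_eff) ** 3

# Illustrative numbers only: 1.6 A nominal limit, 76% overall completeness.
d_eff = effective_resolution(1.6, 0.76)
print(f"effective resolution: {d_eff:.2f} A")
print(f"completeness implied by 1.6 A -> 1.75 A: {implied_completeness(1.6, 1.75):.2f}")
```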

I do like the idea of reporting in multiple shells, not just overall
and highest shell, and the PDB accommodates this, even has a GUI
to enter it in the ADIT 2.0 software. It could also be used to
report two different overall ranges, such as completeness, 25 to 1.6 A,
which would be shocking in my case, and 25 to 2.0 which would
be more reassuring.

eab

Douglas Theobald wrote:

Hi Ed,

Thanks for the comments.  So what do you recommend?  Refine against weak data, 
and report all stats in a single Table I?

Looking at your latest V-ATPase structure paper, it appears you favor something 
like that, since you report a high res shell with I/sigI=1.34 and Rsym=1.65.


On Dec 6, 2012, at 7:24 PM, Edward A. Berry<[email protected]>   wrote:

Another consideration here is your PDB deposition. If the reason for using
weak data is to get a better structure, presumably you are going to deposit
the structure using all the data. Then the statistics in the PDB file must
reflect the high resolution refinement.

There are, I think, three places in the PDB file where the resolution is stated,
but I believe they are all required to be the same and to be equal to the
highest resolution data used (even if there were only two reflections in that
shell).
Rmerge or Rsymm must be reported, and until recently I think they were not
allowed to exceed 1.00 (100% error?).

What are your reviewers going to think if the title of your paper is
"structure of protein A at 2.1 A resolution" but they check the PDB file
and the resolution was really 1.9 A?  And Rsymm in the PDB is 0.99 but
your Table I* says 1.3?

Douglas Theobald wrote:

Hello all,

I've followed with interest the discussions here about how we should be refining against weak 
data, e.g. data with I/sigI << 2 (perhaps using all bins that have a 
"significant" CC1/2 per Karplus and Diederichs 2012).  This all makes statistical 
sense to me, but now I am wondering how I should report data and model stats in Table I.

Here's what I've come up with: report two Table I's.  For comparability to legacy structure stats, 
report a "classic" Table I, where I call the resolution whatever bin has I/sigI=2.  Use that 
as my "high res" bin, with high res bin stats reported in parentheses after global stats.  
Then have another Table (maybe Table I* in supplementary material?) where I report stats for the 
whole dataset, including the weak data I used in refinement.  In both tables report CC1/2 and Rmeas.

This way, I don't redefine the (mostly) conventional usage of "resolution", my 
Table I can be compared to precedent, I report stats for all the data and for the model 
against all data, and I take advantage of the information in the weak data during 
refinement.

Thoughts?

Douglas


^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`
Douglas L. Theobald
Assistant Professor
Department of Biochemistry
Brandeis University
Waltham, MA  02454-9110

[email protected]
http://theobald.brandeis.edu/

              ^\
    /`  /^.  / /\
   / / /`/  / . /`
/ /  '   '
'
