Re: [ccp4bb] Help sought for problem dataset

Eleanor Dodson Tue, 25 Nov 2008 01:44:54 -0800

I guess my hunch would be that there is some sort of order-disorder problem; either twinning or crystal dislocation.


Some ideas -

Reindex the PG222 data set h/2,k,l so that a~=b and test that data for twinning - you can just run truncate on the output Is and see the moments and cumulative intensity plots.
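A minimal sketch of that reindexing and moment check, assuming the intensities are available as a plain list of ((h, k, l), I) pairs. The function names are illustrative and are not part of truncate or any other CCP4 program. Halving h is taken here to mean keeping only the even-h reflections. For untwinned acentric data <I^2>/<I>^2 is expected to be about 2.0, falling towards 1.5 for a perfect twin; truncate evaluates this in resolution shells, which this sketch does not.

```python
import random

def reindex_halve_h(reflections):
    """Apply h' = h/2, k' = k, l' = l, dropping reflections with odd h."""
    return [((h // 2, k, l), i) for (h, k, l), i in reflections if h % 2 == 0]

def second_moment(intensities):
    """Return <I^2>/<I>^2 for a list of intensities (no resolution binning)."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / (mean_i * mean_i)

# Synthetic check only: exponentially distributed intensities mimic untwinned
# acentric data; averaging two independent sets mimics a perfect twin.
rng = random.Random(0)
untwinned = [rng.expovariate(1.0) for _ in range(100000)]
partner = [rng.expovariate(1.0) for _ in range(100000)]
twinned = [0.5 * (a + b) for a, b in zip(untwinned, partner)]
print(round(second_moment(untwinned), 2), round(second_moment(twinned), 2))
```

On real data the centric reflections should be excluded first, since their expected moments differ.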


Try solving the structure with that data set to find 2 molecules.

The new SG is likely to be P 2 21 21 (or possibly P 21 21 21; look at the new set of h'00 reflections to see if there are any absences).
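One way to look at those axial reflections, sketched under the assumption that the reindexed data are held as ((h, k, l), I, sigI) tuples (the function name and data layout are hypothetical). A 2(1) screw axis along a requires h'00 reflections with odd h' to be absent, so a mean I/sigI near zero for the odd class alongside strong even reflections would point to P 21 21 21.

```python
def axial_h00_summary(reflections):
    """Mean I/sigI of h00 reflections, split into odd-h and even-h classes."""
    odd, even = [], []
    for (h, k, l), intensity, sigma in reflections:
        if k == 0 and l == 0 and h != 0:
            (odd if h % 2 else even).append(intensity / sigma)

    def mean(values):
        return sum(values) / len(values) if values else None

    return {
        "n_odd": len(odd),
        "n_even": len(even),
        "odd": mean(odd),
        "even": mean(even),
    }

# Made-up numbers purely to show the call; not taken from the dataset.
example = [
    ((1, 0, 0), 0.5, 1.0),
    ((2, 0, 0), 50.0, 1.0),
    ((3, 0, 0), -0.5, 1.0),
    ((4, 0, 0), 30.0, 1.0),
    ((1, 1, 0), 99.0, 1.0),  # not axial, ignored
]
print(axial_h00_summary(example))
```

With only a handful of axial reflections the averages are noisy, so the individual values are worth inspecting as well.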


It might suggest something.

Eleanor



James Irving wrote:

Dear ccp4bb,

I'm wrestling with a crystal form that is causing us a great deal of
trouble, and I was hoping for some general suggestions that might get us
working in the right direction.  The protein in question is 40kDa.  Details
are provided below.  The summary is that at the model building/refinement
stage this data is behaving as though it is twinned, or merged in the wrong
space group, or there is some other fundamental anomaly that needs to be
accounted for, but the tests we've performed indicate that it is solved, not
twinned, in the correct space group.

Here are some details:

1.  It integrates and merges well in P2(1)2(1)2(1) without any trouble (see
table pasted at bottom).  We've collected two datasets, around 2.5-2.6A
each, one is of a selenomethionine derivative.  To get as strong data as
possible for MIRAS we collected ~400 deg of the derivative, and around 205
deg of the native.  Scaling and merging in XDS (although we've used
mosflm/scala as well) gives rather good stats at low resolution (Rmerge
1-2% <3.7A) that deteriorate quite markedly in the high resolution bin (~50%
at 2.55-2.7A), with I/sigI of 60 for the former and 3.0 for the latter.
Systematic absences very strongly support a P212121 space group.

2. There is no visible sign that this is part of a superlattice, or
comprises a superlattice.  Virtually all spots are accounted for during
integration.  By eye there are no systematically weak and strong
reflections.  Playing with the minimum I/sigI at the indexing stage doesn't
do anything, nor does deleting strong reflections and indexing only with
weak ones, nor does indexing using only high, or only low, resolution spots.

3. Unit cell: a=42.6 b= 85.3 c=108.5   90 90 90.   a is almost exactly 2*b.


4. Wilson plot looks normal.  There is no detected pseudotranslation.
Cumulative intensity distribution in truncate appears *very slightly*
sigmoidal.  Is it twinned?  More on that in a moment...
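For reference, the theoretical acentric curves that the truncate plot is compared against can be written down directly. This is a sketch using the standard Wilson-statistics expressions, N(z) = 1 - exp(-z) for untwinned data and N(z) = 1 - (1 + 2z)exp(-2z) for a perfect hemihedral twin; the function names are illustrative.

```python
import math

def n_z_untwinned(z):
    """Cumulative distribution of z = I/<I> for untwinned acentric data."""
    return 1.0 - math.exp(-z)

def n_z_perfect_twin(z):
    """Cumulative distribution of z = I/<I> for a perfect twin (alpha = 0.5)."""
    return 1.0 - (1.0 + 2.0 * z) * math.exp(-2.0 * z)

# Tabulate both curves at a few low z values.
for z in (0.1, 0.2, 0.4, 0.8):
    print(f"z={z:.1f}  untwinned={n_z_untwinned(z):.4f}  twin={n_z_perfect_twin(z):.4f}")
```

The twin curve lies below the untwinned one at low z because twinning removes very weak reflections, which is what gives the observed curve a sigmoidal shape.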

5. There is a reasonably close homolog of this protein that has been
crystallised (~50% identity) - we were expecting an easy MR solution.
Phaser gives Z-score in the rotation function of ~17, and ~11 in the
translation function for a single molecule in the AU, as expected.  2FoFc &
FoFc maps look absolutely rubbish, very much worse than would be expected
for this protein at this resolution with this solution.  Correction for
anisotropy doesn't improve maps much here or at any other stage in the
building/refinement.
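The expectation of one molecule per AU can be checked with a Matthews coefficient estimate. The sketch below uses the cell and molecular weight quoted above, assumes 4 asymmetric units per cell for P212121, and uses the usual approximation solvent fraction = 1 - 1.23/Vm; the function name is illustrative.

```python
def matthews(a, b, c, n_asu_per_cell, n_mol_per_asu, mw_da):
    """Matthews coefficient (A^3/Da) and solvent fraction, orthogonal cell only."""
    volume = a * b * c  # valid because all cell angles are 90 degrees
    vm = volume / (n_asu_per_cell * n_mol_per_asu * mw_da)
    solvent = 1.0 - 1.23 / vm
    return vm, solvent

# Cell 42.6 x 85.3 x 108.5 A, 40 kDa protein.
for n_mol in (1, 2):
    vm, solvent = matthews(42.6, 85.3, 108.5, 4, n_mol, 40000.0)
    print(f"{n_mol} mol/AU: Vm = {vm:.2f} A^3/Da, solvent = {solvent:.0%}")
```

One molecule gives a Vm of about 2.5 A^3/Da and roughly 50% solvent, whereas two molecules would leave essentially no solvent in this cell.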

6. Scaling and merging in P21 on the off-chance of perfect twinning or
pseudosymmetry gives exactly two solutions with very good Z-scores.  Maps
still look rubbish.  Phenix.xtriage, as would be expected, suggests a
twinning operator with alpha ~0.5 that is identical to a crystallographic
operator in P212121.   Rigid body refinement using phenix.refine and this
twinning operator gives Rfactor/Rfree that are low but again maps are
uninterpretable.  Conclusion: this isn't perfectly pseudomerohedrally twinned
in P21.

7. Went back to basics in P1.  Same deal as P21.

8. All other enantiomorphs in monoclinic and orthorhombic give significantly
lower, and poorly distinguished translation Z-scores in MR.

9. The selenomethionine dataset was solved using MIRAS in SHARP/autoSHARP.
The experimentally phased electron density yields contiguous tracts of
density in the right place; the unbiased density indicates a good solution.
Model building was conducted in P212121, initially into the experimental
maps and later with refinement in refmac using HL-coefficient-based
restraints.  In some regions, sequence can easily be deduced from clean
electron density (for the resolution).  In other regions, side chains are
missing, and in others, density is completely inconsistent with the
connectivity of the chains and highly conserved structural elements.  As
occurs sometimes with twinned data, many loops cannot be modelled at all,
and the Rfree does not drop below 0.41 with an Rfactor of 0.34.  The result
is a model that is about 60-70% complete.  Refinement was performed with and
without B-factor refinement.

10. Using density modification in SOLOMON, a structure-based solvent mask in
DM or statistical modification in PIRATE fails to elucidate these
additional, significant missing regions (which include three helices, 1.5
beta sheets and several loops).  Tellingly, comparing final models to the
original experimentally phased maps shows truncation of the model at the
same places as "truncation" occurs in the electron density.  During the
building process as much care was taken as possible that the structure was
not being built into a "local minimum".

11. phenix.autobuild is able to build a polyalanine model that covers about
25% of the molecule.

12. The native and derivative datasets scale extremely well together: they
are strongly consistent.  This is often not the case with twinned crystals.

Any suggestions would be greatly appreciated!

Thanks,
James

OUTPUT FROM CORRECT IN XDS:

SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  Rmrgd-F  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     7.02        4906     697       746       93.4%       1.9%      2.2%     4897   77.46     2.0%     1.1%    11%   0.849     447
     5.04        9074    1152      1152      100.0%       2.1%      2.5%     9074   66.65     2.3%     1.4%     1%   0.797     890
     4.13       11742    1445      1446       99.9%       2.3%      2.6%    11742   65.87     2.5%     1.5%    -7%   0.755    1178
     3.59       13929    1685      1685      100.0%       3.4%      3.6%    13929   45.70     3.6%     2.8%     2%   0.763    1428
     3.21       15675    1884      1884      100.0%       6.5%      6.5%    15675   28.42     6.9%     5.7%    -1%   0.811    1618
     2.94       17324    2066      2066      100.0%      14.1%     14.2%    17324   14.14    15.0%    14.5%     0%   0.787    1804
     2.72       18743    2232      2232      100.0%      27.5%     27.7%    18743    7.61    29.3%    29.9%     0%   0.749    1965
     2.55       20107    2388      2388      100.0%      46.1%     45.4%    20107    4.85    49.1%    51.7%    -2%   0.717    2123
     2.40       20056    2466      2545       96.9%      73.6%     72.2%    20023    3.04    78.5%    81.1%     1%   0.707    2144
    total      131556   16015     16144       99.2%       5.4%      5.6%   131514   26.34     5.7%    10.6%     0%   0.758   13597
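
As a rough consistency check on the totals row, the overall multiplicity and the relation between the merging R-factor and R-meas can be worked out by hand. This sketch assumes every unique reflection was measured the same number of times and treats the "R-FACTOR observed" column as Rmerge, so the result is only an approximation to what XDS computes reflection by reflection.

```python
import math

# Values copied from the "total" row of the CORRECT table.
observed, unique = 131556, 16015
r_merge_percent = 5.4

multiplicity = observed / unique
# With uniform multiplicity n, R-meas = Rmerge * sqrt(n / (n - 1)).
r_meas_estimate = r_merge_percent * math.sqrt(multiplicity / (multiplicity - 1.0))

print(f"multiplicity ~ {multiplicity:.1f}")
print(f"estimated R-meas ~ {r_meas_estimate:.1f}%")
```

The estimate lands close to the 5.7% reported in the R-meas column.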

Attached figures:
Data scaled in P2, self-rotation function in MOLREP
Data scaled in P222, self-rotation function in MOLREP
Cumulative intensity distribution in TRUNCATE
