Re: [ccp4bb] Help sought for problem dataset

Borhani, David Tue, 25 Nov 2008 06:19:57 -0800

James, when you tried P21, did you try all possibilities, i.e. P 1 21 1,
P 21 1 1, and P 1 1 21? Also, try the P2 versions thereof, and all eight
orthorhombic possibilities. I've had several crystals where the systematic
absences were misleading; pseudo-crystallographic NCS could account for
them.

Dave
David Borhani, Ph.D. 
D. E. Shaw Research, LLC 
120 West Forty-Fifth Street, 39th Floor 
New York, NY 10036 
[EMAIL PROTECTED] 
212-478-0698 
http://www.deshawresearch.com <http://www.deshawresearch.com/>
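
The settings suggested in the reply above can be listed mechanically. The following is only a sketch for bookkeeping: the function names are made up for illustration, and it just generates the symbols (a twofold or 2(1) axis along each of a, b, c), it does not reindex or test anything.

```python
from itertools import product


def orthorhombic_settings():
    """All primitive orthorhombic symbols with a 2 or 21 axis along a, b and c."""
    return ["P " + " ".join(axes) for axes in product(("2", "21"), repeat=3)]


def monoclinic_settings():
    """P2 and P21 with the unique axis placed along a, b, or c in turn."""
    settings = []
    for unique in range(3):
        for axis in ("2", "21"):
            symbol = ["1", "1", "1"]
            symbol[unique] = axis
            settings.append("P " + " ".join(symbol))
    return settings


if __name__ == "__main__":
    for setting in monoclinic_settings() + orthorhombic_settings():
        print(setting)
```

This gives six monoclinic and eight orthorhombic candidates, each of which would then be tried in scaling and molecular replacement.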



________________________________

        From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On
Behalf Of James Irving
        Sent: Tuesday, November 25, 2008 1:23 AM
        To: [email protected]
        Subject: [ccp4bb] Help sought for problem dataset
        
        
        Dear ccp4bb,
        
        I'm wrestling with a crystal form that is causing us a great
deal of trouble, and I was hoping for some general suggestions that
might get us working in the right direction.  The protein in question is
40kDa.  Details are provided below.  The summary is that at the model
building/refinement stage this data is behaving as though it is twinned,
or merged in the wrong space group, or there is some other fundamental
anomaly that needs to be accounted for, but the tests we've performed
indicate that it is solved, not twinned, in the correct space group.
        
        Here are some details:
        
        1.  It integrates and merges well in P2(1)2(1)2(1) without any
trouble (see table pasted at bottom).  We've collected two datasets,
around 2.5-2.6A each, one is of a selenomethionine derivative.  To get
as strong data as possible for MIRAS we collected ~400 deg of the
derivative, and around 205 deg of the native.  Scaling and merging in
XDS (although we've used mosflm/scala as well) gives rather good stats
at the low resolution (rmerge 1-2% <3.7A) that deteriorate quite markedly
in the high resolution bin (~50% at 2.55-2.7A), with I/sigI of 60 for
the former and 3.0 for the latter.  Systematic absences very strongly
support a P212121 space group.
        
        2. There is no visible sign that this is part of a superlattice,
or comprises a superlattice.  Virtually all spots are accounted for
during integration.  By eye there are no systematically weak and strong
reflections.  Playing with the minimum I/sigI at the indexing stage
doesn't do anything, nor does deleting strong reflections and indexing
only with weak ones, nor does indexing using only high, or only low,
resolution spots.
        
        3. Unit cell: a=42.6 b= 85.3 c=108.5   90 90 90.   b is almost
exactly 2*a.
        
        4. Wilson plot looks normal.  There is no detected
pseudotranslation.  Cumulative intensity distribution in truncate
appears *very slightly* sigmoidal.  Is it twinned?  More on that in a
moment...
        
        5. There is a reasonably close homolog of this protein that has
been crystallised (~50% identity) - we were expecting an easy MR
solution.  Phaser gives Z-score in the rotation function of ~17, and ~11
in the translation function for a single molecule in the AU, as
expected.  2FoFc & FoFc maps look absolutely rubbish, very much worse
than would be expected for this protein at this resolution with this
solution.  Correction for anisotropy doesn't improve maps much here or
at any other stage in the building/refinement.
        
        6. Scaling and merging in P21 on the off-chance of perfect
twinning or pseudosymmetry gives exactly two solutions with very good
Z-scores.  Maps still look rubbish.  Phenix.xtriage, as would be
expected, suggests a twinning operator with alpha ~0.5 that is identical
to a crystallographic operator in P212121.   Rigid body refinement using
phenix.refine and this twinning operator gives Rfactor/Rfree that are
low but again maps are uninterpretable.  Conclusion: this isn't
perfectly pseudomerohedrally twinned in P21.
        
        7. Went back to basics in P1.  Same deal as P21.
        
        8. All other enantiomorphs in monoclinic and orthorhombic give
significantly lower, and poorly distinguished translation Z-scores in
MR.
        
        9. The selenomethionine dataset was solved using MIRAS in
SHARP/autoSHARP.  The experimentally phased electron density yields
contiguous tracts of density in the right place,  unbiased density
indicates a good solution.  Model building was conducted in P212121,
initially into the experimental maps and later with refinement in refmac
using HL-coefficient-based restraints.  In some regions, sequence can
easily be deduced from clean electron density (for the resolution).  In
other regions, side chains are missing, and in others, density is
completely inconsistent with the connectivity of the chains and highly
conserved structural elements.  As occurs sometimes with twinned data,
many loops cannot be modelled at all, and the Rfree does not drop below
0.41 with an Rfactor of 0.34.  The result is a model that is about
60-70% complete.  Refinement was performed with and without B-factor
refinement.
        
        10. Density modification in SOLOMON, a structure-based
solvent mask in DM, and statistical modification in PIRATE all fail to
reveal these additional, significant missing regions (which include
three helices, 1.5 beta sheets and several loops).  Tellingly, comparing
final models to the original experimentally phased maps shows truncation
of the model at the same places as "truncation" occurs in the electron
density.  During the building process as much care was taken as possible
that the structure was not being built into a "local minimum".
        
        11. phenix.autobuild is able to build a polyalanine model that
covers about 25% of the molecule.
        
        12. The native and derivative datasets scale extremely well
together: they are strongly consistent.  This is often not the case with
twinned crystals.
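
The cell relationship noted in item 3 can be checked with simple arithmetic. This is a small sketch using only the cell edges quoted there (a=42.6, b=85.3); the helper name is made up for illustration, and what counts as "close enough" to suspect pseudo-symmetry is left to the reader.

```python
# Cell edges in angstroms, taken from item 3 of the message.
a, b, c = 42.6, 85.3, 108.5


def relative_mismatch(long_edge, short_edge, multiple):
    """Fractional deviation of long_edge from multiple * short_edge."""
    return abs(long_edge - multiple * short_edge) / long_edge


mismatch = relative_mismatch(b, a, 2)
print(f"b / a = {b / a:.4f}, deviation of b from 2a = {mismatch:.2%}")
```

With these numbers b differs from 2a by roughly 0.1 A, i.e. on the order of 0.1%.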
        
        Any suggestions would be greatly appreciated!
        
        Thanks,
        James
        
        OUTPUT FROM CORRECT IN XDS:
        
        SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
         RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  Rmrgd-F  Anomal  SigAno   Nano
           LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr
        
             7.02        4906     697       746       93.4%       1.9%      2.2%     4897   77.46     2.0%     1.1%    11%   0.849     447
             5.04        9074    1152      1152      100.0%       2.1%      2.5%     9074   66.65     2.3%     1.4%     1%   0.797     890
             4.13       11742    1445      1446       99.9%       2.3%      2.6%    11742   65.87     2.5%     1.5%    -7%   0.755    1178
             3.59       13929    1685      1685      100.0%       3.4%      3.6%    13929   45.70     3.6%     2.8%     2%   0.763    1428
             3.21       15675    1884      1884      100.0%       6.5%      6.5%    15675   28.42     6.9%     5.7%    -1%   0.811    1618
             2.94       17324    2066      2066      100.0%      14.1%     14.2%    17324   14.14    15.0%    14.5%     0%   0.787    1804
             2.72       18743    2232      2232      100.0%      27.5%     27.7%    18743    7.61    29.3%    29.9%     0%   0.749    1965
             2.55       20107    2388      2388      100.0%      46.1%     45.4%    20107    4.85    49.1%    51.7%    -2%   0.717    2123
             2.40       20056    2466      2545       96.9%      73.6%     72.2%    20023    3.04    78.5%    81.1%     1%   0.707    2144
            total      131556   16015     16144       99.2%       5.4%      5.6%   131514   26.34     5.7%    10.6%     0%   0.758   13597
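
The per-shell statistics above can be scanned for a resolution cutoff. This is a rough sketch only: the shell values are copied from the table, while the function name and any I/sigma threshold passed to it are assumptions for illustration, not a recommendation from the thread.

```python
# (resolution limit in angstroms, observed R-factor in %, I/sigma),
# copied from the CORRECT table, ordered from low to high resolution.
SHELLS = [
    (7.02, 1.9, 77.46),
    (5.04, 2.1, 66.65),
    (4.13, 2.3, 65.87),
    (3.59, 3.4, 45.70),
    (3.21, 6.5, 28.42),
    (2.94, 14.1, 14.14),
    (2.72, 27.5, 7.61),
    (2.55, 46.1, 4.85),
    (2.40, 73.6, 3.04),
]


def resolution_cutoff(shells, min_i_over_sigma):
    """Limit of the last shell before I/sigma first drops below the threshold.

    Returns None if even the lowest-resolution shell fails the threshold.
    """
    cutoff = None
    for limit, _r_observed, i_over_sigma in shells:
        if i_over_sigma < min_i_over_sigma:
            break
        cutoff = limit
    return cutoff


if __name__ == "__main__":
    # The thresholds here are hypothetical examples.
    for threshold in (2.0, 5.0):
        print(threshold, resolution_cutoff(SHELLS, threshold))
```

A stricter threshold simply moves the cutoff to a lower-resolution shell; the choice of threshold is a judgment call.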
        
        Attached figures:
        Data scaled in P2, self-rotation function in MOLREP
        Data scaled in P222, self-rotation function in MOLREP
        Cumulative intensity distribution in TRUNCATE
