James, when you tried P21, did you try all possibilities, i.e. P 1 21 1,
P 21 1 1, and P 1 1 21? Also, try the P2 versions thereof, and all eight
orthorhombic possibilities. I've had several crystals where the systematic
absences were misleading; pseudo-crystallographic NCS could account for
them. Dave
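For bookkeeping, the candidates named above (three P21 settings, three P2 settings, eight primitive orthorhombic groups) can be listed with a few lines of Python. This is only a minimal sketch: the symbol strings are built for illustration, the function names are hypothetical, and nothing here is checked against the space-group tables of any particular program.

```python
from itertools import product

def monoclinic_settings(axis_symbol):
    """Put the twofold ('2') or screw ('21') axis along a, b, then c."""
    settings = []
    for unique_axis in range(3):
        parts = ["1", "1", "1"]
        parts[unique_axis] = axis_symbol
        settings.append("P " + " ".join(parts))
    return settings

def orthorhombic_settings():
    """All eight primitive choices of '2' or '21' along each of the three axes."""
    return ["P " + " ".join(axes) for axes in product(("2", "21"), repeat=3)]

# Candidate list for re-running scaling / MR, one space group at a time.
candidates = (
    monoclinic_settings("21")
    + monoclinic_settings("2")
    + orthorhombic_settings()
)

for name in candidates:
    print(name)
```

That gives 14 candidates in total, each of which would need its own merging and molecular replacement run.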
David Borhani, Ph.D.
D. E. Shaw Research, LLC
120 West Forty-Fifth Street, 39th Floor
New York, NY 10036
[EMAIL PROTECTED]
212-478-0698
http://www.deshawresearch.com
________________________________
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On
Behalf Of James Irving
Sent: Tuesday, November 25, 2008 1:23 AM
To: [email protected]
Subject: [ccp4bb] Help sought for problem dataset
Dear ccp4bb,
I'm wrestling with a crystal form that is causing us a great
deal of trouble, and I was hoping for some general suggestions that
might get us working in the right direction. The protein in question is
40kDa. Details are provided below. The summary is that at the model
building/refinement stage this data is behaving as though it is twinned,
or merged in the wrong space group, or there is some other fundamental
anomaly that needs to be accounted for, but the tests we've performed
indicate that it is solved, not twinned, in the correct space group.
Here are some details:
1. It integrates and merges well in P2(1)2(1)2(1) without any
trouble (see table pasted at bottom). We've collected two datasets,
around 2.5-2.6A each, one is of a selenomethionine derivative. To get
as strong data as possible for MIRAS we collected ~400 deg of the
derivative, and around 205 deg of the native. Scaling and merging in
XDS (although we've used mosflm/scala as well) gives rather good stats
at low resolution (Rmerge 1-2% for d > 3.7A) that deteriorate quite
markedly in the high resolution bin (~50% at 2.55-2.7A), with I/sigI of
60 for the former and 3.0 for the latter. Systematic absences very strongly
support a P212121 space group.
2. There is no visible sign that this is part of a superlattice,
or comprises a superlattice. Virtually all spots are accounted for
during integration. By eye there are no systematically weak and strong
reflections. Playing with the minimum I/sigI at the indexing stage
doesn't do anything, nor does deleting strong reflections and indexing
only with weak ones, nor does indexing using only high, or only low,
resolution spots.
3. Unit cell: a=42.6 b=85.3 c=108.5 90 90 90. b is almost
exactly 2*a.
4. Wilson plot looks normal. There is no detected
pseudotranslation. Cumulative intensity distribution in truncate
appears *very slightly* sigmoidal. Is it twinned? More on that in a
moment...
5. There is a reasonably close homolog of this protein that has
been crystallised (~50% identity) - we were expecting an easy MR
solution. Phaser gives Z-score in the rotation function of ~17, and ~11
in the translation function for a single molecule in the AU, as
expected. 2FoFc & FoFc maps look absolutely rubbish, very much worse
than would be expected for this protein at this resolution with this
solution. Correction for anisotropy doesn't improve maps much here or
at any other stage in the building/refinement.
6. Scaling and merging in P21 on the off-chance of perfect
twinning or pseudosymmetry gives exactly two solutions with very good
Z-scores. Maps still look rubbish. Phenix.xtriage, as would be
expected, suggests a twinning operator with alpha ~0.5 that is identical
to a crystallographic operator in P212121. Rigid body refinement using
phenix.refine and this twinning operator gives Rfactor/Rfree that are
low but again maps are uninterpretable. Conclusion: this isn't
perfectly pseudomerohedrally twinned in P21.
7. Went back to basics in P1. Same deal as P21.
8. All other monoclinic and orthorhombic space groups tried give
significantly lower, and poorly distinguished, translation Z-scores in
MR.
9. The selenomethionine dataset was solved using MIRAS in
SHARP/autoSHARP. The experimentally phased electron density yields
contiguous tracts of density in the right place, unbiased density
indicates a good solution. Model building was conducted in P212121,
initially into the experimental maps and later with refinement in refmac
using HL-coefficient-based restraints. In some regions, sequence can
easily be deduced from clean electron density (for the resolution). In
other regions, side chains are missing, and in others, density is
completely inconsistent with the connectivity of the chains and highly
conserved structural elements. As occurs sometimes with twinned data,
many loops cannot be modelled at all, and the Rfree does not drop below
0.41 with an Rfactor of 0.34. The result is a model that is about
60-70% complete. Refinement was performed with and without B-factor
refinement.
10. Density modification in SOLOMON, a structure-based
solvent mask in DM, and statistical modification in PIRATE all fail to
reveal these additional, significant missing regions (which include
three helices, 1.5 beta sheets and several loops). Tellingly, comparing
final models to the original experimentally phased maps shows truncation
of the model at the same places as "truncation" occurs in the electron
density. During the building process as much care was taken as possible
that the structure was not being built into a "local minimum".
11. phenix.autobuild is able to build a polyalanine model that
covers about 25% of the molecule.
12. The native and derivative datasets scale extremely well
together: they are strongly consistent. This is often not the case with
twinned crystals.
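As a sanity check on points 3 and 5, the cell volume, Matthews coefficient and approximate solvent content can be worked out from the numbers quoted above. This is a rough sketch under stated assumptions: 4 asymmetric units per cell (P212121), a mass of exactly 40 kDa, and the conventional 1.23 A^3/Da factor for estimating solvent fraction. The function names are illustrative only.

```python
# Cell edges in A, as quoted in point 3 (all angles 90 deg, so V = a*b*c).
a, b, c = 42.6, 85.3, 108.5
mw_da = 40000.0      # assumed: "40kDa" taken as exactly 40,000 Da
n_asu = 4            # assumed: general positions per cell in P212121

volume = a * b * c   # orthorhombic cell volume in A^3

def matthews(n_mol):
    """Matthews coefficient (A^3/Da) for n_mol molecules per asymmetric unit."""
    return volume / (n_asu * n_mol * mw_da)

def solvent_fraction(vm):
    """Approximate solvent fraction using the usual 1.23 A^3/Da protein term."""
    return 1.0 - 1.23 / vm

for n_mol in (1, 2):
    vm = matthews(n_mol)
    print(f"{n_mol} mol/AU: Vm = {vm:.2f} A^3/Da, solvent ~ {solvent_fraction(vm):.0%}")

# Axis ratio from point 3: how close is b to 2*a?
print(f"b/a = {b / a:.3f}")
```

With these inputs one molecule per AU comes out at about 2.46 A^3/Da (roughly 50% solvent), while two molecules would leave essentially no solvent, which is consistent with the single-copy MR solution.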
Any suggestions would be greatly appreciated!
Thanks,
James
OUTPUT FROM CORRECT IN XDS:
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION   NUMBER OF REFLECTIONS      COMPLETENESS  R-FACTOR  R-FACTOR  COMPARED  I/SIGMA  R-meas  Rmrgd-F  Anomal  SigAno   Nano
     LIMIT  OBSERVED  UNIQUE  POSSIBLE       OF DATA  observed  expected                                        Corr
      7.02      4906     697       746         93.4%      1.9%      2.2%      4897    77.46    2.0%     1.1%     11%   0.849    447
      5.04      9074    1152      1152        100.0%      2.1%      2.5%      9074    66.65    2.3%     1.4%      1%   0.797    890
      4.13     11742    1445      1446         99.9%      2.3%      2.6%     11742    65.87    2.5%     1.5%     -7%   0.755   1178
      3.59     13929    1685      1685        100.0%      3.4%      3.6%     13929    45.70    3.6%     2.8%      2%   0.763   1428
      3.21     15675    1884      1884        100.0%      6.5%      6.5%     15675    28.42    6.9%     5.7%     -1%   0.811   1618
      2.94     17324    2066      2066        100.0%     14.1%     14.2%     17324    14.14   15.0%    14.5%      0%   0.787   1804
      2.72     18743    2232      2232        100.0%     27.5%     27.7%     18743     7.61   29.3%    29.9%      0%   0.749   1965
      2.55     20107    2388      2388        100.0%     46.1%     45.4%     20107     4.85   49.1%    51.7%     -2%   0.717   2123
      2.40     20056    2466      2545         96.9%     73.6%     72.2%     20023     3.04   78.5%    81.1%      1%   0.707   2144
     total    131556   16015     16144         99.2%      5.4%      5.6%    131514    26.34    5.7%    10.6%      0%   0.758  13597
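The reflection counts in the table can be cross-checked by recomputing per-shell multiplicity and completeness. The sketch below copies the OBSERVED / UNIQUE / POSSIBLE columns by hand from the table above; it does not parse CORRECT.LP, and the variable names are illustrative.

```python
# (resolution limit in A, observed, unique, possible), copied from the table.
shells = [
    (7.02,  4906,  697,  746),
    (5.04,  9074, 1152, 1152),
    (4.13, 11742, 1445, 1446),
    (3.59, 13929, 1685, 1685),
    (3.21, 15675, 1884, 1884),
    (2.94, 17324, 2066, 2066),
    (2.72, 18743, 2232, 2232),
    (2.55, 20107, 2388, 2388),
    (2.40, 20056, 2466, 2545),
]

for d_min, observed, unique, possible in shells:
    multiplicity = observed / unique          # average measurements per unique reflection
    completeness = 100.0 * unique / possible  # percent of possible reflections measured
    print(f"{d_min:5.2f} A  multiplicity {multiplicity:4.1f}  completeness {completeness:5.1f}%")

# Overall figures, to compare against the "total" row.
total_obs = sum(s[1] for s in shells)
total_unique = sum(s[2] for s in shells)
total_possible = sum(s[3] for s in shells)
print(f"total    multiplicity {total_obs / total_unique:4.1f}  "
      f"completeness {100.0 * total_unique / total_possible:5.1f}%")
```

The shell sums reproduce the "total" row, and the overall multiplicity works out to a little over 8.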
Attached figures:
Data scaled in P2, self-rotation function in MOLREP
Data scaled in P222, self-rotation function in MOLREP
Cumulative intensity distribution in TRUNCATE
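For reference when judging how sigmoidal the cumulative intensity plot is, the textbook acentric curves for untwinned data and for a perfect hemihedral twin can be tabulated directly. This is a minimal sketch of the standard theoretical expressions only; it is not a reimplementation of what TRUNCATE plots, and the function names are illustrative.

```python
import math

def n_untwinned(z):
    """Acentric N(z) for untwinned data: fraction of reflections with I/<I> <= z."""
    return 1.0 - math.exp(-z)

def n_perfect_twin(z):
    """Acentric N(z) for a perfect hemihedral twin (average of two independent intensities)."""
    return 1.0 - (1.0 + 2.0 * z) * math.exp(-2.0 * z)

# Tabulate both curves; the twinned curve starts lower (fewer very weak
# reflections), which is what gives the plot its sigmoidal look.
for z in (0.1, 0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"z = {z:3.1f}  untwinned {n_untwinned(z):.3f}  perfect twin {n_perfect_twin(z):.3f}")
```

An observed curve lying between the two at low z would be the pattern expected for partial twinning, although pseudosymmetry and other effects can also distort the distribution.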