My hunch would be that there is some sort of order-disorder
problem: either twinning or crystal dislocation.
Some ideas -
Reindex the PG222 data set as h/2,k,l so that a~=b, and test that data for
twinning - you can just run truncate on the output intensities and look at
the moments and cumulative intensity plots.
Try solving the structure with that data set to find 2 molecules.
The new SG is likely to be P 2 21 21 (or possibly P 21 21 21; look at
the new set of h'00 to see if there are any absences).
It might suggest something..
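
To illustrate the kind of number the moments test reports: for untwinned acentric data the second moment <I^2>/<I>^2 is expected to be close to 2.0, and close to 1.5 for a perfect hemihedral twin. The following is only a minimal Python sketch of that ratio, not what truncate actually does internally. It assumes the acentric intensities have already been read into a plain list, and the function name is made up for illustration. A real analysis would work in resolution shells and treat centric reflections separately.

```python
def second_moment(intensities):
    """Return <I^2>/<I>^2 for a list of (acentric) intensities.

    Expected values: ~2.0 for untwinned data, ~1.5 for a perfect twin.
    Hypothetical helper for illustration; no resolution binning is done.
    """
    n = len(intensities)
    if n == 0:
        raise ValueError("no intensities supplied")
    mean_i = sum(intensities) / n
    if mean_i == 0:
        raise ValueError("mean intensity is zero")
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / (mean_i * mean_i)
```

Values drifting from 2.0 towards 1.5 across the resolution shells would be a hint of twinning, although anisotropy and pseudotranslation can also distort the moments.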
Eleanor
James Irving wrote:
Dear ccp4bb,
I'm wrestling with a crystal form that is causing us a great deal of
trouble, and I was hoping for some general suggestions that might get us
working in the right direction. The protein in question is 40kDa. Details
are provided below. The summary is that at the model building/refinement
stage this data is behaving as though it is twinned, or merged in the wrong
space group, or there is some other fundamental anomaly that needs to be
accounted for, but the tests we've performed indicate that it is solved, not
twinned, in the correct space group.
Here are some details:
1. It integrates and merges well in P2(1)2(1)2(1) without any trouble (see
table pasted at bottom). We've collected two datasets, around 2.5-2.6A
each, one is of a selenomethionine derivative. To get as strong data as
possible for MIRAS we collected ~400 deg of the derivative, and around 205
deg of the native. Scaling and merging in XDS (although we've used
mosflm/scala as well) gives rather good statistics at low resolution (Rmerge
1-2% <3.7A), but they deteriorate quite markedly in the high resolution bin
(~50% at 2.55-2.7A), with I/sigI of 60 for the former and 3.0 for the latter.
Systematic absences very strongly support a P212121 space group.
2. There is no visible sign that this is part of a superlattice, or
comprises a superlattice. Virtually all spots are accounted for during
integration. By eye there are no systematically weak and strong
reflections. Playing with the minimum I/sigI at the indexing stage doesn't
do anything, nor does deleting strong reflections and indexing only with
weak ones, nor does indexing using only high, or only low, resolution spots.
3. Unit cell: a=42.6 b= 85.3 c=108.5 90 90 90. a is almost exactly 2*b.
4. Wilson plot looks normal. There is no detected pseudotranslation.
Cumulative intensity distribution in truncate appears *very slightly*
sigmoidal. Is it twinned? More on that in a moment...
5. There is a reasonably close homolog of this protein that has been
crystallised (~50% identity) - we were expecting an easy MR solution.
Phaser gives Z-score in the rotation function of ~17, and ~11 in the
translation function for a single molecule in the AU, as expected. 2FoFc &
FoFc maps look absolutely rubbish, very much worse than would be expected
for this protein at this resolution with this solution. Correction for
anisotropy doesn't improve maps much here or at any other stage in the
building/refinement.
6. Scaling and merging in P21 on the off-chance of perfect twinning or
pseudosymmetry gives exactly two solutions with very good Z-scores. Maps
still look rubbish. Phenix.xtriage, as would be expected, suggests a
twinning operator with alpha ~0.5 that is identical to a crystallographic
operator in P212121. Rigid body refinement using phenix.refine and this
twinning operator gives Rfactor/Rfree that are low but again maps are
uninterpretable. Conclusion: this isn't perfectly pseudomerohedrally twinned
in P21.
7. Went back to basics in P1. Same deal as P21.
8. All other enantiomorphs in monoclinic and orthorhombic give significantly
lower, and poorly distinguished translation Z-scores in MR.
9. The selenomethionine dataset was solved using MIRAS in SHARP/autoSHARP.
The experimentally phased electron density yields contiguous tracts of
density in the right place; the unbiased density indicates a good solution.
Model building was conducted in P212121, initially into the experimental
maps and later with refinement in refmac using HL-coefficient-based
restraints. In some regions, sequence can easily be deduced from clean
electron density (for the resolution). In other regions, side chains are
missing, and in others, density is completely inconsistent with the
connectivity of the chains and highly conserved structural elements. As
occurs sometimes with twinned data, many loops cannot be modelled at all,
and the Rfree does not drop below 0.41 with an Rfactor of 0.34. The result
is a model that is about 60-70% complete. Refinement was performed with and
without B-factor refinement.
10. Density modification in SOLOMON, a structure-based solvent mask in
DM, and statistical modification in PIRATE all fail to reveal these
additional, significant missing regions (which include three helices, 1.5
beta sheets and several loops). Tellingly, comparing final models to the
original experimentally phased maps shows truncation of the model at the
same places as "truncation" occurs in the electron density. During the
building process as much care was taken as possible that the structure was
not being built into a "local minimum".
11. phenix.autobuild is able to build a polyalanine model that covers about
25% of the molecule.
12. The native and derivative datasets scale extremely well together: they
are strongly consistent. This is often not the case with twinned crystals.
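
As a quick sanity check on the "single molecule in the AU" expectation in point 5, the Matthews coefficient can be estimated from the cell in point 3. This is only a rough Python sketch under stated assumptions: an orthorhombic cell (so the volume is simply a*b*c), four asymmetric units per cell as in P212121, the quoted mass of 40 kDa, and the usual 1 - 1.23/Vm approximation for the solvent fraction. The function names are illustrative, not taken from any package.

```python
def matthews_coefficient(a, b, c, n_asu_per_cell, mw_da, n_mol=1):
    """Vm in A^3/Da for an orthogonal cell (all angles 90 degrees).

    n_asu_per_cell: number of asymmetric units in the cell (4 for P212121).
    n_mol: assumed number of molecules per asymmetric unit.
    """
    if min(a, b, c, n_asu_per_cell, mw_da, n_mol) <= 0:
        raise ValueError("all inputs must be positive")
    cell_volume = a * b * c  # valid only for orthogonal cells
    return cell_volume / (n_asu_per_cell * n_mol * mw_da)


def solvent_fraction(vm):
    """Approximate solvent fraction from Vm using 1 - 1.23/Vm."""
    return 1.0 - 1.23 / vm


# Cell from point 3, 40 kDa protein, one molecule per asymmetric unit.
vm_one = matthews_coefficient(42.6, 85.3, 108.5, 4, 40000.0, n_mol=1)
# Same cell with two molecules per asymmetric unit, for comparison.
vm_two = matthews_coefficient(42.6, 85.3, 108.5, 4, 40000.0, n_mol=2)
```

With these inputs one molecule per asymmetric unit gives a Vm of about 2.46 A^3/Da (roughly 50% solvent), whereas two molecules would leave essentially no room for solvent in this cell.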
Any suggestions would be greatly appreciated!
Thanks,
James
OUTPUT FROM CORRECT IN XDS:

SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION

RESOLUTION    NUMBER OF REFLECTIONS     COMPLETENESS  R-FACTOR  R-FACTOR  COMPARED  I/SIGMA  R-meas  Rmrgd-F  Anomal  SigAno   Nano
  LIMIT     OBSERVED  UNIQUE  POSSIBLE    OF DATA     observed  expected                                       Corr
      7.02      4906     697       746         93.4%      1.9%      2.2%      4897    77.46    2.0%     1.1%     11%   0.849    447
      5.04      9074    1152      1152        100.0%      2.1%      2.5%      9074    66.65    2.3%     1.4%      1%   0.797    890
      4.13     11742    1445      1446         99.9%      2.3%      2.6%     11742    65.87    2.5%     1.5%     -7%   0.755   1178
      3.59     13929    1685      1685        100.0%      3.4%      3.6%     13929    45.70    3.6%     2.8%      2%   0.763   1428
      3.21     15675    1884      1884        100.0%      6.5%      6.5%     15675    28.42    6.9%     5.7%     -1%   0.811   1618
      2.94     17324    2066      2066        100.0%     14.1%     14.2%     17324    14.14   15.0%    14.5%      0%   0.787   1804
      2.72     18743    2232      2232        100.0%     27.5%     27.7%     18743     7.61   29.3%    29.9%      0%   0.749   1965
      2.55     20107    2388      2388        100.0%     46.1%     45.4%     20107     4.85   49.1%    51.7%     -2%   0.717   2123
      2.40     20056    2466      2545         96.9%     73.6%     72.2%     20023     3.04   78.5%    81.1%      1%   0.707   2144
     total    131556   16015     16144         99.2%      5.4%      5.6%    131514    26.34    5.7%    10.6%      0%   0.758  13597

Attached figures:
Data scaled in P2, self-rotation function in MOLREP
Data scaled in P222, self-rotation function in MOLREP
Cumulative intensity distribution in TRUNCATE