Dear all,
Thank you for the answers concerning my recent posting about how to deal
with free R reflections in the case of pseudo-I222 symmetry. This was my
original question:
We have solved the structure of a protein that crystallised in space group
I222 with 1 molecule per asymmetric unit. After refining the structure to
3.5 A, we obtained a different crystal form that diffracts to higher
resolution; in this case, the space group appears to be P21212, with the
same unit cell as the first type of crystals and 2 molecules per asymmetric
unit. The packing between the two molecules in the P21212 asymmetric unit is
identical to that of symmetry-related molecules in the I222 crystals,
suggesting that the space group of crystal form 1 could also be P21212, with
non-crystallographic symmetry giving rise to strong pseudo-I222 symmetry at
low resolution.
Since we have already refined our model against the low resolution data
processed in I222, what would be the best way to define a new free R set in
order to continue refinement against the higher resolution P21212 data? If
our interpretation is correct, I would think that using a completely new set
would introduce some bias; on the other hand, is there any way to somehow
take into account the pseudo-symmetry we observe?
and these are the answers (in order of appearance):
Gerard DVD Kleywegt <[EMAIL PROTECTED]>:
--------------------------------------------
hi,
> We have solved the structure of a protein that crystallised in space group
> I222 with 1 molecule per asymmetric unit. After refining the structure to
> 3.5 A, we obtained a different crystal form that diffracts to higher
> resolution; in this case, the space group appears to be P21212, with the
> same unit cell as the first type of crystals and 2 molecules per asymmetric
> unit. The packing between the two molecules in the P21212 asymmetric unit is
> identical to that of symmetry-related molecules in the I222 crystals,
> suggesting that the space group of crystal form 1 could also be P21212, with
> non-crystallographic symmetry giving rise to strong pseudo-I222 symmetry at
> low resolution.
this case is very similar to that of cellobiohydrolase I. the initial
structure was determined in P21212 with translational NCS of "1/2, 1/2,
almost 1/2" and cell axes 84.0, 86.2, 111.8 (pdb entry 1CEL). later, some
complexes crystallised in the higher symmetry spacegroup I222 with cell
constants 84.0, 84.1, 111.5 (e.g., 3CEL). the translational NCS complicated
the spacegroup determination and also meant that averaging didn't improve the
maps all that much (as explained in Structure, 5, 1557-1569 (1997)).
> Since we have already refined our model against the low resolution data
> processed in I222, what would be the best way to define a new free R set in
> order to continue refinement against the higher resolution P21212 data? If
> our interpretation is correct, I would think that using a completely new set
> would introduce some bias; on the other hand, is there any way to somehow
> take into account the pseudo-symmetry we observe?
since the higher pseudo-symmetry is purely translational, there is no danger
of having pairs of "pseudo-symmetry-related" reflections divided into
different pots (work or test set). so you could simply transfer the test
flags from the I222 to the P21212 dataset (e.g., with the RFree TRansfer
command in DATAMAN - http://xray.bmc.uu.se/usf/dataman_man.html#S63 - there
are also other tools in DATAMAN to adjust the test set, e.g. if the two
datasets have different resolution limits). to be on the safe side, you
could also do some high-temperature simulated annealing in CNS to uncouple
work and test data.
on the other hand, transferring the test set means that your test set will
not contain any of the reflections that are systematically absent in I222
(and "systematically weak" in P21212). therefore, i would probably select a
whole new random test set and do a high-temperature simulated annealing run
(e.g., a 5000 or 10000 K slow-cool using torsion dynamics).
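The point about transferred flags can be illustrated with a minimal sketch in plain Python. This is not DATAMAN; the function names, the toy reflection list, and the convention that flag 0 marks the test set are all assumptions made for illustration. I-centring extinguishes reflections with h+k+l odd, so flags copied from the I222 data can never place such a reflection in the test set.

```python
def absent_in_i222(hkl):
    """True if the I-centring condition (h+k+l odd) extinguishes this reflection."""
    h, k, l = hkl
    return (h + k + l) % 2 != 0


def transfer_test_flags(i222_flags, p21212_hkls, work_flag=1):
    """Copy flags by Miller index; reflections unknown to the I222 set become work."""
    return {hkl: i222_flags.get(hkl, work_flag) for hkl in p21212_hkls}


# Toy data (hypothetical): flag 0 = test, 1 = work.
i222_flags = {(1, 1, 0): 0, (2, 0, 0): 1, (1, 1, 2): 0, (0, 2, 2): 1}
p21212_hkls = list(i222_flags) + [(1, 0, 0), (2, 1, 0), (1, 1, 1)]

new_flags = transfer_test_flags(i222_flags, p21212_hkls)
test_set = [hkl for hkl, flag in new_flags.items() if flag == 0]
weak_in_test = [hkl for hkl in test_set if absent_in_i222(hkl)]
print(len(test_set), len(weak_in_test))
```

In this toy case the transferred test set contains only h+k+l even reflections, which is the limitation described above.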
hope this helps,
--gerard
Bart Hazes <[EMAIL PROTECTED]>:
------------------------------------
Hi Zorg,
In these types of cases you normally want to ensure that all Rfree
reflections used in the original crystal form are also used for Rfree in the
new crystal form. In your case that would mean that you just keep your
original Rfree set. However, that way your Rfree set only samples the strong
h+k+l=2n subset of the lattice which will give a very distorted comparison
with Rwork which will include also all the systematically weakened
h+k+l=2n+1 reflections. So my suggestion would be to keep half of your
original Rfree set and select a new set of Rfree reflections from the
systematically weakened part of your data set.
I would do this in sftools. It clearly is a non-trivial use of the program
which illustrates how you can combine several of its flexible commands to do
complex and unusual tasks. I'm doing this by heart so there may be a mistake
but I hope you get the idea. Let me know if you run into problems.
1 read P21212.mtz
2 read I222.mtz col rfree
3 calc r col random = ran_u
4 select col rfree = 0
5 select col random < 0.5
6 calc col rfree = 1
7 select all
8 set spacegroup I222
9 select sysabs
10 rfree 500
11 select all
12 delete col random
13 set spacegroup P21212
14 write P21212new.mtz
1 read the data for your P21212 set
2 read only the rfree column of the I222 mtz (assuming the column label
for this is "rfree")
3 create a new column named "random" with uniform random numbers in the
range of 0 to 1
4 select reflections with a 0 in the rfree column (i.e. your rfree set)
5 select only the subset with random number less than 0.5, this should
give you a random selection of half your original rfree set
6 set the flag of this subset to 1 (i.e. put them in the working set)
7 reset the selection to include all reflections
8 change space group temporarily to I222
9 select all reflections that should be systematically absent in I222
10 select 500 rfree reflections in the h+k+l=2n+1 part of the data (set
500 to whatever number of reflections you want)
11 reset the selection to include all reflections
12 remove the temporary column
13 reset the space group to the correct setting
14 write out the new mtz with the desired rfree set
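The logic of the steps above can also be sketched in plain Python, independent of sftools. This is only an illustration under stated assumptions: reflections are held in dictionaries keyed by (h, k, l), flag 0 means free and 1 means working (as in the script), and the function name `mixed_free_set` and the toy data are hypothetical.

```python
import random


def mixed_free_set(p21212_hkls, i222_flags, n_new_weak, seed=0):
    """Return {hkl: flag} with 0 = free and 1 = work, following the steps above."""
    rng = random.Random(seed)
    # Steps 1-2: start from the transferred I222 flags; unknown reflections are work.
    flags = {hkl: i222_flags.get(hkl, 1) for hkl in p21212_hkls}
    # Steps 3-6: move a random ~half of the old free set back to the working set.
    for hkl in list(flags):
        if flags[hkl] == 0 and rng.random() < 0.5:
            flags[hkl] = 1
    # Steps 8-10: pick new free reflections among those absent in I222 (h+k+l odd).
    weak = [hkl for hkl in p21212_hkls if sum(hkl) % 2 != 0]
    for hkl in rng.sample(weak, min(n_new_weak, len(weak))):
        flags[hkl] = 0
    return flags


# Toy example: a small block of indices, with every tenth strong reflection free.
hkls = [(h, k, l) for h in range(6) for k in range(6) for l in range(6)
        if (h, k, l) != (0, 0, 0)]
strong = [hkl for hkl in hkls if sum(hkl) % 2 == 0]
old = {hkl: (0 if i % 10 == 0 else 1) for i, hkl in enumerate(strong)}
flags = mixed_free_set(hkls, old, n_new_weak=5)
free_weak = [hkl for hkl, f in flags.items() if f == 0 and sum(hkl) % 2 != 0]
free_strong = [hkl for hkl, f in flags.items() if f == 0 and sum(hkl) % 2 == 0]
print(len(free_strong), len(free_weak))
```

The free reflections that remain in the strong subset are always a subset of the original free set, so no former working-set reflection is promoted to the free set there.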
The alternative solution is to pick an entirely new Rfree set and shake the
bias out of your model. That would definitely be easier to explain in the
methods section :)
Bart
Phoebe Rice <[EMAIL PROTECTED]>:
---------------------------------
I'm guessing this is translational pseudosymmetry, where spots that should
be fully absent in I222 are faintly present in the P21212 data? In that
case, I think the indices of the individual spots should be the same either
way, and you should just keep the old Rfree indices and expand the set from
there.
Thought for the community as a whole - in cases where some fraction of the
spots is systematically weak, should the Rfree set be chosen only from the
strong ones, since the translational symmetry and not the structural details
is dictating the overall magnitude of the weak ones?
Phoebe Rice
Joe Becker <[EMAIL PROTECTED]>:
-------------------------------------
I222 and P2(1)2(1)2 are closely related. Change one of the
crystallographic two-folds into a non-crystallographic two-fold and
Bob's your uncle.
Demetallized Concanavalin A crystallizes in P2(1)2(1)2. Add cations to
the crystals and they'll change into the same form (I222) that you get
by crystallizing the metallo-protein directly. See Becker et al. (1975)
JBC 250:1513 and Reeke et al. (1978) PNAS 75:2286.
Joe Becker
Merck Research Labs
Ian Tickle <[EMAIL PROTECTED]>:
---------------------------------------------
Yes I think this would be legitimate provided any comparison between the
Rfree and Rwork was done with the value of Rwork based only on the
subset of strong refls. It clearly would be wrong to compare Rfree
based only on strong refls with Rwork based on all refls, i.e. incl the
weak ones. The advantage would be that Rfree would not be sensitive to
the exact ratio of the number of strong to weak refls that happened to
be in the test set. So the SU(Rfree) would be reduced thus making any
statistical tests based on the value of Rfree-Rwork (or better
Rfree/Rwork) much more reliable.
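A minimal sketch of this like-for-like comparison, assuming the conventional definition R = sum|Fobs - Fcalc| / sum(Fobs), flag 0 for the test set, and that the strong subset is the h+k+l even class. The function names and the toy amplitudes are hypothetical.

```python
def r_factor(pairs):
    """Conventional R = sum|Fobs - Fcalc| / sum(Fobs) over (Fobs, Fcalc) pairs."""
    pairs = list(pairs)
    return sum(abs(fo - fc) for fo, fc in pairs) / sum(fo for fo, _ in pairs)


def strong_only_r_values(reflections):
    """reflections: iterable of (hkl, fobs, fcalc, flag) with flag 0 = test.

    Returns (Rwork, Rfree), both restricted to the strong h+k+l even subset.
    """
    strong = [r for r in reflections if sum(r[0]) % 2 == 0]
    work = [(fo, fc) for _, fo, fc, flag in strong if flag != 0]
    free = [(fo, fc) for _, fo, fc, flag in strong if flag == 0]
    return r_factor(work), r_factor(free)


# Toy amplitudes (invented): three strong and two weak reflections.
reflections = [
    ((2, 0, 0), 100.0, 90.0, 1),
    ((1, 1, 0), 50.0, 55.0, 1),
    ((1, 1, 2), 80.0, 60.0, 0),
    ((1, 0, 0), 5.0, 10.0, 1),
    ((2, 1, 0), 4.0, 1.0, 0),
]
r_work, r_free = strong_only_r_values(reflections)
print(r_work, r_free)
```

Both values ignore the weak class entirely, so neither depends on how many weak reflections happened to land in the test set.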
-- Ian
Vaheh Oganesyan <[EMAIL PROTECTED]>:
-------------------------------------------
But that may mean that there will be no reflections marked
for r-free in highest resolution shell. Isn't random
selection random enough?
Ian Tickle <[EMAIL PROTECTED]>:
---------------------------------------------
I don't see how that situation could ever arise in practice - the
proportions of reflections in the strong and weak subsets must be nearly
equal throughout reciprocal space, since they are separated by only 1 in
one of the indices, so the probability of randomly selecting zero refls
for the test set from the strong subset within a resolution shell must
be vanishingly small.
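As a rough check of this argument, assuming each reflection is flagged independently with a fixed test fraction, the chance that a shell containing n strong reflections receives no test reflection is (1 - fraction)^n. The 5% fraction and the shell sizes below are illustrative assumptions, not values from the thread.

```python
def prob_no_test_reflections(n_strong_in_shell, test_fraction):
    """Chance that independent random flagging leaves a shell with no test reflections."""
    return (1.0 - test_fraction) ** n_strong_in_shell


# Illustrative shell sizes with an assumed 5% test fraction.
for n in (500, 2000):
    print(n, prob_no_test_reflections(n, 0.05))
```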
The point I was making (and I think Phoebe was also implicitly making)
was that because of the inevitable big difference in R factors between
the strong & weak subsets, any variation in the ratio of the number of
strong and weak refls due to random sampling error would magnify the
error in the estimate of Rfree and thus make it less useful as a
statistic. Also as Phoebe pointed out most of the structural
information is contained in the strong subset, the weak subset merely
contains information about minor deviations (e.g. differences in
side-chain conformations) from the pseudo-translational symmetry (that's
why they're weak!). Rfree is really only useful as an indicator of
major errors in the structure (such as mis-tracing the main chain!), and
such information will be contained only in the strong subset.
Presumably if you have made such an error, the error itself will just be
duplicated by the translational pseudo-symmetry so the weak subset will
contain no information about the error.
-- Ian
Eleanor Dodson <[EMAIL PROTECTED]>:
--------------------------------------
I think you should assign the Free R set randomly, including weak and strong
observations.
If you want to transfer your I222 Free R set to the new P21212 set, when you
use the task import scaled data, or run SCALA, you can ask to have your Free
R transferred from another data set. The procedure will then a) see what %
you had assigned originally, and b) generate the same % of Free R assignments
for all newly included hkl. This will work provided you have the same indexing
convention for the I222 and the P21212 sets.
You may need to reindex one to get the axes aligned.
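The two-step procedure described above can be sketched in plain Python. This is not the actual SCALA or CCP4i implementation; the function name `extend_free_set`, the dictionary layout, and the flag convention (0 = free, 1 = work) are assumptions for illustration, and both data sets are assumed to share the same indexing.

```python
import random


def extend_free_set(old_flags, new_hkls, seed=0):
    """Keep existing flags (0 = free, 1 = work) and flag the same fraction of new hkl."""
    rng = random.Random(seed)
    # a) fraction of reflections that were free in the original data set
    free_fraction = sum(1 for f in old_flags.values() if f == 0) / len(old_flags)
    flags = {}
    for hkl in new_hkls:
        if hkl in old_flags:
            flags[hkl] = old_flags[hkl]  # carried over unchanged
        else:
            # b) newly included hkl: free with the same probability
            flags[hkl] = 0 if rng.random() < free_fraction else 1
    return flags, free_fraction


# Toy data (hypothetical): four I222 reflections, two extra P21212 ones.
old_flags = {(2, 0, 0): 0, (1, 1, 0): 1, (0, 2, 0): 1, (0, 0, 2): 1}
new_hkls = list(old_flags) + [(1, 0, 0), (0, 1, 0)]
flags, fraction = extend_free_set(old_flags, new_hkls)
print(fraction, flags)
```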
Eleanor
Thanks again to all the people that answered! Best regards,
Zorg