Following up on the discussion of the ABC structure retractions a couple
of weeks ago, one area where tools are lacking is easy "data" validation.
There are excellent tools for "structure" validation, but far fewer for
comparing and validating data files.
One common source of errors is changes made in reflection or coordinate
files, either by hand or by in-house programs that may not be rigorous
enough to catch all outliers (overflows, format changes, etc.). For
example, someone in my lab recently edited an hkl format file by hand to
change a few reflections from exponential format into standard F format,
so that it could be converted into the mtz format.
Format conversions (hkl to mtz or vice-versa) or simple manual edits to
coordinate files are very common, and are fertile places for mistakes to
creep in. Once such mistakes are made, they are often not easy to catch
since there is no easy way to compare files.
I would like to request simple "diff"-like utilities for reflection files
(sca, hkl, cv, mtz) as well as coordinate files from the crystallography
software developers. These utilities should be sophisticated enough to
recognize differences due to truncations, rounding errors, overflows,
sigma cut-offs, etc. but give simple diagnostic reports - just a few lines
that are easy enough for students to digest, in addition to detailed log
files.
Much of this functionality already exists in programs like sftools,
molman2, etc. but not in any straightforward way.
These could go something like this:
reflection_diff A.hkl B.mtz
which would yield a report like:
*** A.hkl and B.mtz are identical (up to rounding errors of 0.01), but
B.mtz includes PHWT and FOM data
or
*** A.hkl and B.mtz are identical, but B.mtz is missing all reflections
with F < 0.0 and F > 99999.0
or
*** C.mtz and B.mtz are identical (up to rounding errors), but the freeR
flags have been changed in the resolution shell (x to y)
or
*** A.hkl and B.cv are identical in reflection data, but cell dimensions
have changed by more than the 0.x rounding error.
or
*** A.hkl and B.cv are identical, but B.cv is missing 26 reflections
(see xxxx.log)
or
*** A.mtz and B.mtz are similar (identical cell dimensions, but F's vary
by 2.3% which is greater than rounding error)
and so on.
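As a rough illustration of the core comparison, here is a minimal Python
sketch. It assumes a simplified whitespace-separated "h k l F [sigma]" text
layout, which real sca/hkl/cv files only approximate, and it does not read
mtz at all. The names parse_hkl, reflection_diff and summarize are
hypothetical, not part of any existing program.

```python
def parse_hkl(text):
    """Read 'h k l F [sigma]' lines into {(h, k, l): F}; skip anything else.

    Assumed layout for illustration only; real reflection formats differ.
    """
    refl = {}
    for line in text.splitlines():
        parts = line.split()
        try:
            h, k, l = (int(p) for p in parts[:3])
            # float() reads exponential (1.23E+02) and F format (123.0) alike
            refl[(h, k, l)] = float(parts[3])
        except (ValueError, IndexError):
            continue  # header, blank, or malformed line
    return refl


def reflection_diff(a, b):
    """Compare two {(h, k, l): F} tables by Miller index."""
    common = a.keys() & b.keys()
    return {
        "only_a": sorted(a.keys() - b.keys()),
        "only_b": sorted(b.keys() - a.keys()),
        "n_common": len(common),
        # largest absolute difference in F over shared reflections
        "max_dev": max((abs(a[k] - b[k]) for k in common), default=0.0),
    }


def summarize(name_a, name_b, d, tol=0.01):
    """Turn the diff into a one-line report in the style shown above."""
    if d["max_dev"] <= tol:
        verdict = "identical (up to rounding errors of %g)" % tol
    else:
        verdict = "different (F's vary by up to %g)" % d["max_dev"]
    notes = []
    if d["only_a"]:
        notes.append("%s is missing %d reflections" % (name_b, len(d["only_a"])))
    if d["only_b"]:
        notes.append("%s is missing %d reflections" % (name_a, len(d["only_b"])))
    line = "*** %s and %s are %s" % (name_a, name_b, verdict)
    if notes:
        line += ", but " + " and ".join(notes)
    return line
```

A real utility would also need to compare cell dimensions, freeR flags and
extra columns such as PHWT and FOM; this sketch only covers F values and
missing reflections.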
Similarly, there could be a coordinate_diff that would look only at
ATOM/HETATM lines and generate reports like:
*** A.pdb and B.pdb are identical, but B.pdb has two alternate
conformations for residue 123
or
*** A.pdb and B.pdb vary by 0.8 Å (RMSD over identical segments), and
B.pdb has one additional segment with 4 residues ...
and so on.
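The coordinate comparison could start from something like the following
Python sketch. It assumes standard fixed-column PDB ATOM/HETATM records and
that both files are already in the same frame (no superposition is done
before the RMSD). The names parse_atoms and coordinate_diff are
hypothetical.

```python
import math


def parse_atoms(pdb_text):
    """Map (chain, resSeq+iCode, atom name, altLoc) -> (x, y, z).

    Only ATOM/HETATM lines are read; columns follow the fixed PDB layout.
    """
    atoms = {}
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        key = (
            line[21],               # chain ID (column 22)
            line[22:27].strip(),    # residue number + insertion code
            line[12:16].strip(),    # atom name
            line[16].strip(),       # alternate location indicator
        )
        atoms[key] = (
            float(line[30:38]),
            float(line[38:46]),
            float(line[46:54]),
        )
    return atoms


def coordinate_diff(a, b):
    """RMSD over atoms present in both files, plus atoms unique to each."""
    common = a.keys() & b.keys()
    sq = sum((p - q) ** 2 for k in common for p, q in zip(a[k], b[k]))
    return {
        "rmsd": math.sqrt(sq / len(common)) if common else None,
        "n_common": len(common),
        "only_a": sorted(a.keys() - b.keys()),
        "only_b": sorted(b.keys() - a.keys()),
    }
```

Grouping the "only_a" and "only_b" atoms by residue would give the
"additional segment" and "alternate conformation" style messages above.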
--
Arun Malhotra                                Phone:  (305) 243-2826
Associate Professor                          Lab:    (305) 243-2890
Dept. of Biochemistry & Molecular Biology    Fax:    (305) 243-3955
University of Miami School of Medicine
PO Box 016129                                E-Mail: [EMAIL PROTECTED]
Miami, FL 33101                              Web:    http://structure.med.miami.edu