Dear John,
Thank you for a very informative letter about the IUCr activities towards 
archiving the experimental data. I feel that I did not explain myself properly. 
I do not object to archiving the raw data; I just believe that the current 
methodology of validating data at the PDB is insufficiently robust and requires 
modification. Implementing raw image storage and validation will take 
considerable time, while the recent incidents of presumable data fraud 
demonstrate that the issue is urgent. Moreover, presenting calculated structure 
factors in place of the experimental data is not the only abuse that the 
current validation procedure encourages. There may be more numerous occurrences 
of data "massaging", such as overestimation of the resolution or data quality, 
and the system does not allow them to be verified. The IUCr and the PDB follow 
the American taxation policy, where the responsibility for fraud is placed on 
people, and the agency does not take sufficient action to prevent it. I believe 
this is inefficient and inhumane. A routine check of submitted data at a 
slightly lower level would reduce the temptation to overestimate the unclearly 
defined quality statistics and make model fabrication more difficult to 
accomplish. Many people do it unknowingly, and catching them afterwards does no 
good.

I suggested turning the current incident, which might be too complex for 
burning heretics, into something productive that is done as soon as possible, 
something that will prevent fraud from occurring.

Since my persistent "trolling" on ccp4bb did not have any effect (until now), I 
wrote a "bad-English" letter to the PDB administration, encouraging them to 
take urgent action. Those who are willing to count grammar mistakes in it can 
read the message below.

With best regards,
Alexander Aleshin, staff scientist
Sanford-Burnham Medical Research Institute 
10901 North Torrey Pines Road
La Jolla, California 92037

Dear PDB administrators,

I am writing to you regarding the recently publicized story about the 
submission of calculated structure factors for PDB entry 3k79 
(http://journals.iucr.org/f/issues/2012/04/00/issconts.html). This presumable 
fraud (or mistake) occurred just several years after another, more massive 
fabrication of PDB structures (Acta Cryst. (2010). D66, 115) that affected many 
scientists, including myself. The recurrence of these events indicates that 
the current mechanism of structure validation by the PDB is not sufficiently 
robust. Moreover, it is completely incapable of detecting smaller mischief such 
as overestimation of the data resolution and quality.

            There are two approaches to handling fraud problems: (1) increasing 
policing and punishment, or (2) making fraud too difficult to commit. 
Obviously, the second approach is more humane and efficient.

            This issue has been discussed on several occasions by the ccp4bb 
community, and some members began promoting the idea of submitting raw 
crystallographic images as a fraud repellent. However, this validation approach 
is neither easy nor cheap; moreover, it requires considerable manpower to 
conduct on a day-to-day basis. Indeed, indexing data sets is sometimes a 
nontrivial problem and cannot be accomplished automatically. For this reason, 
submitting the indexed and partially integrated data (such as .x files from 
HKL2000 or the output.mtz file from Mosflm) appears to be a cheaper substitute 
for storing and validating the images.

            Analysis of the partially integrated data provides almost the same 
means of fraud prevention as the images.  Indeed, the observed cases of 
data fraud suggest that fraud would most likely be attempted by a 
biochemist-crystallographer who is insufficiently trained to fabricate 
partially processed data. A method developer, on the contrary, does not have a 
reasonable incentive to forge a particular structure unless he teams up with a 
similarly minded biologist. But the latter scenario is very improbable and has 
not been detected yet.

            The most valuable benefit of using the partially processed data as 
a validation tool would be a standardized definition of data resolution and 
the detection of inappropriate massaging of experimental data.

            Implementing this approach requires only a minuscule adaptation of 
the current system, which most practicing crystallographers would accept (in 
my humble opinion). The data storage requirement would be only ~1000-fold 
higher than the current one, and the new data could still be transferred to the 
PDB over the Internet. Moreover, the raw data need not be stored after the 
validation is done.

            A program such as Scala from CCP4 could easily be adapted to process 
the validation data and compare them with the conventional set of structure 
factors.  Exact agreement between the two sets is not necessary. They only need 
to agree within statistically meaningful bounds, and if they don’t, the 
author could be asked to provide a detailed description of his/her data 
processing. Finally, the standardized method could be used to determine the 
resolution of the submitted data, which could be reported together with the 
values provided by the author.

            To implement this validation approach, the PDB would need to raise 
some funds, but an amount small enough to be sacrificed from our common feeder. 
Anyway, it is easier and cheaper than the raw image approach and can serve as a 
basis for a transition to it in the future (if required). Since it appears to 
be a joint project of CCP4 and the PDB, I ask all crystallographers who feel an 
urgent need for upgrading the structure validation protocol to encourage them 
to consider this issue as quickly as possible. People who commit crimes are not 
always bad people; let's show our governments a good way to handle this problem.

 

Sincerely,

Alexander Aleshin, Staff Scientist

Sanford-Burnham Institute for Medical Research,

La Jolla, CA, USA.

 

 
