Dear Colleagues,

The issue Harry is describing, of people writing multiple variations of "image formats" even though all of them are imgCIF, is not really a problem with the images themselves. Rather, it is a lack of agreement on the metadata to go with the images. This is similar to the lack of consistency in REMARKS for early PDB data sets, which eventually required the adoption of standardized REMARKS and the reprocessing of almost all data sets. I don't think it would have been easier to reprocess those data sets if the originals had also had their coordinates and sequences recorded with wide variations in formats.

The advantage of using imgCIF for an archive is not that it would force everybody to do their experiments using precisely the same format, but that, because it is capable of faithfully representing all the wide variations in current formats, it would allow what we now have to be captured and preserved and, when someone needed a dataset back, to be recast in a format appropriate to the use.

Think of it as that little figure-8 plug and socket we are able to use to adapt our power cords for travel around the world. There are other possible hub formats (NeXus, DICOM, etc.), but the sensible thing for an archive is to choose one of them for internal use, just as the PDB uses a variation on mmCIF for its internal use to allow it to easily deliver valid PDB, CIF and XML versions of sets of coordinates. For an archive, the advantage of using imgCIF internally, no matter which of the more than 200 current formats were to be used at beam lines and in labs, is that it would not be necessary to discard any of the metadata people provided, and it could be made to interoperate easily with the systems used internally by the PDB for coordinate data sets.
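
To put a rough number on the plug-and-socket argument, here is a minimal sketch that counts converters. It assumes, purely for illustration, that every format must be convertible to every other one, and that the hub is counted separately from the 200 formats; the function names are hypothetical.

```python
def direct_converters(n_formats):
    """One converter for every ordered pair of distinct formats."""
    return n_formats * (n_formats - 1)


def hub_converters(n_formats):
    """One reader into the hub and one writer out of it per format."""
    return 2 * n_formats


n = 200  # "more than 200 current formats", taken as exactly 200 here
print(direct_converters(n))  # 39800
print(hub_converters(n))     # 400
```

The count grows quadratically without a hub and linearly with one, which is the reason for choosing a single internal format whichever one it turns out to be.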

For many of the formats in current use, there is no place to store some of the information people provide and translation to other formats can sometimes be much more difficult than one might expect unless additional metadata is provided. Even such obvious things as image orientations are sometimes carried separately from the images themselves and can easily get lost.

Don't let the perfect be the enemy of the good. Archiving images in a common format, such as imgCIF, or, if you prefer, say, in the NeXus transliteration of imgCIF, would help to make some very useful information accessible for future use. It may not be a perfect solution, but it is a good one.

This is a good time to start a major crystallographic image archiving effort. Money may well be available now that will not be available six months from now, and we have good, if not perfect, solutions available for many, if not all, of the technical issues involved. Is it really wise to let this opportunity pass us by?

  Regards,
    Herbert
=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 y...@dowling.edu
=====================================================

On Mon, 16 Mar 2009, Harry Powell wrote:

Hi

I'm afraid the adoption of imgCIF (or CBF, its useful binary equivalent) doesn't help a lot - I know of three different manufacturers of detectors who, between them, write out four different image formats, all of which apparently conform to the agreed IUCr imgCIF standard. Each manufacturer has its own good and valid reasons for doing this. It's actually less work for me as a developer of integration software to write new code to incorporate a new format than to make sure I can read all the different imgCIFs properly.


On 16 Mar 2009, at 09:32, Eleanor Dodson wrote:

The deposition of images would be possible provided some consistent imagecif format was agreed. This would of course be of great use to developers for certain pathological cases, but not, I suspect, of much value to the user community - I download structure factors all the time for test purposes, but I probably would not bother to go through the data processing, and unless there were extensive notes associated with each set of images I suspect it would be hard to reproduce sensible results.

The research council policy in the UK is that original data is meant to be archived for publicly funded projects. Maybe someone should test the reality of this by asking the PI for the data sets?
Eleanor


Garib Murshudov wrote:
Dear Gerard and all MX crystallographers

As I see it, there are two problems.
1) Minor problem: Sanity, semantic and other checks for currently available data. It should not be difficult to do. Things like I/sigma and some statistical analysis of expected vs "observed" statistical behaviour should sort out many of these problems (Eleanor mentioned some and they can be used). I do not think that depositors should be blamed for mistakes. They are doing their best to produce and deposit. There should be a proper mechanism to reduce the number of mistakes.
You should agree that the situation is now much better than a few years ago.

2) A fundamental problem: What are observed data? I agree with you (Gerard) that images are the only true observations. All others (intensities, amplitudes etc) have undergone some processing using some assumptions and they cannot be considered as true observations. Data processing is an irreversible process. I hope your effort will be supported by the community. I personally get excited by the idea that images may be available. There are exciting possibilities. For example modular crystals, OD, twinning in general, and space group uncertainty cannot be truly modeled without images (it does not mean refinement against images). Radiation damage is another example where, after processing and merging, information is lost and cannot be recovered fully. You can extend the list of cases where images would be very helpful.

I do not know of any reason (apart from a technical one - size of files) why images should not be deposited and archived. I think this problem is very important.

regards
Garib


On 12 Mar 2009, at 14:03, Gerard Bricogne wrote:

Dear Eleanor,

  That is a useful suggestion, but in the case of 3ftt it would not have
helped: the amplitudes would have looked as healthy as can be (they were
calculated!), and it was the associated Sigmas that had absurd values, being
in fact phases in degrees. A sanity check on some (recalculated) I/sig(I)
statistics could have detected that something was fishy.
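
As a rough illustration of what such a sanity check might look like, here is a minimal sketch. It is one possible heuristic and not the check described above: it assumes that genuine sigmas tend to grow with the measured value, whereas a column of phases in degrees would be unrelated to it. The data are synthetic and the function names are hypothetical.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def sigma_report(values, sigmas):
    """Summarise a value/sigma column pair for a quick plausibility check."""
    ratios = [v / s for v, s in zip(values, sigmas) if s > 0]
    return {
        "mean_ratio": sum(ratios) / len(ratios),
        "correlation": pearson(values, sigmas),
    }


rng = random.Random(1)
values = [rng.expovariate(0.01) for _ in range(5000)]   # synthetic, mean ~100
plausible = [(v + 1.0) ** 0.5 for v in values]          # grows with the value
phase_like = [rng.uniform(0.0, 360.0) for _ in values]  # unrelated to the value

print(sigma_report(values, plausible))
print(sigma_report(values, phase_like))
```

A correlation near zero between a column and its supposed sigmas would not prove anything by itself, but it would be a cheap reason to look more closely at a deposition.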

  Looking forward to the archiving of the REAL data ... i.e. the images.
Using any other form of "data" is like having to eat out of someone else's
dirty plate!


  With best wishes,

       Gerard.

--
On Thu, Mar 12, 2009 at 09:22:26AM +0000, Eleanor Dodson wrote:
It would be possible for the deposition sites to run a few simple tests to
at least find cases where intensities are labelled as amplitudes or vice
versa - the truncate plots of moments and cumulative intensities at least
would show something was wrong.
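
A minimal sketch of the kind of moment test meant here, not the truncate implementation itself. It relies on Wilson statistics for untwinned acentric data, where <I^2>/<I>^2 is about 2 for intensities and <F^2>/<F>^2 is about 4/pi (roughly 1.27) for amplitudes. The data are synthetic, and the function names and the cut-off of 1.6 are assumptions made for this example.

```python
import random


def second_moment(values):
    """Return <x^2> / <x>^2 for a list of non-negative values."""
    n = len(values)
    mean = sum(values) / n
    mean_sq = sum(v * v for v in values) / n
    return mean_sq / (mean * mean)


def guess_label(values):
    """Guess whether a column behaves like intensities or like amplitudes.

    The cut-off of 1.6 is an arbitrary midpoint between the two reference
    values (2.0 and about 1.27), chosen only for this sketch.
    """
    return "intensity-like" if second_moment(values) > 1.6 else "amplitude-like"


# Synthetic acentric intensities: exponentially distributed with mean 1.
rng = random.Random(0)
intensities = [rng.expovariate(1.0) for _ in range(20000)]
amplitudes = [i ** 0.5 for i in intensities]

print(guess_label(intensities))  # intensity-like
print(guess_label(amplitudes))   # amplitude-like
```

Real data would need centric reflections separated out and some allowance for twinning and anisotropy, both of which shift these moments.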

Eleanor



--

  ===============================================================
  *                                                             *
  * Gerard Bricogne                     g...@globalphasing.com  *
  *                                                             *
  * Global Phasing Ltd.                                         *
  * Sheraton House, Castle Park         Tel: +44-(0)1223-353033 *
  * Cambridge CB3 0AX, UK               Fax: +44-(0)1223-366889 *
  *                                                             *
  ===============================================================




Harry
--
Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road, Cambridge, CB2 0QH
