Re: [ccp4bb] database-assisted data archive

James Holton Wed, 18 Aug 2010 08:54:36 -0700

There is an image archiving system called TARDIS (http://tardis.edu.au/) that sounds more-or-less exactly like what you describe. I agree that it would be "nice" if you can get your synchrotron to do it for you, but since every single beamline and home-source setup in the world has already been providing you with a "database" that is more commonly called the "image header", I don't think it is too hard to imagine how accurate the data in your "database" is going to be.

If I may interject my two cents, I have found that when a user is asked to fill out a form, compliance is inversely proportional to the number of fields on the form. But far more important than that: if you ask them to answer a question that they simply don't know the answer to, they will likely skip the whole thing. An excellent example (I think) is asking for the space group BEFORE they have even taken their first snapshot of a brand new crystal. This datum is simply not known until AFTER the structure is solved! For example, is it P41 or P43? You don't "really" know that until after you see a helix in the map. What is the molecular weight? That depends on whether or not it is a complex. (If I had a nickel for every user who was certain they had a protein-DNA complex with a "very low solvent content", I would be quite rich.)

All that said, I don't think it is unreasonable to expect an image header (or any other database) to contain motor positions, detector type, wavelength, beam center etc. Clearly this is not always the case, and this problem still needs a lot of work, but my point is that we should try to write down things that we "really know" (observations) and not try to muddle the database with derived quantities (interpretations).
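To make the "observations only" idea concrete, here is a minimal sketch that reads a plain-text header of KEY=VALUE; entries and reports which observed quantities are missing. The header layout and the key names (WAVELENGTH, DISTANCE, BEAM_CENTER_X, BEAM_CENTER_Y, DETECTOR) are assumptions made up for illustration; real headers differ from detector to detector.

```python
# Hypothetical list of observed quantities we expect in every header.
REQUIRED_OBSERVATIONS = (
    "WAVELENGTH",
    "DISTANCE",
    "BEAM_CENTER_X",
    "BEAM_CENTER_Y",
    "DETECTOR",
)


def parse_header(text):
    """Parse 'KEY=VALUE;' entries (optionally wrapped in braces) into a dict of strings."""
    fields = {}
    for entry in text.strip().strip("{}").split(";"):
        key, sep, value = entry.partition("=")
        if sep:  # skip fragments that contain no '='
            fields[key.strip()] = value.strip()
    return fields


def missing_observations(fields):
    """Return the required observation keys that are absent from the header."""
    return [key for key in REQUIRED_OBSERVATIONS if key not in fields]


example = "{\nWAVELENGTH=0.9795;\nDISTANCE=250.0;\nBEAM_CENTER_X=157.5;\n}"
print(missing_observations(parse_header(example)))
```

Note that nothing here asks for a space group or a molecular weight: the check only covers things the beamline itself can record.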

When it comes to what you "really know" about the sample, all you can realistically hope to be sure of is the list of chemicals that went into the drop: macromolecule sequence, salts, PEGs, and their respective concentrations. Sometimes you don't even know that! (e.g. proteolysis). However, the macromolecule sequence is INCREDIBLY useful for deriving (or at least guessing) a great many other things (such as the molecular weight, solvent content, number of heavy atom sites). The list of salts is also absolutely critical for doing radiation damage predictions. So, as my rant comes to an end, I would strongly suggest focusing on trying to capture the important things that we actually do know, rather than confusing our poor users further by asking them to write down a lot of things that they don't.
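As a rough illustration of how much can be derived from the sequence alone, the sketch below estimates the molecular weight of a single protein chain from standard average residue masses, and counts methionines as candidate selenium sites. The function names are hypothetical, the methionine count assumes a selenomethionine experiment, and the estimate says nothing about complexes, modifications, or proteolysis.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water per chain for the free termini


def molecular_weight(sequence):
    """Estimate the average mass (Da) of one unmodified protein chain.

    Raises KeyError for letters outside the 20 standard amino acids.
    """
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS


def methionine_count(sequence):
    """Count methionines, i.e. possible Se sites if SeMet protein is assumed."""
    return sequence.upper().count("M")


seq = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up example sequence
print(round(molecular_weight(seq), 1), methionine_count(seq))
```

Solvent content would additionally need the unit cell and the number of copies in the asymmetric unit, which come from the data rather than from the user.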


-James Holton
MAD Scientist

Andreas Förster wrote:

Dear all,
going through some previous lab member's data and trying to make sense of it, I was wondering what kind of solutions exist to simplify the archiving and retrieval process.
In particular, what I have in mind is a web interface that allows a user who has just returned from the synchrotron or the in-house detector to fill in a few boxes (user, name of protein, mutant, light source, quality of data, number of frames, status of project, etc) and then upload his data from the USB stick, portable hard drive or remote storage.
The database application would put the data in a safe place (some file server that's periodically backed up) and let users browse through all the collected data of the lab with minimal effort later.
It doesn't seem too hard to implement this, which is why I'm asking if anyone has done so already.
Thanks.


Andreas
