Dear Tassos,

     It is unclear whether this thread will be able to resolve your deep 
existential concerns about "what to be", but you do introduce a couple of
interesting points: (1) raw data archiving in areas (of biology) other than
structural biology, and (2) archiving the samples rather than the verbose
data that may have been extracted from them.

     Concerning (1), I am grateful to Peter Keller here in my group for
pointing me towards the Trace Archive of DNA sequences in mid-August, when
we were reviewing for the n-th time the issue of raw data deposition under
discussion in this thread, and its advantages over keeping only the derived
data extracted from the raw data. He found an example at

http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?&cmd=retrieve&val=12345&dopt=trace&size=1&retrieve=Submit

You can check the "Quality Score" box below the trace, and this will refresh
the display to give a visual estimate of the reliability of the sequence.
There is clearly a problem around position 210, which would not have been
adequately dealt with by retaining only the most probable sequence. In this
context, it has been found worthwhile to preserve the raw data, to make it
possible to "audit" derived data against them. This is at least a very
simple example of what you were referring to when you wrote about the
inadequacy of computational "reduction". In the MX context, this is rather
similar to the contamination of integrated intensities by spots from
parasitic lattices (which would still affect unmerged intensities, by the
way - so upgrading the PDB "structure factor" file to unmerged data would
take care of over-merging, but not of that contamination).

     Concerning (2) I greatly doubt there would be an equivalent for MX: few
people would have spare crystals to put to one side for a future repeat of a
diffraction experiment (except in the case of lysozyme/insulin/thaumatin!).
I can remember an esteemed colleague arguing 4-5 years ago that if you want
to improve a deposited structure, you could simply repeat the work from
scratch - a sensible position from the philosophical point of view (science
being the art of the repeatable), but far less sensible in conditions of
limited resources, and given also the difficulties of reproducing crystals.
The real-life situation is more a "Carpe diem" one: archive what you have,
as you may never see it again! Otherwise one would easily get drawn into the
same kind of unrealistic expectations as people who get themselves frozen in
liquid N2, with their blood replaced by DMSO, hoping to be brought back to
life some day in the future ;-) .


     With best wishes,
     
          Gerard.

--
On Mon, Oct 31, 2011 at 11:37:47AM +0100, Anastassis Perrakis wrote:
> Dear all,
>
> The discussion about keeping primary data, and what level of data can be 
> considered 'primary', has - rather unsurprisingly - come up also in areas 
> other than structural biology.
> An example is next-generation sequencing. A full dataset is a few
> terabytes, but post-processing reduces it to sub-Gb size. However, the
> post-processed data, as in our case, have suffered from the inadequacy of
> computational "reduction" ... At least our institute has decided to create
> a double back-up of the primary data in triplicate. For that reason our
> facility bought three -80 freezers, one on site in the basement, one on
> the top floor, and one off-site, and they keep the DNA to be sequenced. A
> sequencing run already costs under $1k and it will not become more
> expensive. So, if it's important, do it again. It's cheaper and it's
> better.
>
> At first sight, that does not apply to MX. Or does it?
>
> So, maybe the question is not "To archive or not to archive" but "What to 
> archive".
>
> (similarly, it never crossed my mind whether I should "be or not to be" -
> I always wondered "what to be")
>
> A.
>
>
> On Oct 30, 2011, at 11:59, Kay Diederichs wrote:
>
>> At 20:59, Jrh wrote:
>> ...
>>> So:-  Universities are now establishing their own institutional
>>> repositories, driven largely by Open Access demands of funders. For
>>> these to host raw datasets that underpin publications is a reasonable
>>> role in my view and indeed they already have this category in the
>>> University of Manchester eScholar system, for example.  I am set to
>>> explore locally here whether they would accommodate all of our lab's raw
>>> X-ray image datasets per annum that underpin our published crystal
>>> structures.
>>>
>>> It would be helpful if readers of this CCP4bb could kindly also
>>> explore with their own universities if they have such an
>>> institutional repository and if raw data sets could be accommodated.
>>> Please do email me off-list with this information if you prefer, but
>>> replying within the CCP4bb is also good.
>>>
>>
>> Dear John,
>>
>> I'm pretty sure that there exists no consistent policy to provide an 
>> "institutional repository" for deposition of scientific data at German 
>> universities or Max-Planck institutes or Helmholtz institutions, at least 
>> I have never heard of anything like this. More specifically, our University of 
>> Konstanz certainly does not have the infrastructure to provide this.
>>
>> I don't think that Germany is the only country that is an exception to
>> any rule on the availability of an "institutional repository". Rather, I'm
>> almost amazed that British and American institutions seem to support this.
>>
>> Thus I suggest not focusing exclusively on official institutional
>> repositories, but exploring alternatives: distributed filestores like
>> Google's BigTable, BitTorrent or others might be just as suitable - check
>> out http://en.wikipedia.org/wiki/Distributed_data_store. I guess that any
>> crystallographic lab could easily sacrifice/donate a TB of storage for the
>> purposes of this project in 2011 (and maybe 2 TB in 2012, 3 in 2013, ...),
>> but clearly the amount of work needed to set this up should be kept as low
>> as possible (a BitTorrent daemon seems simple enough).
>>
>> Just my 2 cents,
>>
>> Kay
>>
>
> Anastassis (Tassos) Perrakis, Principal Investigator / Staff Member
> Department of Biochemistry (B8)
> Netherlands Cancer Institute,
> Dept. B8, 1066 CX Amsterdam, The Netherlands
> Tel: +31 20 512 1951 Fax: +31 20 512 1954 Mobile / SMS: +31 6 28 597791
>

-- 

     ===============================================================
     *                                                             *
     * Gerard Bricogne                                             *
     *                                                             *
     * Global Phasing Ltd.                                         *
     * Sheraton House, Castle Park         Tel: +44-(0)1223-353033 *
     * Cambridge CB3 0AX, UK               Fax: +44-(0)1223-366889 *
     *                                                             *
     ===============================================================
