Dear Julien,

On Thu, Mar 19, 2020 at 08:47:20AM +0000, Julien Cappèle wrote:
> Though I agree with you Clemens that raw images are amazing to work
> with as you can use any software you are comfortable with, we cannot
> forget that depositing several TB of data for each lab would be bad
> for ecological reasons.

Of course, there are ecological (carbon footprint) considerations -
and there are lots of papers and studies about that. I haven't looked
at any numbers, but here are a few points:

 * A lot of data is already stored (e.g. at synchrotrons) and would
   "only" need to be made "visible" via a DOI (caveat: I realise
   that there are huge technical issues with that)

 * How does that energy consumption compare with the energy used to
   perform the experiment in the first place?

 * If by having that data available we can improve software and the
   way experiments are done: wouldn't that potentially save energy in
   the long run (avoiding poor or unnecessary experiments in the first
   place)?

 * We are looking at a move to increase the number of raw image data
   depositions for deposited PDB structures - not at a requirement to
   deposit raw images for every PDB structure or even for every
   dataset ever collected.

   At the moment there are about 4500 image datasets available for
   about 100000 PDB X-Ray structures, i.e. ~5%. 

> And because detectors are always improving (thank you all!), the size
> of the data will increase exponentially.

True ... and some types of experiments can benefit from those larger,
faster and more numerous datasets - if done correctly.

> Could it be possible for a new/already existing software to store
> reflections (area, intensity from center to border, position x/y on
> the image, and information about the image) in a lightweight,
> text-only file? Possibly a new format to be used for integration?

See my other reply: this all assumes that the initial processing step
caught all spots (and nothing else) on the 2D image correctly.

There have been all kinds of initiatives on raw data deposition (in
no particular order):

  https://www.iucr.org/resources/data/dddwg
  https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5331468/
  https://www.sciencedaily.com/releases/2016/11/161108130045.htm
  https://journals.iucr.org/d/issues/2016/11/00/yt5099/
  https://onlinelibrary.wiley.com/iucr/doi/10.1107/S0909049513020724
  http://scripts.iucr.org/cgi-bin/paper?S0907444908015540
  https://scripts.iucr.org/cgi-bin/paper?dz5309
  https://bl831.als.lbl.gov/~jamesh/lossy_compression/

So we've been there before. Let's see if we can't do at least
something for the clearly important structures and work right now -
and worry about some long-term impact later (having maybe learned
something along the way). Just because we could be doing something now
doesn't mean we will have to keep doing it in 1-N years' time,
right ;-)

Cheers

Clemens

########################################################################

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1