I would say everybody probably keeps too many junk datasets around; at least I
do. And every now and then I run into the trouble of having to buy new
terabyte disks.
I think my group currently acquires on average ~700 GB of raw images
(compressed) per year. If we kept only the useful datasets, we would probably
be down to 10% of that (~70 GB). But as always you hope for the best and keep
some data considered junk in 2009 that might turn out to be useful in 2015.

Jürgen

On Apr 5, 2012, at 9:08 AM, Roger Rowlett wrote:


FYI, every NSF grant proposal now must have a data management plan that 
describes how all experimental data will be archived and in what formats. I'm 
not sure how seriously these plans are monitored, but a plan must be provided 
nevertheless. Is anyone NOT archiving their original data in some way?

Roger Rowlett

On Apr 5, 2012 7:16 AM, "John R Helliwell"
<[email protected]> wrote:
Dear '[email protected]',

Re the pixel detector: yes, this is an acknowledged raw-data archiving
challenge. Possible technical solutions include summing images to make
coarser ones, i.e. in angular range; lossless compression (nicely
described on this CCP4bb by James Holton); or preserving a sufficient
sample of the data (but NB this debate is certainly not yet concluded).
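To make the first two options concrete, here is a minimal toy sketch, assuming each image is just a flat list of non-negative integer pixel counts. The helper names (bin_frames, compress_frame, decompress_frame) are hypothetical, and zlib is used only as a generic stand-in for lossless compression, not the specific scheme discussed on CCP4bb. Note the difference: summing frames is irreversible (the fine angular slicing cannot be recovered), whereas compression round-trips exactly.

```python
import zlib
from array import array


def bin_frames(frames, k):
    """Sum every k consecutive thin-sliced frames into one coarser frame.

    frames: list of equal-length lists of pixel counts (one list per image).
    A trailing group shorter than k is summed as-is.
    """
    coarse = []
    for start in range(0, len(frames), k):
        group = frames[start:start + k]
        # zip(*group) walks pixel by pixel across all frames in the group
        coarse.append([sum(pixels) for pixels in zip(*group)])
    return coarse


def compress_frame(frame):
    """Losslessly compress one frame of non-negative integer counts."""
    raw = array("L", frame).tobytes()
    return zlib.compress(raw, 9)


def decompress_frame(blob):
    """Invert compress_frame, returning the original list of counts."""
    data = array("L")
    data.frombytes(zlib.decompress(blob))
    return data.tolist()


# Toy example: 10 thin slices of a hypothetical 4-pixel "detector"
frames = [[i % 3, 5, 0, 7] for i in range(10)]
coarse = bin_frames(frames, 5)  # 5x coarser angular slicing
restored = [decompress_frame(compress_frame(f)) for f in coarse]
print(coarse)              # [[4, 25, 0, 35], [5, 25, 0, 35]]
print(restored == coarse)  # True
```

Total counts are preserved by the summing step; only the angular sampling is lost.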

Re "And all this hassle is for the only real purpose of preventing data fraud?"

Well... why publish data?
Let me offer some reasons:
• To enhance the reproducibility of a scientific experiment
• To verify or support the validity of deductions from an experiment
• To safeguard against error
• To allow other scholars to conduct further research based on
experiments already conducted
• To allow reanalysis at a later date, especially to extract 'new'
science as new techniques are developed
• To provide example materials for teaching and learning
• To provide long-term preservation of experimental results and future
access to them
• To permit systematic collection for comparative studies
• And, yes, to better safeguard against fraud than is apparently the
case at present

Also, to (probably) comply with your funding agency's grant conditions:
Increasingly, funding agencies are requesting or requiring data
management policies (including provision for retention and access) to
be taken into account when awarding grants. See e.g. the Research
Councils UK Common Principles on Data Policy
(http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx) and the Digital
Curation Centre overview of funding policies in the UK
(http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies).
See also http://forums.iucr.org/viewtopic.php?f=21&t=58 for discussion
of policies relevant to crystallography in other countries. NB these
policies extend over derived, processed and raw data, i.e. without
really adequate clarity of policy from one stage of the 'data pyramid'
to the next (see
http://www.stm-assoc.org/integration-of-data-and-publications).


I should also mention the IUCr Journals Notes for Authors for biological
macromolecular structures, where we have our (i.e. macromolecular
crystallography's) version of the 'data pyramid':

(1) Derived data
• Atomic coordinates, anisotropic or isotropic displacement
parameters, space group information, secondary structure and
information about biological functionality must be deposited with the
Protein Data Bank before or in concert with article publication; the
article will link to the PDB deposition using the PDB reference code.
• Relevant experimental parameters and unit-cell dimensions are required
as an integral part of article submission and are published within the
article.

(2) Processed experimental data
• Structure factors must be deposited with the Protein Data Bank
before or in concert with article publication; the article will link
to the PDB deposition using the PDB reference code.

(3) Primary experimental data (here I give the Notes for Authors details
for both small-molecule and macromolecular structures):
For small-unit-cell crystal/molecular structures and macromolecular
structures IUCr journals have no current binding policy regarding
publication of diffraction images or similar raw data entities.
However, the journals welcome efforts made to preserve and provide
primary experimental data sets. Authors are encouraged to make
arrangements for the diffraction data images for their structure to be
archived and available on request.
For articles that present the results of powder diffraction profile
fitting or refinement (Rietveld) methods, the primary diffraction
data, i.e. the numerical intensity of each measured point on the
profile as a function of scattering angle, should be deposited.
Fibre data should contain appropriate information such as a photograph
of the data. As primary diffraction data cannot be satisfactorily
extracted from such figures, the basic digital diffraction data should
be deposited.


Finally, I should mention that many IUCr Commissions are interested in the
possibility of establishing community practices for the orderly
retention and referencing of raw data sets, and the IUCr would like to
see such data sets become part of the routine record of scientific
research in the future, to the extent that this proves feasible and
cost-effective.
I therefore draw your attention to the IUCr Forum on such matters at:
http://forums.iucr.org/
Within this Forum you can find, for example, the fairly recent report of
the ICSU-convened Strategic Coordinating Committee on Information and
Data; from it we learn of data archiving efforts in many other areas of
science and, e.g., that the Square Kilometre Array radio telescope will
pose the biggest raw data archiving challenge on the planet. [Our needs
are thereby relatively modest.]

The IUCr Diffraction Data Deposition Working Group is actively
addressing all these various issues.
We welcome your input at the IUCr Forum, which will thereby be most
timely. Thank you.

Best wishes,
Yours sincerely,
John
Professor John R Helliwell DSc


On Thu, Apr 5, 2012 at 1:24 AM, aaleshin
<[email protected]> wrote:
> People who raise their voices for prolonged storage of raw images miss a
> simple fact: the volume of collected data increases as fast as, if not
> faster than, the cost of storage space drops. I just had an opportunity
> to collect data with the PILATUS detector at SSRL, and I can tell you that
> monster allows slicing the data 4-5 times thinner than other detectors do.
> Some people also like collecting very redundant data sets. Even now,
> transferring and storing raw data from a synchrotron is a pain in the neck,
> but in a few years it may become simply impractical. And all this hassle is
> for the only real purpose of preventing data fraud? Aren't there cheaper and
> more adequate solutions to the problem?
>
> I also wonder why, after the first occurrence of data fraud several years
> ago, the PDB did not take any action to prevent it in the future. Or are
> administrative actions simply impossible nowadays without a mega-dollar
> grant?
>
>


--

......................
Jürgen Bosch
Johns Hopkins University
Bloomberg School of Public Health
Department of Biochemistry & Molecular Biology
Johns Hopkins Malaria Research Institute
615 North Wolfe Street, W8708
Baltimore, MD 21205
Office: +1-410-614-4742
Lab:      +1-410-614-4894
Fax:      +1-410-955-2926
http://web.mac.com/bosch_lab/



