Thanks to James Brunt for agreeing to share this. I'm sure that
expectations will develop as the reviewer and PI communities gain
experience with this component of proposals.
David Inouye
How to Write a Data Management Plan for a National Science Foundation
(NSF) Proposal
LTER Cybersecurity and Data Management Briefing #2 - February 2011
by James Brunt
The National Science Foundation (NSF) has made good on the announcement
in <http://www.nsf.gov/news/news_summ.jsp?cntn_id=116928>last May's
press release that it would require a data management plan with every
NSF proposal. You will be happy to know that writing a data management
plan is not difficult. While constructing the text to meet the NSF
requirements does demand some attention to detail, the real challenge
is that the data management plan has to be non-fiction, describing
procedures that will actually take place. The NSF receives about
40,000 proposals each year (source: Wikipedia). It occurred to me to
wonder how those 40,000 potential investigators were going to
approach this new requirement. A quick scan of scientists' blogs
since last May, when the intention was announced, reveals
that much single-investigator science has no process or procedures in
place that could safely be called data management. The data life
cycle for these projects ends with the publication of results in a
peer-reviewed journal. The purpose of this briefing is to provide
you with a solid outline for a data management plan to include in
your NSF proposals and some resources that will help you on your way
to leveraging your valuable research products through preservation and reuse.
As of January 18, 2011, all proposals to NSF must include a
supplementary document of no more than two pages labeled "Data
Management Plan". This supplement should describe how the proposal
will conform to NSF policy on the dissemination and sharing of
research results
(see <http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/aag_6.jsp#VID4>AAG
Chapter VI.D.4). The NSF policy includes the sharing of results,
primary data, physical samples and collections. The policy also
states that NSF will enforce it through a variety of
mechanisms and provide appropriate support and incentives for data
cleanup, documentation, dissemination, and storage. NSF suggests
that the plan "may" contain:
* the types of data, samples, physical collections, software,
curriculum materials, and other materials to be produced in the
course of the project;
* the standards to be used for data and metadata format and
content (where existing standards are absent or deemed inadequate,
this should be documented along with any proposed solutions or remedies);
* policies for access and sharing including provisions for
appropriate protection of privacy, confidentiality, security,
intellectual property, or other rights or requirements;
* policies and provisions for re-use, re-distribution, and the
production of derivatives; and
* plans for archiving data, samples, and other research products,
and for preservation of access to them.
NSF stops short of dictating what data management practices you
should engage in. This means that where community standards exist, they
will be applied through peer review pressure. While in some
communities this means you can probably get away with two sentences
saying how much you don't need a data management plan, that's not
true in the ecological community where there are standards of
practice and experienced informatics-oriented colleagues on the
review panels. Some NSF directorates and divisions have issued
advice to proposers that contains more specific suggestions (e.g.,
<http://www.nsf.gov/sbe/SBE_DataMgmtPlanPolicy.pdf>SBE,
<http://www.nsf.gov/geo/ear/2010EAR_data_policy_9_28_10.pdf>EAR,
<http://www.nsf.gov/bfa/dias/policy/dmpdocs/phy.pdf>MPS). In
addition, institutions are beginning to post resources for their
constituents that can be of use in developing a data management plan
(e.g.,
<http://libraries.mit.edu/guides/subjects/data-management/>MIT,
<http://dataplan.wisc.edu/wp-content/uploads/2010/04/data_plan_guide.pdf>UW-Madison).
If you are reading this firsthand then you are in luck because you
are in some way associated with an LTER site. LTER proposals have
included data management plans, backed by data management
procedures, for the last 30 years. This means that there
is expertise for you to draw on to prepare your plan and, more
importantly, resources to guide you down the road to fulfilling your
plan. (Note: An NSF source has indicated that a PI who adopts their
LTER site's research data management plan for projects proposed to
other NSF programs would be viewed favorably.) If you've received
this via a colleague or through the magic of Google then I hope that
I can give you some added confidence in the composition of your data
management plan.
The National Science Board, in its 2005 recommendations to NSF
(<http://www.nsf.gov/pubs/2005/nsb0540/>NSB-05-40, Long-Lived Digital
Data Collections: Enabling Research and Education in the 21st
Century), intended these data management plans to be quite
comprehensive. With this 2-page directive, however, NSF is
particularly interested in data management with regard to the
dissemination and sharing of research results. While the
instructions below reflect desirable data management
practices, there are several essential issues among them that
deserve more weight in your write-up for NSF. I will identify these
in the text below. As with LTER proposals, any specific solicitation
instructions trump this 2-pager in terms of expectations, but your
plan must still include the essential information below.
Step 0. Label the page - "Data Management Plan"
Step 1. Collection - Describe the data to be collected during the
proposed period of operation. These are the actual observations, not
the final derivative product. This can be prose if simple or a table
if more complex. Name the type of data (e.g., mass of seeds, counts
of inflorescences), the instrument or collection approach (e.g.,
visual count recorded on paper), and the sampling design (e.g.,
number of plots, replicates, frequency of collection). If actual
data are interpreted, note the interpretation (e.g., impedance
interpreted as soil moisture). If data volumes are significant
(e.g., >1 GB/day), give an estimate of the totals. Describe any
quality control measures that will be put in place as part of data collection.
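As a rough illustration of the volume estimate mentioned in Step 1, the sketch below multiplies a daily collection rate by the number of collection days. It assumes the rate is expressed in gigabytes per day. The function name and the example figures (1.5 GB/day, 120 collection days per year, 3 years) are hypothetical and chosen only for illustration; they do not come from NSF guidance.

```python
def estimate_total_volume_gb(gb_per_day: float, days_per_year: int, years: int) -> float:
    """Return the estimated total data volume in gigabytes.

    All three inputs are assumptions supplied by the investigator.
    """
    if gb_per_day < 0 or days_per_year < 0 or years < 0:
        raise ValueError("inputs must be non-negative")
    return gb_per_day * days_per_year * years


if __name__ == "__main__":
    # Hypothetical project: 1.5 GB/day, 120 field days per year, 3 years.
    total_gb = estimate_total_volume_gb(gb_per_day=1.5, days_per_year=120, years=3)
    print(f"Estimated total: {total_gb:.0f} GB ({total_gb / 1000:.2f} TB)")
```

A single figure like this, stated with its assumptions, is usually enough for a two-page plan.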
Step 2. Processing - Describe the disposition of the raw data
post-collection. How will data be transmitted from field or
instrument to institution? How regularly, by whom, and where will
data be stored? How will the security of those data be ensured? A
previous article describes several rules of thumb for data security
(<http://intranet2.lternet.edu/content/protecting-your-digital-research-data-and-documents-cybersecurity-briefing-1-september-2010>LTER
Data Management and Cybersecurity Briefing #1).
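One way to confirm that files arrive intact when they move from field or instrument to institutional storage is to compare checksums before and after the transfer. The briefing does not prescribe this; the sketch below is only an assumption about how such a check might look, using Python's standard `hashlib` module. The function names and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(source: Path, copy: Path) -> bool:
    """True when the copy's checksum matches the source's checksum."""
    return sha256_of_file(source) == sha256_of_file(copy)


if __name__ == "__main__":
    # Hypothetical demonstration using throwaway files.
    with tempfile.TemporaryDirectory() as tmp:
        field_file = Path(tmp) / "field_copy.csv"
        archive_file = Path(tmp) / "archive_copy.csv"
        field_file.write_bytes(b"plot,seed_mass_g\n1,0.42\n")
        archive_file.write_bytes(field_file.read_bytes())
        print("intact:", transfer_is_intact(field_file, archive_file))
```

A matching checksum shows only that the copy is byte-for-byte identical; it says nothing about access control or backups, which the plan still needs to address.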
Step 3. Analysis - Describe in general any descriptive or analytical
statistics that will be run against the data for quality assurance,
derivation, aggregation, etc. Mention the names of analytical
packages (e.g., SAS, SPSS, MATLAB, R).
Step 4. Documentation - Documentation is required to ensure the
longevity of data. The documentation of your study is best done
during the process, not after. This step describes the accumulation
of the documentation text, while Step 8 describes the encoding of
this text into a metadata language for publication. Here you will
describe what metadata/documentation will be created at each stage of
the data life cycle and by whom. For example, "Changes made to the
data to correct errors will be described and revised during the data
manipulation process by the budgeted graduate student". Examples of
good metadata can be seen in the <http://metacat.lternet.edu>LTER
data catalog, or you can consult your Site Information Manager. State
the metadata content standard you will use to document these
data. Most ecological metadata is based on recommendations contained
in
<http://www.esajournals.org/doi/abs/10.1890/1051-0761%281997%29007%5B0330%3ANMFTES%5D2.0.CO%3B2>Michener
et al. 1997.
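To make "documentation during the process" concrete, the sketch below appends one record per data correction to a plain CSV change log. The field names, the log file name, and the example entry are all hypothetical; they are not drawn from Michener et al. 1997 or from any LTER standard.

```python
import csv
import tempfile
from dataclasses import asdict, dataclass, fields
from pathlib import Path


@dataclass
class ChangeRecord:
    """One documented change to a dataset (hypothetical field set)."""
    changed_on: str   # ISO date, e.g. "2011-02-15"
    dataset: str
    description: str
    changed_by: str


def append_change(log_path: Path, record: ChangeRecord) -> None:
    """Append a record to the CSV log, writing a header if the file is new."""
    write_header = not log_path.exists()
    column_names = [f.name for f in fields(ChangeRecord)]
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=column_names)
        if write_header:
            writer.writeheader()
        writer.writerow(asdict(record))


if __name__ == "__main__":
    # Hypothetical demonstration in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "change_log.csv"
        append_change(log, ChangeRecord(
            changed_on="2011-02-15",
            dataset="seed_mass",
            description="Corrected transcription error in plot 4",
            changed_by="graduate student",
        ))
        print(log.read_text(encoding="utf-8"))
```

A log kept this way supplies the raw text that is later encoded into a metadata language in Step 8.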
Step 5. Products (Essential) - Describe the data or other products that
you will be making available from the study. These may or may not be
the raw data described in Step 1. This is another place where a table
might be useful.
Step 6. Policy (Essential) - Describe the policies under which these
data will be made available (see, for example, the
<http://intranet2.lternet.edu/documents/lter-network-data-access-policy-revision-3>LTER
Data Access Policy) and how you will deal with privacy or
other sensitive data issues (e.g., location of endangered species).
Step 7. Archival (Essential) - Describe how and where you will make
these data and metadata available to the community in perpetuity.
Here again you have an advantage by being associated with an LTER
site. LTER sites maintain archival infrastructure for making data
and metadata accessible and can give you tips and maybe some direct
support. If you are not associated with an LTER site, most
institutional libraries operate digital
repositories that will provide this service for their constituents.
Step 8. Curation (Essential) - Preparation of metadata and data for
publication is a time-consuming process. This should be acknowledged
in the data management plan and in the budget. In this step you will
describe the structural standards that you will apply in making data
and metadata available. For example, for most ecological data,
documentation will need to be structured in Ecological Metadata
Language (EML) to be included in community repositories. There are
<http://intranet2.lternet.edu/documents/eml-best-practices-document-2004>best
practices available from the LTER community for EML. However, you
can avoid direct contact with EML and best practices documents by
registering your datasets online with the Knowledge Network for
Biocomplexity (see Step 9).
Step 9. Publication (Essential) - After making sure you have a
secure place for your data products to reside, you need to register
them with community repositories. Include a description here of the
institutional repository or repositories where you will register your data. Your
LTER site can register and publish your data. If that is not
appropriate for your study, the LTER Network operates as a node on
the <http://knb.ecoinformatics.org>Knowledge Network for
Biocomplexity (KNB) where these data can be independently
registered. KNB offers an online repository form and a guide for
completing the form. The NSF DataNet projects, in particular
<http://www.dataone.org>DataONE, will hopefully soon offer another
outlet for data publication.
For specific datasets you may consider formally publishing the
data. <http://esapubs.org/archive/default.htm>Ecological Archives is
a peer-reviewed data journal operated by the Ecological Society of
America that accepts well-described datasets and their textual
description for publication. There are others operated in various
ways by scientific societies. Avoid committing your data only to
commercial journal repositories for what I hope are obvious reasons.
Other considerations:
The information contained in the plan regarding "plans for
preservation, documentation, and sharing of data" is also required to
be part of the Project Description, so it seems prudent to place
an appropriate reference to the 2-page plan in the Project
Description.
Make sure your proposed budget addresses the data management plan.
Costs of documenting, preparing, publishing, disseminating and
sharing research findings and supporting material are allowable
charges against the grant.
Data management plans and procedures should become standardized for a
lab, institute, or even community such that in time there is
boilerplate material available that reflects institutionalized procedures.
Ultimately the success of any given plan will lie in the hands of the
reviewers and depend on the makeup of the panel, but as with any new
initiative, the first of those 40,000 proposals to go in will tend to
set the tone for the future. Finally, just before going to press I read in a
<http://news.unm.edu/2011/02/online-data-management-planning-tool-tames-data-and-meets-researchers%E2%80%99-funding-requirements/>reliable
source that DataONE and others are developing a software tool that
will write data management plans for you. Until that time, I hope you
find this information useful.
Comments and discussion are encouraged and should be directed to the
<http://intranet2.lternet.edu/comment/reply/3248#comment-form>online
forum so that the community may benefit.
Copyright 2010-2011 James W Brunt