While it would be great to have versioned datasets, I generally create a snapshot of the data used in a paper and archive it in Zenodo. This gives complete reproducibility without placing extra demands on the data providers. I do, however, need to cite both the source and the snapshot.

Regards,
Quentin
On Mon, 18 Feb 2019, 17:45 Tim Robertson <[email protected]> wrote:

> Hi Jonathan
>
> (adding GBIF helpdesk to the CC)
>
> This is just a quick answer, which I expect will result in follow-up questions.
>
> In terms of citation, we use a DOI to identify the concept of a dataset, not the specific version, e.g. https://doi.org/10.15468/cup0nk
>
> If you start deleting copies of data (e.g. in a background housekeeping task), what will break are the links to the downloads in the IPT pages:
> https://ipt.huh.harvard.edu/ipt/resource?r=huh_all_records&v=1.3
>
> This may or may not be considered a problem for you.
>
> I think others might have contacted you with suggestions for improving the dataset titles being used, but if not, I would suggest considering correctly formatted titles, as they are used in many places (https://www.gbif.org/dataset/4e4f97d2-4670-4b24-b982-261e0a450faf).
>
> I hope this helps as a start,
>
> Tim
>
> *From: *IPT <[email protected]> on behalf of "Kennedy, Jonathan" <[email protected]>
> *Date: *Monday, 18 February 2019 at 18.31
> *To: *"[email protected]" <[email protected]>
> *Subject: *[IPT] Daily feeds and archive history
>
> Hi All,
>
> I am finishing an upgrade to the Harvard University Herbaria IPT instance and have configured our feeds for daily auto-publish. The HUH has invested in a mass digitization workflow, and we are currently creating ~20,000 new vascular records per month (with minimal data), so we do have new records on a daily basis. However, our DwC archives are fairly large (100MB+), so we can't keep the daily archive history. I am looking for guidance on how GBIF dataset citation will work if we do not preserve each daily archive. It seems problematic if a version of our dataset is used and cited but cannot be reconstructed.
>
> Best regards,
>
> Jonathan A. Kennedy
>
> Director of Biodiversity Informatics
> Harvard University Herbaria,
> Department of Organismic and Evolutionary Biology
> _______________________________________________
> IPT mailing list
> [email protected]
> https://lists.gbif.org/mailman/listinfo/ipt
