Hi Jonathan
(adding GBIF helpdesk to the CC)

This is just a quick answer, which I expect will result in follow-up questions.

In terms of citation, we use a DOI to identify the concept of a dataset, not 
the specific version. E.g. https://doi.org/10.15468/cup0nk
If you start deleting copies of data (e.g. with a background housekeeping 
task), what will break are the links to the downloads on the IPT pages, e.g.
https://ipt.huh.harvard.edu/ipt/resource?r=huh_all_records&v=1.3
This may or may not be considered a problem for you.

I think others might have contacted you with suggestions for improving the 
dataset titles. If not, I would suggest using correctly formatted titles, as 
they are displayed in many places 
(https://www.gbif.org/dataset/4e4f97d2-4670-4b24-b982-261e0a450faf).

I hope this helps as a start,
Tim





From: IPT <[email protected]> on behalf of "Kennedy, Jonathan" 
<[email protected]>
Date: Monday, 18 February 2019 at 18.31
To: "[email protected]" <[email protected]>
Subject: [IPT] Daily feeds and archive history

Hi All,

I am finishing an upgrade to the Harvard University Herbaria IPT instance and 
have configured our feeds for daily auto-publish. The HUH has invested in a 
mass digitization workflow and we are currently creating ~20,000 new vascular 
records per month (with minimal data), so we do have new records on a daily 
basis. However, our DwC archives are fairly large (100MB+), so we can’t keep 
the daily archive history. I am looking for guidance on how it will work with 
GBIF dataset citation if we do not preserve each daily archive. It seems 
problematic if a version of our dataset is used and cited but cannot be 
reconstructed.

Best regards,
Jonathan A. Kennedy
Director of Biodiversity Informatics
Harvard University Herbaria,
Department of Organismic and Evolutionary Biology
_______________________________________________
IPT mailing list
[email protected]
https://lists.gbif.org/mailman/listinfo/ipt
