The easiest approach is actually for us to delete all records before indexing 
the new version.
If you let us know just before you are ready for it to be reindexed, it's a 
trivial thing to do and would be complete in around 1 hour.

Thanks,
Tim


From: Roderic Page <Roderic.Page at glasgow.ac.uk>
Date: Saturday 27 August 2016 at 08:18
To: Tim Robertson <trobertson at gbif.org>
Cc: "api-users at lists.gbif.org" <api-users at lists.gbif.org>
Subject: Re: [API-users] What happens to previous data after dataset/crawl?

Thanks Tim,

The specific example I'm working on is DNA barcoding data from BOLD. Their data 
dumps and web API differ in how they identify the same record (basically whether 
they include the suffix '.COI-5P' or not), which is deeply annoying. So I may 
have a case where I need to update ids for a large number of records, and want 
the other version of those records to be replaced. Sounds like I would need to 
ask you specifically to delete the old ones if I want this to happen.
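
As a small illustration of the identifier mismatch described above, the 
following Python sketch strips a trailing '.COI-5P' suffix so that the two 
forms of an identifier compare equal. The function name and the sample 
identifiers are hypothetical; only the suffix itself comes from this thread, 
and real BOLD identifiers may need more careful handling.

```python
SUFFIX = ".COI-5P"  # marker suffix mentioned in this thread


def normalise_bold_id(record_id: str) -> str:
    """Return the identifier without a trailing '.COI-5P' suffix.

    Hypothetical helper: identifiers without the suffix are returned
    unchanged (apart from surrounding whitespace being stripped).
    """
    record_id = record_id.strip()
    if record_id.endswith(SUFFIX):
        return record_id[: -len(SUFFIX)]
    return record_id


# Made-up identifiers, for illustration only.
with_suffix = "ABCDE001-10.COI-5P"
without_suffix = "ABCDE001-10"

# Both forms reduce to the same key after normalisation.
same_record = normalise_bold_id(with_suffix) == normalise_bold_id(without_suffix)
```

Normalising before publishing would keep the ids stable between versions, 
which avoids the need to delete and reindex at all.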

Regards,

Rod

On Sat, Aug 27, 2016 at 7:12 AM +0100, "Tim Robertson" <trobertson at gbif.org> 
wrote:

Hi Rod

Deletion of the old records is not done automatically, because changed 
identifiers normally result from some mapping error rather than from design.

Today we trigger it manually, but do want to automate it, probably only for 
cases where the change seems genuine.

Cheers,
Tim

On 27 Aug 2016, at 08:01, Roderic Page <Roderic.Page at glasgow.ac.uk> wrote:

Just wanted to check the consequences of the following dataset operation.

Say I have a dataset with 10 occurrences with occurrence ids 1-10. In my local 
database I now assign those 10 occurrences new identifiers a-j. If I create a 
new DwCA file for my data and crawl the new archive, my expectation is:

1. Old data with ids 1-10 is deleted from the GBIF index
2. New data with ids a-j is indexed

So, the end result is that the dataset has 10 occurrences. I'm asking because I 
know that in the past some datasets have changed identifiers, and this has 
resulted in records with old and new identifiers coexisting in the GBIF index, 
resulting in duplicated data.
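
The scenario above can be sketched with a toy model in Python. This is not 
GBIF's indexing code: the dictionary index, the function names and the dataset 
key "ds" are all assumptions made for illustration. It only shows why a crawl 
that adds or updates records by identifier leaves the old identifiers in 
place, and why deleting the dataset's records first gives the expected count.

```python
def crawl_upsert(index: dict, dataset: str, records: dict) -> None:
    """Add or overwrite records keyed by (dataset, id); never deletes."""
    for occ_id, data in records.items():
        index[(dataset, occ_id)] = data


def delete_dataset(index: dict, dataset: str) -> None:
    """Remove every record belonging to the given dataset."""
    for key in [k for k in index if k[0] == dataset]:
        del index[key]


# Ten occurrences, first identified as 1-10, then as a-j.
old = {str(i): f"occurrence {i}" for i in range(1, 11)}
new = {letter: f"occurrence {n}" for n, letter in enumerate("abcdefghij", start=1)}

index = {}
crawl_upsert(index, "ds", old)
crawl_upsert(index, "ds", new)    # old ids are untouched
count_without_delete = len(index)  # 20: old and new ids coexist

delete_dataset(index, "ds")        # manual deletion before reindexing
crawl_upsert(index, "ds", new)
count_with_delete = len(index)     # 10: only the new ids remain
```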

Obviously it would be nice to have stable, unchanging identifiers for 
occurrences, but for the dataset I'm working with the creators have changed 
their minds between versions of the data :(

Regards,

Rod
