Hi,
Quoted:
“It seems the withdrawn items are not even taken into account when you do
bin/dspace oai import -c (judging by the reported Total).”
- Yes, the withdrawn items are not retrieved during this process.
The following is my investigation.
1.
When a record is withdrawn, some fields of the corresponding row in the “item”
table are updated:
- ‘in_archive’ is set to false
- ‘withdrawn’ is set to true
- ‘last_modified’ is updated to the current timestamp
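To make the state change concrete, here is a minimal Java sketch that models the three affected columns of an “item” row in memory. The ItemRow class and its withdraw() method are hypothetical illustrations, not DSpace's actual Item API; only the column names and the way they change come from the list above.

```java
import java.time.Instant;

// Hypothetical in-memory model of the "item" columns touched on withdrawal.
// This is NOT DSpace's Item class; it only mirrors the column semantics above.
public class ItemRow {
    final int itemId;
    boolean inArchive;     // column in_archive
    boolean withdrawn;     // column withdrawn
    Instant lastModified;  // column last_modified

    ItemRow(int itemId, boolean inArchive, boolean withdrawn, Instant lastModified) {
        this.itemId = itemId;
        this.inArchive = inArchive;
        this.withdrawn = withdrawn;
        this.lastModified = lastModified;
    }

    // Apply the withdrawal state change: leave the archive, flag as withdrawn,
    // and bump last_modified to the time of the withdrawal.
    void withdraw(Instant now) {
        this.inArchive = false;
        this.withdrawn = true;
        this.lastModified = now;
    }

    public static void main(String[] args) {
        ItemRow row = new ItemRow(42, true, false, Instant.parse("2014-01-01T00:00:00Z"));
        row.withdraw(Instant.parse("2014-03-14T16:51:00Z"));
        System.out.println("in_archive=" + row.inArchive
                + " withdrawn=" + row.withdrawn
                + " last_modified=" + row.lastModified);
    }
}
```

The point to take away is the first assignment: after withdrawal the row no longer satisfies in_archive=TRUE, which matters for the query discussed next.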
2.
From the DSpace source code
/dspace-oai/src/main/java/org/dspace/xoai/app/XOAI.java,
locate the following line in the method private int indexAll() throws
DSpaceSolrIndexerException:
String sqlQuery = "SELECT item_id FROM item WHERE in_archive=TRUE";
This query retrieves the item_id of every item to be imported into the Solr OAI
index. This explains why withdrawn items are not imported into the Solr index:
their “in_archive” field has already been set to false during the withdrawal
process.
So withdrawn items should be retrieved after changing the SQL query to:
String sqlQuery = "SELECT item_id FROM item WHERE in_archive=TRUE OR withdrawn=TRUE";
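The effect of the two WHERE clauses can be checked without a database. The sketch below applies both predicates to a few made-up rows in memory; the class name, the Row record and the sample values are illustrative assumptions, and no JDBC or DSpace code is involved.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Simulates the two WHERE clauses on in-memory rows (sample data is made up).
public class OaiImportQuerySketch {

    record Row(int itemId, boolean inArchive, boolean withdrawn) {}

    // Original XOAI.indexAll() filter: WHERE in_archive=TRUE
    static final Predicate<Row> ORIGINAL = r -> r.inArchive();

    // Proposed filter: WHERE in_archive=TRUE OR withdrawn=TRUE
    static final Predicate<Row> PROPOSED = r -> r.inArchive() || r.withdrawn();

    // Equivalent of "SELECT item_id FROM item WHERE <predicate>".
    static List<Integer> select(List<Row> rows, Predicate<Row> where) {
        return rows.stream().filter(where).map(Row::itemId).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(1, true, false),   // normal archived item
                new Row(2, false, true),   // withdrawn item
                new Row(3, false, false)); // e.g. an item not yet archived
        System.out.println("original: " + select(rows, ORIGINAL));
        System.out.println("proposed: " + select(rows, PROPOSED));
    }
}
```

With these rows the original clause selects only item 1, while the proposed one selects items 1 and 2, so the withdrawn item would now be counted in the Total reported by the import. Rows that are neither archived nor withdrawn stay excluded either way.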
However, I have not checked thoroughly whether this change affects other
areas.
-- Andrew Wong
Systems Librarian,
The Hong Kong University of Science and Technology Library
From: Ondřej Košarko [mailto:[email protected]]
Sent: Saturday, March 15, 2014 12:09 AM
To: [email protected]
Subject: Re: [Dspace-tech] OAI and withdrawn items
And one more idea regarding deletions and OAI caching.
Suppose incremental harvesting is run against a repository using the "from"
and "until" parameters. What if the harvester gets a cached result for an item
that was deleted in the meantime? Will the harvester ever find out that the item
was deleted?
Please correct me if I'm misunderstanding selective harvesting, but caching
seems like a bad idea when the requests involve date ranges.
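The concern can be illustrated with a small Java sketch. It assumes a response cache keyed only on the request's "from"/"until" values; how DSpace/XOAI actually keys and invalidates its cache is not shown here and may differ. The version counter in the key is one possible mitigation, offered as an assumption rather than as anything DSpace does, and the identifier "oai:example:1" is hypothetical.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical repository plus two response caches; not DSpace/XOAI code.
public class HarvestCacheSketch {
    record Rec(LocalDate datestamp, boolean deleted) {}

    private final Map<String, Rec> repository = new TreeMap<>();
    private final Map<String, String> cache = new HashMap<>();
    private long version = 0; // bumped on every repository change

    void upsert(String id, LocalDate datestamp, boolean deleted) {
        repository.put(id, new Rec(datestamp, deleted));
        version++;
    }

    // Selective harvesting: list records whose datestamp lies in [from, until].
    String render(LocalDate from, LocalDate until) {
        StringBuilder sb = new StringBuilder();
        repository.forEach((id, r) -> {
            if (!r.datestamp().isBefore(from) && !r.datestamp().isAfter(until)) {
                sb.append(id).append(r.deleted() ? "(deleted)" : "").append(' ');
            }
        });
        return sb.toString().trim();
    }

    // Naive cache: keyed only on the request parameters, so later changes are invisible.
    String naive(LocalDate from, LocalDate until) {
        return cache.computeIfAbsent("n|" + from + "|" + until, k -> render(from, until));
    }

    // Version-aware cache: any repository change produces a new key and a fresh response.
    String versioned(LocalDate from, LocalDate until) {
        return cache.computeIfAbsent("v" + version + "|" + from + "|" + until,
                k -> render(from, until));
    }

    public static void main(String[] args) {
        HarvestCacheSketch repo = new HarvestCacheSketch();
        LocalDate from = LocalDate.parse("2014-03-01");
        LocalDate until = LocalDate.parse("2014-03-31");
        repo.upsert("oai:example:1", LocalDate.parse("2014-03-10"), false);
        System.out.println(repo.naive(from, until));
        System.out.println(repo.versioned(from, until));
        // The item is withdrawn inside the same window: status becomes deleted.
        repo.upsert("oai:example:1", LocalDate.parse("2014-03-14"), true);
        System.out.println(repo.naive(from, until));     // still the cached, stale answer
        System.out.println(repo.versioned(from, until)); // reflects the deletion
    }
}
```

The naive cache keeps answering with the pre-deletion response for the same from/until pair, which is exactly the situation the question describes.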
Regards,
OK
2014-03-14 16:51 GMT+01:00 Ondřej Košarko <[email protected]>:
Hi,
it seems the OAI-PMH interface in DSpace 4.1 doesn't display deleted items
correctly (or at all).
The repository is set to persistently keep track of deleted records by default.
That means even withdrawn items should appear in some of the listings
(ListIdentifiers/ListRecords), with a header whose status is "deleted".
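For reference, the OAI-PMH specification [1] marks a deleted record by putting status="deleted" on its header element. The Java sketch below builds such a header for an item; the header() helper, the mapping "withdrawn means deleted" and the identifier "oai:example.org:123456789/1" are illustrative assumptions, not DSpace or XOAI code, and the helper does no XML escaping.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of DSpace or XOAI.
public class OaiHeaderSketch {

    // Builds an OAI-PMH record header; a withdrawn item is reported with
    // status="deleted". Assumes the identifier needs no XML escaping.
    static String header(String identifier, Instant datestamp, boolean withdrawn) {
        String status = withdrawn ? " status=\"deleted\"" : "";
        return "<header" + status + ">"
                + "<identifier>" + identifier + "</identifier>"
                // seconds granularity, UTC, e.g. 2014-03-14T15:51:00Z
                + "<datestamp>" + datestamp.truncatedTo(ChronoUnit.SECONDS) + "</datestamp>"
                + "</header>";
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2014-03-14T15:51:00Z");
        System.out.println(header("oai:example.org:123456789/1", ts, false));
        System.out.println(header("oai:example.org:123456789/1", ts, true));
    }
}
```

A deleted record carries only this header and no metadata, which is why withdrawn items have to be present in the index at all for the header to be produced.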
I'm not able to obtain the deleted records even with GetRecord[1] (all I'm
seeing is '<error code="idDoesNotExist">The given id does not exist</error>').
It seems the withdrawn items are not even taken into account when you do
bin/dspace oai import -c (judging by the reported Total).
Best regards,
OK
[1]http://www.openarchives.org/OAI/openarchivesprotocol.html#DeletedRecords
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette