Hi,

Quoted:
“It seems the withdrawn items are not even taken into account when you do 
bin/dspace oai import -c (judging by the reported Total).”


- Yes, the withdrawn items are not retrieved during this process.
The following is my investigation.

1.
When an item is withdrawn, the following fields of the corresponding row in the
“item” table are updated:

‘in_archive’ set to false
‘withdrawn’ set to true
‘last_modified’ updated to the latest timestamp
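A minimal sketch of the three updates listed above. It assumes a simplified in-memory stand-in for an “item” row. The class ItemRow and its withdraw() method are hypothetical, not DSpace's Item API:

```java
import java.time.Instant;

// Hypothetical stand-in for one row of the "item" table (not DSpace's Item class).
public class ItemRow {
    final int itemId;
    boolean inArchive;
    boolean withdrawn;
    Instant lastModified;

    ItemRow(int itemId, boolean inArchive, boolean withdrawn, Instant lastModified) {
        this.itemId = itemId;
        this.inArchive = inArchive;
        this.withdrawn = withdrawn;
        this.lastModified = lastModified;
    }

    // Applies the three field updates observed when an item is withdrawn.
    void withdraw(Instant now) {
        inArchive = false;   // 'in_archive' set to false
        withdrawn = true;    // 'withdrawn' set to true
        lastModified = now;  // 'last_modified' updated to the latest timestamp
    }

    public static void main(String[] args) {
        ItemRow row = new ItemRow(42, true, false, Instant.parse("2014-01-01T00:00:00Z"));
        row.withdraw(Instant.parse("2014-03-14T16:51:00Z"));
        System.out.println(row.inArchive + " " + row.withdrawn + " " + row.lastModified);
    }
}
```

The important consequence is that a withdrawn row always ends up with in_archive = false.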


2.
In the DSpace source code:
/dspace-oai/src/main/java/org/dspace/xoai/app/XOAI.java

Locate the following line in the method "private int indexAll() throws
DSpaceSolrIndexerException":

String sqlQuery = "SELECT item_id FROM item WHERE in_archive=TRUE";

This query retrieves all the item_ids that will be imported into the Solr OAI
index. This explains why withdrawn items are not imported into the Solr index:
the “in_archive” field of the item has already been set to false during the
withdrawal process.

So the withdrawn items should be retrieved if the SQL query is changed to:

String sqlQuery = "SELECT item_id FROM item WHERE in_archive=TRUE OR withdrawn=TRUE";
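The difference between the two WHERE clauses can be shown with a minimal sketch. It models rows in memory instead of querying the database. The Row record and both select methods are hypothetical illustrations, and the record syntax needs Java 16+:

```java
import java.util.List;
import java.util.stream.Collectors;

// Mimics the WHERE clause of the indexAll() query on in-memory rows (hypothetical model).
public class IndexQuerySketch {
    record Row(int itemId, boolean inArchive, boolean withdrawn) {}

    // Original: SELECT item_id FROM item WHERE in_archive=TRUE
    static List<Integer> selectOriginal(List<Row> rows) {
        return rows.stream()
                .filter(Row::inArchive)
                .map(Row::itemId)
                .collect(Collectors.toList());
    }

    // Proposed: SELECT item_id FROM item WHERE in_archive=TRUE OR withdrawn=TRUE
    static List<Integer> selectProposed(List<Row> rows) {
        return rows.stream()
                .filter(r -> r.inArchive() || r.withdrawn())
                .map(Row::itemId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(1, true, false),   // archived item
                new Row(2, false, true),   // withdrawn item
                new Row(3, false, false)); // item not (yet) archived
        System.out.println("original: " + selectOriginal(rows)); // [1]
        System.out.println("proposed: " + selectProposed(rows)); // [1, 2]
    }
}
```

Under this model, the original predicate silently drops withdrawn items. That would be consistent with the "Total" reported by bin/dspace oai import -c leaving them out. Items that are neither archived nor withdrawn stay excluded under both predicates.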

However, I have not comprehensively checked whether this change affects other
areas.

--  Andrew Wong
Systems Librarian,
The Hong Kong University of Science and Technology Library





From: Ondřej Košarko [mailto:[email protected]]
Sent: Saturday, March 15, 2014 12:09 AM
To: [email protected]
Subject: Re: [Dspace-tech] OAI and withdrawn items

And one more idea regarding deletions and OAI caching.

Suppose incremental harvesting is run against a repository using the "from" and
"until" parameters. What if the harvester gets a cached result for an item that
was deleted in the meantime? Will the harvester ever find out that the item was
deleted?

Please correct me if I'm not grasping selective harvesting correctly, but the
cache seems like a bad idea when dates are involved in the requests.
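The concern can be illustrated with a minimal sketch. It assumes a hypothetical response cache keyed on the full request (verb, from, until). This is not DSpace's actual OAI cache. The class, its methods, and the identifiers below are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

// Hypothetical response cache keyed on verb + from + until (not DSpace's implementation).
public class HarvestCacheSketch {
    private final Map<String, String> cache = new HashMap<>();

    // Returns the cached response for this request, computing it only on a miss.
    String respond(String verb, String from, String until, Supplier<String> compute) {
        String key = verb + "|" + from + "|" + until;
        return cache.computeIfAbsent(key, k -> compute.get());
    }

    // One possible mitigation: drop every cached response whenever an item changes.
    void invalidateAll() {
        cache.clear();
    }

    public static void main(String[] args) {
        Set<String> live = new TreeSet<>(Set.of("oai:repo:1", "oai:repo:2"));
        Set<String> deleted = new TreeSet<>();
        Supplier<String> listing = () -> "live=" + live + " deleted=" + deleted;
        HarvestCacheSketch c = new HarvestCacheSketch();

        // "until" omitted (empty string) in this sketch.
        String first = c.respond("ListIdentifiers", "2014-03-01", "", listing);
        live.remove("oai:repo:2");   // item deleted after the response was cached
        deleted.add("oai:repo:2");
        String second = c.respond("ListIdentifiers", "2014-03-01", "", listing);
        System.out.println(first.equals(second)); // true: the deletion is not visible

        c.invalidateAll();
        System.out.println(c.respond("ListIdentifiers", "2014-03-01", "", listing));
    }
}
```

A repeated request with the same parameters keeps returning the pre-deletion listing until the cached entry is invalidated.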

Regards,
OK

2014-03-14 16:51 GMT+01:00 Ondřej Košarko <[email protected]>:
Hi,
it seems the OAI-PMH in DSpace 4.1 doesn't display deleted items correctly (or 
at all).

The repository is set to keep track of deleted records persistently by default.
That means even withdrawn items should appear in some of the listings
(ListIdentifiers/ListRecords), with a header whose status is "deleted".
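Per the OAI-PMH 2.0 specification, a deleted record is reported with status="deleted" on its header element and carries no metadata part. A minimal sketch of building such a header follows. The method name and the sample identifier are hypothetical, and the inputs are assumed to be XML-safe (no escaping is done):

```java
// Builds an OAI-PMH <header>; a deleted record gets status="deleted".
// Hypothetical helper; inputs are assumed XML-safe (no escaping performed).
public class DeletedHeaderSketch {
    static String header(String identifier, String datestamp, boolean deleted) {
        String status = deleted ? " status=\"deleted\"" : "";
        return "<header" + status + ">"
                + "<identifier>" + identifier + "</identifier>"
                + "<datestamp>" + datestamp + "</datestamp>"
                + "</header>";
    }

    public static void main(String[] args) {
        // How a withdrawn item could be listed (made-up identifier).
        System.out.println(header("oai:example.org:123456789/1", "2014-03-14T15:51:00Z", true));
    }
}
```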

I'm not able to obtain the deleted records even with GetRecord[1] (all I'm 
seeing is '<error code="idDoesNotExist">The given id does not exist</error>').

It seems the withdrawn items are not even taken into account when you do 
bin/dspace oai import -c (judging by the reported Total).

Best regards,
OK


[1] http://www.openarchives.org/OAI/openarchivesprotocol.html#DeletedRecords

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
