Thanks very much, Tim!

I’ve checked that the permissions for the item/bundles/bitstreams are all 
Anonymous READ. The metadata looks normal too, including the Really Important 
fields like dc.type.

When I try index-discovery -i [itemid] I get "Unrecognized option: -i".

But the DSpace log from when we ran index-discovery -b shows:

2023-08-03 02:29:40,261 ERROR org.dspace.discovery.SolrServiceImpl @ Error while writing item to discovery index: 10182/16202 message:org.apache.tika.exception.TikaException: Failed to parse an email message
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: org.apache.tika.exception.TikaException: Failed to parse an email message
     at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
     at org.dspace.discovery.SolrServiceImpl.writeDocument(SolrServiceImpl.java:748)
     at org.dspace.discovery.SolrServiceImpl.buildDocument(SolrServiceImpl.java:1429)
     at org.dspace.discovery.SolrServiceImpl.indexContent(SolrServiceImpl.java:230)
     at org.dspace.discovery.SolrServiceImpl.updateIndex(SolrServiceImpl.java:410)
     at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
     at org.dspace.discovery.IndexClient.main(IndexClient.java:117)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:498)
     at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
     at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)

We found the "Troubleshooting Tika" page on the Apache Software Foundation wiki 
(https://cwiki.apache.org/confluence/display/TIKA/Troubleshooting+Tika), and it 
looks like for some reason Tika thinks it’s supposed to be parsing an email 
message. This was utterly bewildering because the bitstream files are just 
regular PDFs: they have PDF file extensions, the format is marked as Adobe PDF 
in DSpace, and they open successfully as PDFs in the browser/Adobe Reader…

but then I looked at the text that had been extracted for the search index and 
found that in each of the problem cases it begins with a line like this:

Received: 22 June 2022 | Revised: 16 April 2023 | Accepted: 26 April 2023

This refers to when the journal first received the submitted article, but I 
guess Tika is interpreting the “Received:” as the start of an email header!
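
In case it helps anyone hunting for other affected items, here is a rough 
Python sketch of the check I have in mind: does the first non-blank line of the 
extracted text start like an email header? The function name and the list of 
header names are my own assumptions for illustration; this is not Tika's actual 
detection logic, which I haven't confirmed.

```python
import re

# Header names that commonly open an email message. This list is an
# illustrative assumption, NOT Tika's real detection table.
EMAIL_HEADER_NAMES = (
    "Received", "Return-Path", "Delivered-To", "Message-ID",
    "From", "To", "Subject", "Date",
)

# Matches e.g. "Received: " at the very start of a line (case-insensitive).
_HEADER_RE = re.compile(
    r"^(%s):\s" % "|".join(re.escape(name) for name in EMAIL_HEADER_NAMES),
    re.IGNORECASE,
)


def looks_like_email_header(extracted_text: str) -> bool:
    """Return True if the first non-blank line starts like an email header."""
    for line in extracted_text.splitlines():
        if line.strip():
            # Only the first non-blank line is inspected.
            return bool(_HEADER_RE.match(line.lstrip()))
    return False  # empty or whitespace-only text
```

Running this over the extracted text of each item should flag the ones that 
open with a "Received: ..." line like the example above, which would be the 
candidates to check in the log.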

Fortunately we can see in our DSpace 7 dev environment this issue isn’t 
arising, so we’ll just ignore the issue until we can complete our upgrade.

Deborah


From: DSpace Technical Support <[email protected]>
Sent: Saturday, August 5, 2023 5:08 AM
To: DSpace Technical Support <[email protected]>
Subject: [dspace-tech] Re: Item not showing in search/browse


Hi Deborah,

I'd recommend checking to see if there were any errors in indexing that item 
(e.g. in DSpace logs or Solr logs).  You could also try to trigger an index of 
*just that item* (`./dspace index-discovery -i [item-uuid]`) to see if that 
helps in any way, or perhaps gives a more specific error.

Beyond that, if neither of those help, that'd imply to me that there must be 
some sort of permissions issue (or corrupt data? or missing/wrong metadata 
fields?) on the specific Item in question.  But it'd be hard to say for certain.

Tim
On Wednesday, August 2, 2023 at 5:30:18 PM UTC-5 [email protected] wrote:
Kia ora,

I’ve been doing some data tidying in DSpace 5.8 (xmlui) in preparation for an 
upcoming migration to 7.4 – mostly directly in the database. A few days later I 
was alerted to a record 
https://researcharchive.lincoln.ac.nz/handle/10182/16202 which isn’t showing up 
either by searching on the title, or in the title/author/keyword browse 
indexes. The item has Anonymous READ permissions (and anyway the search/browse 
still doesn’t work when I’m logged in as an Administrator) so I assumed this 
was because I’d been lazy and neglected to run a re-index.

So overnight we ran a job, [dspace]/bin/dspace index-discovery -b, expecting 
this would resolve the issue. But we’re still seeing the same problem.

Is there anything else that could be blocking it from being indexed?
Any other jobs we should run?
If I throw my hands up in despair and just go ahead with the migration, will 
that magically fix it?  (This is not actually my preference for various 
reasons, but some days a little magic would be nice!)

Deborah
––––––––––––––––––––––––––––––––––
Deborah Fitchett (she/her) MLIS, RLIANZA
Associate University Librarian, Digital Scholarship

––––––––––––––––––––––––––––––––––
Learning, Teaching and Library – Te Whare Pūrākau
PO Box 85064, Lincoln University
Lincoln 7647, Christchurch, New Zealand
+64 3 423 0358
[email protected]
ltl.lincoln.ac.nz

––––––––––––––––––––––––––––––––––
Lincoln University
Te Whare Wānaka o Aoraki
––––––––––––––––––––––––––––––––––

--
All messages to this mailing list should adhere to the Code of Conduct: 
https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
---
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-tech/a8cd5679-9b3b-40de-9012-c63fe5752842n%40googlegroups.com.

