Hi Nason

We discovered on our repository [1] that a search for K4D returns item records that are displayed with bad characters. I first thought that some records might be bad or have no abstracts, but this is not the case: I tried the same search on the University of Cambridge repository [2] and the same issue appears there.

I guess that what you are seeing is not the items' abstracts (metadata) but paragraphs of the full text extracted from the PDFs and indexed by the media filter. Probably the PDFs use some encoding (other than ISO-8859-1) that PDFBox, which the PDF media filter uses in version 6, cannot deal with. Updating to PDFBox v2 could help. (DSpace v6 should support it; perhaps it can be ported back to your 5.5 version.)

Some related (maybe identical) problems have been reported; see https://jira.duraspace.org/browse/DS-2224 and https://jira.duraspace.org/browse/DS-3035
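
One rough way to test the guess that the extracted full text (rather than the metadata) is garbled is to inspect the text the media filter produced for an affected item, which DSpace keeps in the item's TEXT bundle. The sketch below is only a heuristic and makes several assumptions: that you have the extracted text as a Python string, that the function names and the 5% threshold (all hypothetical, not part of DSpace or PDFBox) suit your data, and that replacement, control, private-use, or unassigned characters are a fair sign of extraction debris.

```python
import unicodedata


def suspicious_ratio(text: str) -> float:
    """Return the fraction of characters that look like extraction debris."""
    if not text:
        return 0.0
    bad = 0
    for ch in text:
        # Ordinary whitespace is expected in extracted text.
        if ch in "\n\r\t":
            continue
        category = unicodedata.category(ch)
        # U+FFFD is the Unicode replacement character; Cc = control,
        # Co = private use, Cn = unassigned code point.
        if ch == "\ufffd" or category in ("Cc", "Co", "Cn"):
            bad += 1
    return bad / len(text)


def looks_garbled(text: str, threshold: float = 0.05) -> bool:
    """Flag text whose share of suspicious characters exceeds the threshold.

    The default threshold is an arbitrary assumption; tune it on your data.
    """
    return suspicious_ratio(text) > threshold
```

If the extracted text of the affected items is flagged while their abstracts are not, that would point at the PDF text extraction rather than at the metadata.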

Best of luck

Emilio


Has anyone come across this problem and, if so, how was it resolved?

[1] https://opendocs.ids.ac.uk/opendocs/discover?rpp=10&etal=0&query=k4d&group_by=none&page=2
[2] https://www.repository.cam.ac.uk/discover?rpp=10&etal=0&query=k4d&group_by=none&page=2

Regards
Nason
--
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected] <mailto:[email protected]>. To post to this group, send email to [email protected] <mailto:[email protected]>.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.

