Re: [Dspace-tech] Dspace Error

helix84 Thu, 30 Jan 2014 01:12:32 -0800

On Thu, Jan 30, 2014 at 9:07 AM, Bodnar Robert <[email protected]> wrote:
> I have again a problem with an error, could you help me pls figure out
> what is the problem with the software?


Hi Robert,

the problem is not with the software, at least not with DSpace itself.
DSpace uses a library called Apache PDFBox to extract the text from
PDFs, and PDFBox can't extract the text from this particular file. Most
commonly that means the PDF is either damaged or in a format PDFBox
can't work with (PDF is a container that can hold various types of
content). Perhaps this thread can shed some light on this particular
error, even though it might not help you resolve it:

http://forum.openkm.com/viewtopic.php?f=3&t=8187

If you need to (e.g. because you have many files in this format), you
might try asking on the PDFBox mailing list why this happens and how to
work around it. The answer will almost surely involve changing how you
generate the PDFs.
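
Before asking, it may be worth ruling out files that are plainly
truncated or not PDFs at all. The following is only a rough sketch in
Python, not anything DSpace or PDFBox provides: the function name
looks_like_intact_pdf and the 1024-byte windows are my own assumptions.
It checks only for the "%PDF-" header near the start and the "%%EOF"
marker near the end. A file that passes can still be unreadable for
PDFBox.

```python
def looks_like_intact_pdf(data: bytes) -> bool:
    """Heuristic only: header near the start, EOF marker near the end."""
    # Hypothetical helper; the 1024-byte windows are an assumed tolerance.
    has_header = b"%PDF-" in data[:1024]
    has_eof_marker = b"%%EOF" in data[-1024:]
    return has_header and has_eof_marker


def check_file(path: str) -> bool:
    """Read a file from disk and apply the heuristic above."""
    with open(path, "rb") as handle:
        return looks_like_intact_pdf(handle.read())
```

Files that fail this check are probably damaged or cut off during
upload. Files that pass but still fail in PDFBox are the ones worth
raising on the PDFBox list.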

Anyway, if PDFBox reports an error for a particular PDF, DSpace skips
indexing it and continues with the next file.
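
To illustrate that skip-and-continue behaviour, here is a minimal
sketch in Python. It is not DSpace's actual code: index_all and
fake_extract are hypothetical names, and fake_extract merely stands in
for a PDFBox-backed text extractor.

```python
import logging
from typing import Callable, Dict, Iterable

log = logging.getLogger("indexing-sketch")


def index_all(paths: Iterable[str],
              extract_text: Callable[[str], str]) -> Dict[str, str]:
    """Return {path: text} for every file whose text could be extracted."""
    indexed = {}
    for path in paths:
        try:
            indexed[path] = extract_text(path)
        except Exception as exc:
            # Extraction failed for this file: log it, skip it, carry on.
            log.warning("Skipping %s: %s", path, exc)
    return indexed


def fake_extract(path: str) -> str:
    # Hypothetical stand-in for the real extractor; fails on "broken" files.
    if "broken" in path:
        raise ValueError("cannot parse PDF")
    return "text of " + path


result = index_all(["a.pdf", "broken.pdf", "b.pdf"], fake_extract)
```

Here result holds entries for a.pdf and b.pdf only. The broken file is
logged and left out, so one bad PDF does not stop the rest from being
indexed.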


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

