On Thu, Jan 30, 2014 at 9:07 AM, Bodnar Robert <[email protected]> wrote:
> I have again a problem with an error, could you help me pls figure out
> what is the problem with the software?
Hi Robert,

The problem is not with the software, i.e. not with DSpace. DSpace uses a library called Apache PDFBox to extract the text from PDFs, and this library can't extract the text from this particular file. Most commonly this happens because the PDF is either damaged or in a format PDFBox can't work with (PDF is a container that can hold various types of content).

Perhaps this can shed light on this particular error, even though it might not help you resolve it:
http://forum.openkm.com/viewtopic.php?f=3&t=8187

If you need to (e.g. you have many files in this format), you might try asking on the PDFBox mailing list why this happens and how to work around it. That will almost surely involve changing the way you generate the PDFs.

Anyway, if PDFBox reports an error for a particular PDF, DSpace skips indexing it and continues with the next file.

Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
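To illustrate the "skip and continue" behaviour described above, here is a minimal sketch of an indexing loop that tolerates per-file extraction failures. It is not DSpace's actual code: the `TextExtractor` interface, the `indexAll` method and the `SkipOnErrorIndexer` class are hypothetical stand-ins for PDFBox and the DSpace indexer, assumed purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SkipOnErrorIndexer {

    /** Hypothetical stand-in for a text extractor such as PDFBox. */
    interface TextExtractor {
        String extract(String fileName) throws Exception;
    }

    /**
     * Extracts text from every file. A file whose extraction throws is
     * reported and skipped, and the loop moves on to the next file.
     * Returns the extracted text keyed by file name (failed files are absent).
     */
    static Map<String, String> indexAll(List<String> files, TextExtractor extractor) {
        Map<String, String> index = new LinkedHashMap<>();
        for (String file : files) {
            try {
                index.put(file, extractor.extract(file));
            } catch (Exception e) {
                // Report the failure but do not abort the whole run.
                System.err.println("Skipping " + file + ": " + e.getMessage());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical extractor that fails on one "damaged" file.
        TextExtractor fake = name -> {
            if (name.equals("damaged.pdf")) {
                throw new Exception("cannot parse PDF");
            }
            return "text of " + name;
        };
        Map<String, String> index =
            indexAll(List.of("a.pdf", "damaged.pdf", "b.pdf"), fake);
        System.out.println(index.keySet()); // prints [a.pdf, b.pdf]
    }
}
```

The point of the design is that one unreadable PDF only costs you the full-text index entry for that file; the rest of the batch is still processed.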

