We had the same problem and eventually replaced PDFBox with XPDF2Text for 
filtering .pdf files.  The only documents that won't filter now are truly 
corrupt files.  Plus, XPDF2Text is 10 times as fast as PDFBox.  Take a look at 
this:  
http://jira.dspace.org/jira/browse/DS-183;jsessionid=3F4B680EC315609CF41443721BB9C6F6?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
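
If you want to find out quickly which PDFs are the troublemakers, without 
waiting 15 minutes per file in the media filter, something along these lines 
may help.  It is only a rough sketch and not the DS-183 patch itself: it 
assumes the xpdf command-line tool pdftotext is installed and on the PATH, 
and the function names and the 60-second timeout are made up for 
illustration.

```python
import subprocess
import sys
from pathlib import Path


def pdftotext_command(pdf_path, binary="pdftotext"):
    """Build the pdftotext argument list (hypothetical helper).

    "-q" suppresses messages, "-enc UTF-8" sets the output encoding,
    and the trailing "-" sends the extracted text to stdout.
    """
    return [binary, "-q", "-enc", "UTF-8", str(pdf_path), "-"]


def extract_text(pdf_path, timeout_seconds=60, binary="pdftotext"):
    """Return the extracted text, or None on failure or timeout."""
    try:
        result = subprocess.run(
            pdftotext_command(pdf_path, binary),
            capture_output=True,
            timeout=timeout_seconds,  # assumed limit; tune for large theses
            check=False,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the binary is missing or not executable.
        return None
    if result.returncode != 0:
        return None
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Report a status line for each PDF named on the command line.
    for name in sys.argv[1:]:
        text = extract_text(Path(name))
        status = "FAILED" if text is None else f"ok ({len(text)} chars)"
        print(f"{name}: {status}")
```

Saved under a name of your choosing and run with a list of PDF paths as 
arguments, it prints one status line per file, so the files that fail or 
time out stand out.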

Hope this helps!
Sue

Sue Walker-Thornton
NASA Langley Research Center
Integrated Library Systems Developer, Application & Database Administrator
ConITS Contract
NCI Information Systems, Inc.
130 Research Drive
Hampton, VA  23666

Office: (757) 224-4074
Fax:    (757) 224-4001
Mobile: (757) 506-9903
Email:  susan.m.thorn...@nasa.gov
________________________________________
From: White, Andrew [andrew.wh...@lincoln.ac.nz]
Sent: Friday, April 30, 2010 12:24 AM
To: dspace-tech@lists.sourceforge.net
Subject: [Dspace-tech] Media filtering problem - filtering was unsuccessful

We are experiencing problems with media filtering of PDF files added in our 
thesis digitisation project.

A number of the files (perhaps 10%) will not filter: the command window just 
pauses for up to 15 minutes or so, then displays:

"SKIPPED: bitstream 5698 (item: 10182/1780) because filtering was unsuccessful"

No other error message or clue is given.

I can see no common feature of the PDFs that won't filter: they can be b&w 
only or partly colour, and they span different PDF versions. They are all 
quite large files (10MB or larger), but not all files of this size fail in 
this way.

I find that if I split a failing file into parts and re-upload them, the parts 
will then filter OK.

Has anyone else experienced this and do you have a solution?

Andrew White
Information Technology Librarian

George Forbes Memorial Library
PO Box 64
Lincoln University
Lincoln 7647
Christchurch, New Zealand

p +64 3 321 8542 | f +64 3 325 2944
e andrew.wh...@lincoln.ac.nz | w library.lincoln.ac.nz

Lincoln University, Te Whare Wanaka o Aoraki
New Zealand's Specialist Land Based University
