We had the same problems and finally replaced PDFBox with XPDF2Text for filtering .pdf files. The only documents that won't filter now are truly corrupt files. Plus, XPDF2Text is 10 times as fast as PDFBox. Take a look at this: http://jira.dspace.org/jira/browse/DS-183;jsessionid=3F4B680EC315609CF41443721BB9C6F6?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
Hope this helps!

Sue

Sue Walker-Thornton
NASA Langley Research Center
Integrated Library Systems Developer, Application & Database Administrator
ConITS Contract
NCI Information Systems, Inc.
130 Research Drive
Hampton, VA 23666
Office: (757) 224-4074
Fax: (757) 224-4001
Mobile: (757) 506-9903
Email: susan.m.thorn...@nasa.gov

________________________________________
From: White, Andrew [andrew.wh...@lincoln.ac.nz]
Sent: Friday, April 30, 2010 12:24 AM
To: dspace-tech@lists.sourceforge.net
Subject: [Dspace-tech] Media filtering problem - filtering was unsuccessful

We are experiencing problems with media filtering of PDF files added in our thesis digitisation project. A number of the files (perhaps 10%) will not filter, the command window just pauses for up to 15 minutes or so, then displays:

"SKIPPED: bitstream 5698 (item: 10182/1780) because filtering was unsuccessful"

No other error message or clue is given.

I can see no common feature of the PDFs that won't filter - they can be b&w only or some colour, different PDF versions. Yes, they are all quite large files (10MB or larger), but not all files of this size are failing in this way. I find that if I split the file into parts and re-upload, they will then filter OK.

Has anyone else experienced this and do you have a solution?

Andrew White
Information Technology Librarian
George Forbes Memorial Library
PO Box 64
Lincoln University
Lincoln 7647
Christchurch, New Zealand
p +64 3 321 8542 | f +64 3 325 2944
e andrew.wh...@lincoln.ac.nz | w http://library.lincoln.ac.nz/
Lincoln University, Te Whare Wanaka o Aoraki
New Zealand's Specialist Land Based University

_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
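
For anyone comparing results before and after switching filters, the "SKIPPED" lines quoted in the original message can be collected from saved filter output to list which bitstreams failed. The following is only a minimal sketch: the pattern is modelled on the single message quoted in this thread, so other DSpace versions may word it differently, and the names `find_skipped` and `SkippedBitstream` are hypothetical helpers, not part of DSpace.

```python
import re
import sys
from typing import List, NamedTuple

# Pattern modelled on the one message quoted in this thread (assumption:
# other DSpace versions may phrase the line differently).
SKIPPED_RE = re.compile(
    r"SKIPPED: bitstream (?P<bitstream>\d+) "
    r"\(item: (?P<handle>[^)]+)\) because filtering was unsuccessful"
)


class SkippedBitstream(NamedTuple):
    """One failed bitstream: its numeric ID and the handle of its item."""
    bitstream_id: int
    item_handle: str


def find_skipped(output: str) -> List[SkippedBitstream]:
    """Return every skipped bitstream mentioned in the captured output."""
    return [
        SkippedBitstream(int(m.group("bitstream")), m.group("handle"))
        for m in SKIPPED_RE.finditer(output)
    ]


if __name__ == "__main__":
    # Read saved filter output from standard input and print one
    # tab-separated "bitstream ID, item handle" pair per failure.
    for skipped in find_skipped(sys.stdin.read()):
        print(f"{skipped.bitstream_id}\t{skipped.item_handle}")
```

Saved as a script (the file name is up to you), it reads the captured output on standard input and prints one line per failed bitstream, which makes it easy to see whether the same items still fail after changing the PDF filter.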