If you switch from using PDFBox to XPDF, most if not all of these errors will 
disappear.  As a bonus, your filter-media will run much, much faster too!
Google "DSpace AND installing XPDF" and you'll find a number of articles on how 
to do this.
Best of luck,
Sue


Sue Walker-Thornton
(w):  (757) 864-2368
(m):  (757) 506-9903

From: Brett Arno [mailto:[email protected]]
Sent: Wednesday, April 25, 2012 5:15 PM
To: [email protected]
Subject: [Dspace-tech] Media Filter Errors

Hello All,

I'm receiving a good portion of errors when running the filter-media command 
and wondering if anyone can provide some insight.

I'm running 1.7.2 XMLUI with Mirage on a Red Hat server. Most items in the 
instance give this error:

ERROR filtering, skipping bitstream:

    Item Handle: 10829/669
    Bundle Name: ORIGINAL
    File Size: 43066
    Checksum: d302cf0378a385ff16610d63943b5368 (MD5)
    Asset Store: 0
java.io.IOException: No such file or directory
java.io.IOException: No such file or directory
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createNewFile(File.java:900)
    at edu.sdsc.grid.io.local.LocalFile.createNewFile(LocalFile.java:486)
    at org.dspace.storage.bitstore.BitstreamStorageManager.store(BitstreamStorageManager.java:300)
    at org.dspace.content.Bitstream.create(Bitstream.java:205)
    at org.dspace.content.Bundle.createBitstream(Bundle.java:384)
    at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:760)
    at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:561)
    at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:511)
    at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:479)
    at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:414)
    at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:333)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
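
A note on reading the trace above: the exception is raised by File.createNewFile, which on Unix reports "No such file or directory" when the parent directory of the target path does not exist. One plausible cause, not confirmed in this thread, is that the user running filter-media cannot create directories under the asset store. The following is a minimal sketch of that check in Java. The class name AssetStoreCheck, the method diagnose, and the default path are hypothetical and are not part of DSpace.

```java
import java.io.File;
import java.io.IOException;

public class AssetStoreCheck {

    /**
     * Explains why creating a new file at target would fail,
     * or returns null when no obvious problem is found.
     */
    public static String diagnose(File target) {
        File parent = target.getAbsoluteFile().getParentFile();
        if (parent == null) {
            return "target has no parent directory";
        }
        if (!parent.exists()) {
            return "missing parent directory: " + parent;
        }
        if (!parent.isDirectory()) {
            return "parent is not a directory: " + parent;
        }
        if (!parent.canWrite()) {
            return "parent directory is not writable: " + parent;
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass a real location as the first argument.
        File target = new File(args.length > 0 ? args[0] : "/tmp/assetstore-check/12/34/56/probe");
        String problem = diagnose(target);
        if (problem != null) {
            System.out.println("Cannot create " + target + ": " + problem);
            return;
        }
        // createNewFile returns false when the file already exists.
        boolean created = target.createNewFile();
        System.out.println(created ? "Created " + target : "Already exists: " + target);
        if (created) {
            target.delete(); // remove the probe file again
        }
    }
}
```

To get a meaningful answer, run it as the same user that runs [dspace]/bin/dspace filter-media and point it at a path under the asset store directory configured for the instance.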

The first thing I checked was that the items' files exist and can be accessed 
through the system. In my large test sample, every item had a PDF and all of 
them opened fine.

I've tried completely rebuilding the index to see if that may help, but that 
didn't change the results. I've tried indexing with these commands and none of 
them helped:

[dspace]/bin/dspace index-update
[dspace]/bin/dspace index-init
[dspace]/bin/dspace index-init -r -f

We are using the PDF filter that shipped with the system, and we are not using 
Discovery.

I also noticed this error in the DSpace log:
WARN  org.apache.pdfbox.util.PDFStreamEngine @ java.io.IOException: Error: expected hex character and not  :32
java.io.IOException: Error: expected hex character and not  :32
    at org.apache.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:336)
    at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:139)
    at org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:556)
    at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:390)
    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:386)
    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:567)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:250)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:208)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:378)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:302)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:258)
    at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:101)
    at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:737)
    at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:561)
    at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:511)
    at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:479)
    at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:414)
    at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:333)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
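
A note on reading this warning: the trailing ":32" appears to be the numeric value of the offending byte, and 32 is the ASCII code for a space, so the parser seems to have met a space where it expected a hex digit inside a font's CMap. The following is a minimal sketch of a strict hex parser that fails in the same way. It is an illustration only, not PDFBox's actual code; the class HexTokenSketch and its methods are hypothetical.

```java
import java.io.IOException;

public class HexTokenSketch {

    /** Converts one character to its hex digit value, rejecting anything else. */
    public static int hexValue(int b) throws IOException {
        if (b >= '0' && b <= '9') return b - '0';
        if (b >= 'a' && b <= 'f') return b - 'a' + 10;
        if (b >= 'A' && b <= 'F') return b - 'A' + 10;
        // Report both the character and its numeric code, as the log line does.
        throw new IOException("Error: expected hex character and not " + (char) b + ":" + b);
    }

    /** Parses the body of a hex string such as "0041" into bytes. */
    public static byte[] parseHex(String body) throws IOException {
        if (body.length() % 2 != 0) {
            throw new IOException("odd number of hex digits");
        }
        byte[] out = new byte[body.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = hexValue(body.charAt(2 * i));
            int lo = hexValue(body.charAt(2 * i + 1));
            out[i] = (byte) (hi * 16 + lo);
        }
        return out;
    }

    public static void main(String[] args) {
        try {
            // Spaces inside the hex body trip the strict parser.
            parseHex("00 41 ");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the message is logged at WARN level, text extraction may still continue for the rest of the document; a more tolerant parser would skip whitespace between hex digits instead of failing.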


Any help would be greatly appreciated!

--
Brett Arno
Library Systems Support Specialist
Herrick Memorial Library
Alfred University
1 Saxon Drive
Alfred, NY 14802
Email: [email protected] | Phone: 607-871-2989
