[ https://issues.apache.org/jira/browse/PDFBOX-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Hewson resolved PDFBOX-1007.
---------------------------------
Resolution: Fixed
Fix Version/s: 2.0.0
This was fixed in 2.0 some time ago.
> Maven performs textual filtering of binary resources [patch]
> ------------------------------------------------------------
>
> Key: PDFBOX-1007
> URL: https://issues.apache.org/jira/browse/PDFBOX-1007
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.6.0
> Environment: Mac OS X 10.6.7, Java(TM) SE Runtime Environment (build
> 1.6.0_24-b07-334-10M3326), Apache Maven 3.0.2 (r1056850; 2011-01-09
> 01:58:10+0100)
> Reporter: Øyvind Berg
> Fix For: 2.0.0
>
>
> This applies to current svn, r1099514.
> This bit me when a lot of my files failed with the following stacktrace:
> Error while processing PDF:
> Caused by: java.io.IOException: head is mandatory
>     at org.apache.fontbox.ttf.AbstractTTFParser.parseTables(AbstractTTFParser.java:107)
>     at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:61)
>     at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:90)
>     at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:26)
>     at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:66)
>     at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:26)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:204)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:188)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:114)
>     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:116)
>     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:75)
>     at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:243)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
>     at org.elacin.pdfextract.datasource.pdfbox.PDFBoxIntegration.processPage(PDFBoxIntegration.java:797)
>     at org.elacin.pdfextract.datasource.pdfbox.PDFBoxIntegration.processDocument(PDFBoxIntegration.java:502)
>     at org.elacin.pdfextract.datasource.pdfbox.PDFBoxSource.readPages(PDFBoxSource.java:74)
>     ... 3 more
> The cause was that binary files (in this case resources/ttf/ArialMT.ttf)
> were subject to Maven resource filtering, which inserted Unicode
> unknown-character symbols into them and corrupted the font data.
> Please consider fixing this by turning filtering off in
> trunk/pdfextract/pom.xml in the following way:
> <resource>
>   <directory>src/main/resources</directory>
>   <filtering>false</filtering>
> </resource>
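
The corruption described in the report can be reproduced outside Maven. The
following is a minimal sketch, not PDFBox or Maven code. It assumes the
filtering step reads the resource as UTF-8 text and writes it back, which the
report does not state; the actual encoding depends on the build
configuration. The class and method names (FilteringCorruptionDemo,
roundTripAsText, containsReplacementBytes) are hypothetical and exist only
for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: shows how treating binary data as text damages it.
final class FilteringCorruptionDemo {

    // UTF-8 encoding of U+FFFD, the Unicode replacement character.
    private static final byte[] REPLACEMENT = {(byte) 0xEF, (byte) 0xBF, (byte) 0xBD};

    // Decode the bytes as UTF-8 text and encode them again, as a text filter
    // would (assumption: the filter uses UTF-8). Byte sequences that are not
    // valid UTF-8 are replaced with U+FFFD during decoding.
    static byte[] roundTripAsText(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Return true if the data contains the byte sequence EF BF BD anywhere.
    static boolean containsReplacementBytes(byte[] data) {
        for (int i = 0; i + REPLACEMENT.length <= data.length; i++) {
            if (data[i] == REPLACEMENT[0]
                    && data[i + 1] == REPLACEMENT[1]
                    && data[i + 2] == REPLACEMENT[2]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The first four bytes mimic a TrueType version tag (0x00010000);
        // 0xFF and 0xFE can never appear in valid UTF-8.
        byte[] original = {0x00, 0x01, 0x00, 0x00, (byte) 0xFF, (byte) 0xFE};
        byte[] filtered = roundTripAsText(original);

        System.out.println("unchanged after round trip: " + Arrays.equals(original, filtered));
        System.out.println("replacement bytes present: " + containsReplacementBytes(filtered));
    }
}
```

Note that containsReplacementBytes is only a heuristic: an intact binary file
can contain the bytes EF BF BD by coincidence, so a positive result is a hint
to compare the built resource with the source file, not proof of corruption.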