[ https://issues.apache.org/jira/browse/PDFBOX-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15363568#comment-15363568 ]
John Hewson edited comment on PDFBOX-3403 at 7/6/16 1:11 AM:
-------------------------------------------------------------
Unfortunately, this fix was not a good choice. The exception that was thrown
exists to verify PDFBox's internal consistency (that's why it's an unchecked
exception). Bypassing it now allows PDFBox to get into an inconsistent state,
with invariants such as "getBaseEncoding() never returns null" being violated.
As the exception says, symbolic fonts *must* have a built-in encoding. It's the
job of the caller of this function to make sure the inputs are correct - that's
what needs to happen. Otherwise it's garbage in, garbage out.
The `readEncoding()` method of `PDSimpleFont` is responsible for reading
encodings and making fixes to them. That method should make sure that the
invariants set by DictionaryEncoding are satisfied, rather than forcing through
invalid data.
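To make the division of responsibility concrete, here is a minimal sketch of the pattern. It does not use PDFBox's real classes: `SketchEncoding`, `EncodingInvariantSketch`, the string-valued encodings, and the fallback to "StandardEncoding" are all hypothetical names and policies chosen for illustration. The constructor keeps its unchecked exception as an invariant guard, and the caller repairs the input before constructing the object.

```java
/**
 * Hypothetical stand-in for an encoding object. The names and the string-based
 * representation are assumptions for illustration, not PDFBox's real classes.
 */
final class SketchEncoding {
    private final String baseEncoding;

    SketchEncoding(String declaredBase, boolean symbolic, String builtIn) {
        if (declaredBase != null) {
            baseEncoding = declaredBase;
        } else if (!symbolic) {
            baseEncoding = "StandardEncoding";
        } else if (builtIn != null) {
            baseEncoding = builtIn;
        } else {
            // Unchecked on purpose: reaching this branch means the caller
            // passed inconsistent inputs, which is a bug in the caller.
            throw new IllegalArgumentException(
                    "Symbolic fonts must have a built-in encoding");
        }
    }

    /** Invariant guarded by the constructor: never returns null. */
    String getBaseEncoding() {
        return baseEncoding;
    }
}

final class EncodingInvariantSketch {
    /**
     * Caller-side reading and repair. Invalid input is fixed here, so the
     * constructor's check never has to be bypassed.
     */
    static SketchEncoding readEncoding(String declaredBase, boolean symbolic,
                                       String builtIn) {
        String repairedBuiltIn = builtIn;
        if (declaredBase == null && symbolic && builtIn == null) {
            // Assumed repair policy for this sketch only; the right fallback
            // for a real font would need an example PDF to decide.
            repairedBuiltIn = "StandardEncoding";
        }
        return new SketchEncoding(declaredBase, symbolic, repairedBuiltIn);
    }

    public static void main(String[] args) {
        SketchEncoding encoding = readEncoding(null, true, null);
        System.out.println(encoding.getBaseEncoding());
    }
}
```

With this split, every `SketchEncoding` that exists has a non-null base encoding, and any decision about how to handle a broken font lives in one place in the caller.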
The only change that appears to have been necessary here was to add support for
MacExpertEncoding. We could consider the issue of what happens when the
BaseEncoding is an invalid name - but it's hypothetical and we'd really need an
example PDF to work through the best solution.
> IllegalArgumentException: Symbolic fonts must have a built-in encoding
> ----------------------------------------------------------------------
>
> Key: PDFBOX-3403
> URL: https://issues.apache.org/jira/browse/PDFBOX-3403
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 2.0.2, 2.0.3, 2.1.0
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Fix For: 2.0.3, 2.1.0
>
> Attachments: PDFBOX-3403.pdf
>
>
> Happens with text extraction and rendering:
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException: Symbolic fonts must have a built-in encoding
>     at org.apache.pdfbox.pdmodel.font.encoding.DictionaryEncoding.<init>(DictionaryEncoding.java:113)
>     at org.apache.pdfbox.pdmodel.font.PDSimpleFont.readEncoding(PDSimpleFont.java:126)
>     at org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:131)
>     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:60)
>     at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:123)
>     at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:829)
> {code}