[
https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040453#comment-14040453
]
Tilman Hausherr commented on PDFBOX-2149:
-----------------------------------------
The cheese file (attached by Petr) works without NPE now, but there's a new
exception for the file of PDFBOX-2059:
Jun 23, 2014 8:02:49 AM org.apache.pdfbox.pdmodel.font.PDFont getSpaceWidth
SEVERE: Can't determine the width of the space character, assuming 250
java.lang.StringIndexOutOfBoundsException: String index out of range: 0
at java.lang.String.codePointAt(Unknown Source)
at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.makeFontDescriptor(PDTrueTypeFont.java:325)
at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getFontDescriptor(PDTrueTypeFont.java:150)
at org.apache.pdfbox.pdmodel.font.PDFont.getFontWidth(PDFont.java:814)
at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getFontWidth(PDTrueTypeFont.java:379)
at org.apache.pdfbox.pdmodel.font.PDFont.getFontWidth(PDFont.java:312)
at org.apache.pdfbox.pdmodel.font.PDFont.getSpaceWidth(PDFont.java:855)
at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:317)
at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:44)
...
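
The top frames of the trace (String.codePointAt failing with index 0) match what Java throws when codePointAt(0) is called on an empty string. A minimal sketch of that failure mode and one possible guard follows. It is not the PDFBox code: the class and method names are hypothetical, and what makeFontDescriptor should actually do when it meets an empty string is left open here.

```java
public class SafeFirstCodePoint {

    /**
     * Hypothetical helper: returns the first code point of s,
     * or -1 when s is null or empty.
     */
    public static int firstCodePointOrMinusOne(String s) {
        if (s == null || s.isEmpty()) {
            // Calling s.codePointAt(0) on "" throws
            // StringIndexOutOfBoundsException, as in the trace above.
            return -1;
        }
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        System.out.println(firstCodePointOrMinusOne("A")); // prints 65
        System.out.println(firstCodePointOrMinusOne(""));  // prints -1
    }
}
```

Returning a sentinel is only one option; skipping the entry or logging and falling back to a default may fit the caller better.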
> Font Refactoring
> ----------------
>
> Key: PDFBOX-2149
> URL: https://issues.apache.org/jira/browse/PDFBOX-2149
> Project: PDFBox
> Issue Type: Improvement
> Components: FontBox, PDModel
> Affects Versions: 2.0.0
> Reporter: John Hewson
> Assignee: John Hewson
> Attachments: 000039.pdf, 000467.pdf
>
>
> To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need
> to sort out long-standing font/text encoding issues. The main issue is that
> encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses,
> sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this
> code is copy & pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and
> Encodings despite the fact that these two encoding methods are mutually
> exclusive. The end result is that the process of reading Encodings/CMaps
> often follows rules which are completely invalid for that font type but
> mostly work by luck.
> Phase 1
> - Refactor PDFont subclasses to remove setXXX methods which allow the object
> to be corrupted. Proper use of inheritance can remove all cases where public
> setXXX methods are used during font loading.
> - Clean up TTF loading and loadTTF in anticipation of Unicode TTF
> embedding. FontBox's TrueTypeFont class is externally mutable via setXXX
> methods used only by TTFParser; these can be made package-private.
> - The Encoding class and EncodingManager could do with some cleaning up prior
> to further refactoring.
> - PDSimpleFont does not do anything; its functionality should be moved into
> its superclass, PDFont.
> - PDFont#determineEncoding() loads CMaps when only Encodings are applicable,
> and vice versa. Loading needs to be pushed down into the appropriate
> subclasses. As a starting point, the relevant code should at least be copied
> into the relevant subclasses ready for further refactoring.
> - TTFGlyph2D does its own decoding of char codes, rather than using the
> font's #encode method (fair enough because #encode is broken) and there's a
> copy and pasted version of the same code in PDTrueTypeFont - we need to
> consolidate this code into PDTrueTypeFont where it belongs.
> Phase 2
> - Refactor loading of CMaps and Encodings from font dictionaries. This will
> involve changes to PDFont and its subclasses to delegate loading to
> subclasses where it can be properly encapsulated.
> - May need to alter the class hierarchy w.r.t. CIDFont to facilitate this, as
> CIDFont isn't really a PDFont: its parent Type0 font is responsible for its
> CMap. We'll see.
> Phase 3
> - Refactor the decoding of character codes by PDFont and its subclasses. This
> will involve replacing the #getCodeFromArray, #encode and #encodeToCID
> methods.
> - Fix decoding of content stream character codes in PDFStreamEngine, using
> the newly refactored PDFont and using the current font's CMap to determine
> the code width.
> Phase 4
> - Add support for generating embedded TTFs with Unicode
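
The direction described in the phases above (no public setXXX methods, and Encoding or CMap loading pushed down into the subclass it applies to) can be pictured with a small sketch. All names here are hypothetical stand-ins and not the PDFBox classes; plain maps stand in for Encoding and CMap objects, which is a simplifying assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class FontSketch {

    /** Hypothetical base class: immutable, decoding left to subclasses. */
    public abstract static class Font {
        private final String name;

        protected Font(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }

        /** Maps a character code to text, or null if unmapped. */
        public abstract String toUnicode(int code);
    }

    /** Simple font: knows only about its Encoding table. */
    public static final class SimpleFont extends Font {
        private final Map<Integer, String> encoding;

        public SimpleFont(String name, Map<Integer, String> encoding) {
            super(name);
            // Defensive copy, so the font cannot be altered after construction.
            this.encoding = Collections.unmodifiableMap(new HashMap<>(encoding));
        }

        @Override
        public String toUnicode(int code) {
            return encoding.get(code);
        }
    }

    /** Composite font: knows only about its CMap. */
    public static final class CompositeFont extends Font {
        private final Map<Integer, String> cmap;

        public CompositeFont(String name, Map<Integer, String> cmap) {
            super(name);
            this.cmap = Collections.unmodifiableMap(new HashMap<>(cmap));
        }

        @Override
        public String toUnicode(int code) {
            return cmap.get(code);
        }
    }
}
```

Because each subclass receives its table in the constructor and exposes no setters, a font cannot end up consulting a CMap where only an Encoding applies, or the reverse.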