Tamir Hassan created PDFBOX-4115:
------------------------------------
Summary: Problem creating PDF with German text using embedded Type1 (PFB) font
Key: PDFBOX-4115
URL: https://issues.apache.org/jira/browse/PDFBOX-4115
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 2.0.8
Reporter: Tamir Hassan
Attachments: n019003l.pfb
Hi all,
When creating a PDF and adding text using a PostScript Type1 font (e.g. the
attached n019003l.pfb but also others), an error occurs when the text contains
German characters. Following a reply from Tilman Hausherr on the users mailing
list, I was advised to submit this as an issue.
I am using the latest release (2.0.8).
The error occurs with characters such as "ä" (adieresis) and the other umlaut
characters; it does not occur with "ß" (germandbls).
Using an embedded TTF font seems to work fine, but when I load the PFB like this:
InputStream pfb = new FileInputStream(fontFile);
font = new PDType1Font(document, pfb);
I get an encoding error whenever I try to draw an "ä" on the page:
java.lang.IllegalArgumentException: U+00E4 ('adieresis') is not available in this font NimbusSanL-Regu (generic: NimbusSanL-Regu) encoding: built-in (Type 1)
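The pattern above ("ä" rejected, "ß" accepted) would be consistent with the font's built-in encoding being Adobe StandardEncoding, which assigns germandbls to octal code 0373 and dieresis to 0310 but has no code at all for adieresis. That the font actually uses StandardEncoding is an assumption, not something confirmed in this report. The minimal sketch below models the lookup with a hypothetical BuiltInEncodingCheck class and a hand-picked subset of StandardEncoding; it is not PDFBox's actual implementation.

```java
import java.util.Map;

// Hypothetical model of encoding a character with a Type 1 font's built-in
// encoding. ASSUMPTION: that encoding is Adobe StandardEncoding; only a few
// entries are included here for illustration.
public class BuiltInEncodingCheck {

    // Unicode character -> glyph name (small subset of the Adobe Glyph List).
    static final Map<Character, String> GLYPH_NAMES = Map.of(
            'A', "A",
            '\u00E4', "adieresis",
            '\u00DF', "germandbls");

    // Glyph name -> code in StandardEncoding (subset, octal literals).
    // "dieresis" (0310) has a code, but "adieresis" has none.
    static final Map<String, Integer> STANDARD_ENCODING = Map.of(
            "A", 0101,
            "dieresis", 0310,
            "germandbls", 0373);

    // Returns the single-byte code for c, or throws if the encoding has no slot
    // for the corresponding glyph name.
    static int encode(char c) {
        String glyph = GLYPH_NAMES.get(c);
        Integer code = glyph == null ? null : STANDARD_ENCODING.get(glyph);
        if (code == null) {
            throw new IllegalArgumentException(String.format(
                    "U+%04X ('%s') is not available in this encoding", (int) c, glyph));
        }
        return code;
    }

    public static void main(String[] args) {
        System.out.println(encode('\u00DF')); // prints 251 (octal 0373, germandbls)
        try {
            encode('\u00E4');
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the glyph may well exist in the font program; the failure comes purely from the encoding having no code that points to it.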
If I specify a different encoding (WinAnsiEncoding) when loading the font:
InputStream pfb = new FileInputStream(fontFile);
font = new PDType1Font(document, pfb, new WinAnsiEncoding());
then no exception is thrown, but an empty space appears in place of the "ä".
I have looked into the code and, in particular, experimented with the class
PDType1FontEmbedder.
The FontBox Type1Font object is created by the parser in the following line of
code:
type1 = Type1Font.createWithPFB(pfbBytes);
After that, I inspected the CharStrings dictionary:
type1.getCharStringsDict()
and, by iterating through its key set, I can see that "adieresis" is in there.
However, when using the font's default encoding (i.e. by passing "null" to the
PDType1FontEmbedder), the encoding obtained by the following line of code:
fontEncoding = Type1Encoding.fromFontBox(type1.getEncoding());
does not contain "adieresis" (or other "compound" characters), only "dieresis".
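The mismatch described above, a glyph that has a charstring but no code in the font's encoding, can be checked mechanically. The sketch below is hypothetical: it models the CharStrings dictionary as a plain Map<String, byte[]> (assumed here to mirror what Type1Font.getCharStringsDict() returns) and the encoding as a code-to-glyph-name map, then lists the glyphs no code points to. The sample data is made up for illustration and is not read from n019003l.pfb.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diagnostic: which glyphs exist in the font program's
// CharStrings dictionary but cannot be reached through its encoding?
public class UnreachableGlyphs {

    static Set<String> unreachable(Map<String, byte[]> charStrings,
                                   Map<Integer, String> encoding) {
        // Start from every glyph the font actually defines (sorted for stable output)...
        Set<String> result = new TreeSet<>(charStrings.keySet());
        // ...and drop every glyph that some code in the encoding maps to.
        result.removeAll(encoding.values());
        // .notdef is the fallback glyph and is not expected to have a code.
        result.remove(".notdef");
        return result;
    }

    public static void main(String[] args) {
        // Illustrative stand-ins; the charstring bytes are irrelevant here.
        Map<String, byte[]> charStrings = Map.of(
                ".notdef", new byte[0],
                "A", new byte[0],
                "dieresis", new byte[0],
                "adieresis", new byte[0],
                "germandbls", new byte[0]);
        // ASSUMPTION: built-in encoding modelled as a StandardEncoding subset.
        Map<Integer, String> builtIn = Map.of(
                0101, "A",
                0310, "dieresis",
                0373, "germandbls");
        System.out.println(unreachable(charStrings, builtIn)); // prints [adieresis]
    }
}
```

With this sample data only adieresis is reported: it has a charstring but no code in the modelled built-in encoding, which matches the observation that the encoding contains "dieresis" but not "adieresis".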
Thanks,
Tamir
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)