Tamir Hassan created PDFBOX-4115:
------------------------------------

             Summary: Problem creating PDF with German text using embedded Type1 (PFB) font
                 Key: PDFBOX-4115
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4115
             Project: PDFBox
          Issue Type: Bug
          Components: FontBox
    Affects Versions: 2.0.8
            Reporter: Tamir Hassan
         Attachments: n019003l.pfb

Hi all,

When creating a PDF and adding text using a PostScript Type1 font (e.g. the
attached n019003l.pfb, but also others), an error occurs when the text contains
German characters. Following a reply from Tilman Hausherr on the users mailing
list, I was advised to submit this as an issue.

I am using the latest release (2.0.8).

The error occurs with e.g. the character "ä" (adieresis) and other similar 
umlaut characters; it does not occur with "ß" (germandbls).

Using an embedded TTF seems to work fine, but when I load the PFB like this:

InputStream pfb = new FileInputStream(fontFile);
font = new PDType1Font(document, pfb);

I get an encoding error whenever I try to print an "ä" to the page:

java.lang.IllegalArgumentException: U+00E4 ('adieresis') is not available in this font NimbusSanL-Regu (generic: NimbusSanL-Regu) encoding: built-in (Type 1)

If I specify a different encoding (WinANSI) when loading the font:

InputStream pfb = new FileInputStream(fontFile);
font = new PDType1Font(document, pfb, new WinAnsiEncoding());

then no exception is thrown, but an empty space appears in place of
the "ä".
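With WinAnsiEncoding the lookup chain is complete on paper: for U+00A0 to U+00FF the WinAnsi code equals the Unicode code point, code 0xE4 is named adieresis, and the reporter confirmed that adieresis is in the CharStrings dictionary. The sketch below walks that chain with hypothetical excerpts of both tables (class and method names are illustrative, not PDFBox API). It fits with no exception being thrown, but it cannot explain the blank glyph, which would have to be traced in the embedding code itself.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical walk through WinAnsi code -> glyph name -> CharStrings lookup. */
public class WinAnsiRoundTrip {

    // Excerpt of WinAnsiEncoding (code -> glyph name).
    static final Map<Integer, String> WIN_ANSI_EXCERPT = new HashMap<>();
    static {
        WIN_ANSI_EXCERPT.put(0x61, "a");
        WIN_ANSI_EXCERPT.put(0xA8, "dieresis");
        WIN_ANSI_EXCERPT.put(0xDF, "germandbls");
        WIN_ANSI_EXCERPT.put(0xE4, "adieresis");
    }

    /**
     * Glyph name selected for ch, or null if unmapped. Only valid for
     * ASCII and U+00A0..U+00FF, where the WinAnsi code equals the code point.
     */
    static String glyphNameFor(char ch, Map<Integer, String> encoding) {
        return encoding.get((int) ch);
    }

    public static void main(String[] args) {
        // Hypothetical excerpt of the font's CharStrings key set.
        Set<String> charStrings = new HashSet<>(
                Arrays.asList("a", "dieresis", "adieresis", "germandbls"));

        char ch = '\u00E4';
        String name = glyphNameFor(ch, WIN_ANSI_EXCERPT);
        // prints "U+00E4 -> code 0xE4 -> adieresis, in CharStrings: true"
        System.out.printf("U+%04X -> code 0x%X -> %s, in CharStrings: %b%n",
                (int) ch, (int) ch, name, charStrings.contains(name));
    }
}
```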

I have looked into the code; in particular, I have experimented with the
class PDType1FontEmbedder.

When the FontBox object Type1Font is created by the parser in the following 
line of code:

type1 = Type1Font.createWithPFB(pfbBytes);

I have inspected the CharStrings dictionary:

type1.getCharStringsDict()

and, by iterating through its key set, can see that "adieresis" is in there.

However, when using the font's default encoding (i.e. by passing null as the
encoding to PDType1FontEmbedder), the encoding obtained by the following line
of code:

fontEncoding = Type1Encoding.fromFontBox(type1.getEncoding());

does not contain "adieresis" (or other "compound" characters), only
"dieresis".
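To list every glyph that the font defines but its built-in encoding cannot reach, the CharStrings key set can be compared with the glyph names in the encoding. The sketch below does this with plain Java collections; the sample data are a hypothetical excerpt, and in FontBox the names would presumably come from type1.getCharStringsDict().keySet() (assuming that method returns a Map keyed by glyph name) and from the code-to-name table of the font's encoding.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical diagnostic: glyphs present in CharStrings but absent from an encoding. */
public class UnreachableGlyphs {

    /** Returns, sorted, the CharStrings names that no code in the encoding maps to. */
    static Set<String> unreachable(Collection<String> charStringNames,
                                   Map<Integer, String> encoding) {
        Set<String> result = new TreeSet<>(charStringNames);
        result.removeAll(new HashSet<>(encoding.values()));
        result.remove(".notdef"); // ignore the fallback glyph, it is not a real character
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical excerpt of the CharStrings key set.
        Set<String> charStrings = new HashSet<>(Arrays.asList(
                ".notdef", "a", "dieresis", "adieresis", "odieresis", "udieresis", "germandbls"));

        // Hypothetical excerpt of a StandardEncoding-style built-in encoding.
        Map<Integer, String> encoding = new TreeMap<>();
        encoding.put(0x61, "a");
        encoding.put(0xC8, "dieresis");
        encoding.put(0xFB, "germandbls");

        // prints [adieresis, odieresis, udieresis]
        System.out.println(unreachable(charStrings, encoding));
    }
}
```

A non-empty result reproduces the situation described above: the umlaut glyphs are in the font program, but the built-in encoding offers no code through which text can select them.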

Thanks,
Tamir



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)