Re: Questions about toUnicode Cmap

Andreas Lehmkuehler Wed, 07 Mar 2012 23:21:38 -0800

Hi,

On 07.03.2012 09:15, Leleu Eric wrote:

Hi all,



I'm currently working on the preflight issue PDFBOX-1236 [1].

The error seems to come from the management of the "toUnicode" CMap in a
Type0 font.

The "toUnicode" CMap overrides the "Encoding" CMap of the font. Due to this
behaviour, the preflight validator receives the Unicode value for each
character code present in a Text operator instead of the CID value defined
in the Encoding CMap.

Can you give me a pointer to where exactly in the preflight code that happens?

So I have two questions:
- Is the "Encoding overriding" the right thing to do?
- Why is the "toUnicode" CMap used to display text? According to my
understanding of the PDF Reference v1.7, the toUnicode CMap is used to
extract text from a PDF file and to create a text file with Unicode
characters. To display the text in a PDF reader, the font content and the
Encoding CMap seem to be enough.
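
To make the distinction between the two CMaps concrete, here is a minimal sketch in plain Java. It does not use the PDFBox API; the class name, the method names and the mapping values are all made up for illustration. It only shows that the same character codes from a Text operator resolve to CIDs through the Encoding CMap (glyph selection) and to Unicode through the toUnicode CMap (text extraction). Falling back to CID 0 and to U+FFFD for unmapped codes is a choice made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class CMapSketch {
    // Hypothetical Encoding CMap: character code -> CID (selects the glyph).
    static final Map<Integer, Integer> ENCODING = new HashMap<>();
    // Hypothetical toUnicode CMap: character code -> Unicode (text extraction).
    static final Map<Integer, String> TO_UNICODE = new HashMap<>();

    static {
        // Made-up values, not taken from any real font.
        ENCODING.put(0x0001, 36);
        ENCODING.put(0x0002, 72);
        TO_UNICODE.put(0x0001, "A");
        TO_UNICODE.put(0x0002, "e");
    }

    // What a validator or renderer needs: the CID for each character code.
    // Unmapped codes fall back to CID 0 (sketch assumption).
    static int[] toCids(int[] codes) {
        int[] cids = new int[codes.length];
        for (int i = 0; i < codes.length; i++) {
            cids[i] = ENCODING.getOrDefault(codes[i], 0);
        }
        return cids;
    }

    // What text extraction needs: the Unicode string for the same codes.
    // Unmapped codes become U+FFFD (sketch assumption).
    static String toText(int[] codes) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            sb.append(TO_UNICODE.getOrDefault(code, "\uFFFD"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] codes = {0x0001, 0x0002};
        System.out.println(java.util.Arrays.toString(toCids(codes)));
        System.out.println(toText(codes));
    }
}
```

If the toUnicode lookup replaces the Encoding lookup, a consumer that expects CIDs ends up with the values from the second method, which is the situation described above.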

PDFBox uses Graphics2D#drawString and, more recently, java.awt.Font#createGlyphVector to render the text. The text has to be provided as a Unicode string when calling those methods. IMO we have to change that in the long run. It would be better to create the glyphs using the font directly instead of converting it to an AWT font.

What is your point of view about these two points?

Probably we can find a workaround for your issue, but I need some more details on how the preflight code works (see above).

BR,
Eric

[1] https://issues.apache.org/jira/browse/PDFBOX-1236


BR
Andreas Lehmkühler
