Hi,
On 07.03.2012 09:15, Leleu Eric wrote:
Hi all,
I'm currently working on the preflight issue PDFBOX-1236 [1]
The error seems to come from the management of the "toUnicode" CMap in a
Type0 font.
The "toUnicode" CMap overrides the "Encoding" CMap of the font. Because of
this behaviour, the preflight validator receives the Unicode value for each
character code present in a text operator instead of the CID value given by
the Encoding CMap.
Can you give me a pointer to where exactly in the preflight code that happens?
So I have two questions:
- Is the "Encoding overriding" the right thing to do?
- Why is the "toUnicode" CMap used to display text? According to my
understanding of the PDF Reference v1.7, the toUnicode CMap is used to
extract text from a PDF file and to create a text file with Unicode
characters. To display the text in a PDF reader, the font content and the
Encoding CMap seem to be enough.
PDFBox uses Graphics2D#drawString and, more recently,
java.awt.Font#createGlyphVector to render the text. The text has to be provided
as a Unicode string when calling those methods.
IMO we have to change that in the long run. It would be better to create the
glyphs using the font directly instead of converting it to an AWT font.
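To make the distinction concrete, here is a minimal sketch of the two lookups
under discussion. It is not PDFBox or preflight code: the class name, the method
names and the sample mappings are all hypothetical, and real CMaps are parsed
from the PDF rather than hard-coded. It only shows that the same character code
yields a CID through the Encoding CMap (what a validator or glyph lookup would
need) and a Unicode string through the toUnicode CMap (what text extraction or
an AWT drawString call would need).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in PDFBox.
public class CMapLookupSketch {

    // Assumed sample data: character code -> CID (role of the Encoding CMap).
    static final Map<Integer, Integer> ENCODING_CMAP = new HashMap<>();
    // Assumed sample data: character code -> Unicode text (role of toUnicode).
    static final Map<Integer, String> TO_UNICODE_CMAP = new HashMap<>();

    static {
        ENCODING_CMAP.put(0x0001, 0x0123);
        ENCODING_CMAP.put(0x0002, 0x0456);
        TO_UNICODE_CMAP.put(0x0001, "A");
        // One code may map to several Unicode characters, e.g. a ligature.
        TO_UNICODE_CMAP.put(0x0002, "fi");
    }

    // Glyph selection path: unmapped codes fall back to CID 0 (.notdef).
    static int cidFor(int code) {
        Integer cid = ENCODING_CMAP.get(code);
        return cid != null ? cid : 0;
    }

    // Text extraction path: returns null when no Unicode mapping is known.
    static String unicodeFor(int code) {
        return TO_UNICODE_CMAP.get(code);
    }

    public static void main(String[] args) {
        int[] codesFromTextOperator = {0x0001, 0x0002};
        for (int code : codesFromTextOperator) {
            System.out.printf("code=%04X cid=%04X unicode=%s%n",
                    code, cidFor(code), unicodeFor(code));
        }
    }
}
```

If the second lookup replaces the first, the caller can no longer tell which
glyph in the font was actually addressed, which matches the symptom described
in PDFBOX-1236.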
What is your point of view about these two points?
Probably we can find a workaround for your issue, but I need some more details
on how the preflight code works (see above).
BR,
Eric
[1] https://issues.apache.org/jira/browse/PDFBOX-1236
BR
Andreas Lehmkühler