Hi,
On 08.03.2012 09:52, Leleu Eric wrote:
Hi,
2012/3/8 Andreas Lehmkuehler<[email protected]>
Hi,
On 07.03.2012 09:15, Leleu Eric wrote:
Hi all,
I'm currently working on the preflight issue PDFBOX-1236 [1].
The error seems to come from the handling of the "toUnicode" CMap in a
Type0 font.
The "toUnicode" CMap overrides the "Encoding" CMap of the font. Because of
this behaviour, the preflight validator receives the Unicode value for each
character code present in a text operator instead of the CID given by the
Encoding CMap.
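A minimal sketch of the two lookups described above, with plain maps standing in for the parsed CMaps. All class and method names are hypothetical, not PDFBox API. The values assume an Identity-style encoding where code 1 maps to CID 1, and reuse the CID 1 to glyph 58 ('W') pairing reported later in the thread. The sketch shows why the Unicode value from the ToUnicode CMap cannot stand in for the CID in a glyph lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: plain maps stand in for parsed CMaps; not PDFBox API.
final class CMapRoles {
    // Encoding CMap: character code -> CID (drives the glyph lookup).
    static final Map<Integer, Integer> ENCODING = new HashMap<>();
    // ToUnicode CMap: character code -> Unicode text (for text extraction only).
    static final Map<Integer, String> TO_UNICODE = new HashMap<>();
    // CIDFont side: CID -> glyph ID in the embedded font program.
    static final Map<Integer, Integer> CID_TO_GID = new HashMap<>();

    static {
        ENCODING.put(1, 1);      // assumed: code 1 -> CID 1 (Identity-style)
        TO_UNICODE.put(1, "W");  // code 1 -> "W"
        CID_TO_GID.put(1, 58);   // CID 1 -> glyph 58, the 'W'
    }

    // Correct path: code -> CID via the Encoding CMap, then CID -> GID.
    static Integer glyphFor(int code) {
        Integer cid = ENCODING.get(code);
        return cid == null ? null : CID_TO_GID.get(cid);
    }

    // Broken path: the Unicode code point is used as if it were a CID.
    static Integer glyphIfOverridden(int code) {
        String text = TO_UNICODE.get(code);
        return text == null ? null : CID_TO_GID.get(text.codePointAt(0));
    }

    // Extraction path: code -> Unicode via the ToUnicode CMap.
    static String textFor(int code) {
        return TO_UNICODE.get(code);
    }

    public static void main(String[] args) {
        System.out.println(glyphFor(1));          // 58
        System.out.println(glyphIfOverridden(1)); // null ('W' is U+0057, i.e. 87, not a known CID)
        System.out.println(textFor(1));           // W
    }
}
```

Keeping the two maps separate means the glyph check never depends on whether a ToUnicode CMap is present at all.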
Can you give me a pointer to where exactly in the preflight code that happens?
You can find the text validation in the
"org.apache.padaf.preflight.contentstream.ContentStreamWrapper" class.
The method is validText(byte[] string).
We pass each character to the font.encode method to find out how many
bytes are used to describe the CID.
Once we have the CID, checkCID on
"org.apache.padaf.preflight.font.CFFType2FontContainer" is called, and an
exception occurs when we look up the GlyphId for this CID.
If I comment out the initialization of the toUnicode map, I get the right
glyphs.
The first one is the 'W' (glyph 58), linked to CID 1. (If I extract the
font and open it with FontForge, glyph 58 is the 'W' too.)
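The "how many bytes describe the CID" question is what the codespace ranges of a CMap answer. The following is a hypothetical sketch, not the padaf code: it splits a string operand into codes by matching 1 to 4 byte prefixes against the codespace ranges byte by byte (shortest match first), then runs a checkCID-style presence test against an assumed CID-to-GID map. The class names, the `Range` type and the sample ranges are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of splitting a text operand into codes; not the padaf code.
final class CodeSplitter {
    // One codespace range, e.g. <8140> <9FFC>; low and high have the same length.
    static final class Range {
        final int[] low;
        final int[] high;

        Range(int[] low, int[] high) {
            this.low = low;
            this.high = high;
        }

        // True if the n bytes at data[off] lie inside this range, byte by byte.
        boolean matches(byte[] data, int off, int n) {
            if (n != low.length || off + n > data.length) return false;
            for (int i = 0; i < n; i++) {
                int b = data[off + i] & 0xFF;
                if (b < low[i] || b > high[i]) return false;
            }
            return true;
        }
    }

    // Reads codes left to right, taking the shortest byte sequence that matches a range.
    static List<Integer> split(byte[] data, List<Range> codespace) {
        List<Integer> codes = new ArrayList<>();
        int off = 0;
        while (off < data.length) {
            int len = 0;
            for (int n = 1; n <= 4 && len == 0; n++) {
                for (Range r : codespace) {
                    if (r.matches(data, off, n)) {
                        len = n;
                        break;
                    }
                }
            }
            if (len == 0) throw new IllegalArgumentException("no codespace match at offset " + off);
            int code = 0;
            for (int i = 0; i < len; i++) code = (code << 8) | (data[off + i] & 0xFF);
            codes.add(code);
            off += len;
        }
        return codes;
    }

    // checkCID-style test: is there a glyph for this CID in the (assumed) CID -> GID map?
    static boolean hasGlyph(int cid, Map<Integer, Integer> cidToGid) {
        return cidToGid.containsKey(cid);
    }

    public static void main(String[] args) {
        // Identity-H uses a single two-byte range <0000> <FFFF>, so CID == code.
        List<Range> identity = List.of(new Range(new int[] {0x00, 0x00}, new int[] {0xFF, 0xFF}));
        List<Integer> cids = split(new byte[] {0x00, 0x01}, identity);
        System.out.println(cids);                                  // [1]
        System.out.println(hasGlyph(cids.get(0), Map.of(1, 58)));  // true
    }
}
```

With Identity-H the code already is the CID; for other Encoding CMaps a second lookup (code to CID) would sit between `split` and `hasGlyph`.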
I'll have a look at it at the weekend.
So I have two questions:
- Is overriding the "Encoding" CMap the right thing to do?
- Why is the "toUnicode" CMap used to display text? According to my
understanding of the PDF Reference v1.7, the toUnicode CMap is used to
extract text from a PDF file and to create a text file with Unicode
characters. To display the text in a PDF reader, the font program and the
Encoding CMap seem to be enough.
PDFBox uses Graphics2D#drawString and, more recently,
java.awt.Font#createGlyphVector to render the text. The text has to be
provided as a Unicode string when calling those methods.
IMO we have to change that in the long run. It would be better to create
the glyphs from the font directly instead of converting it to an AWT font.
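On the rendering side, java.awt.Font also has a createGlyphVector(FontRenderContext, int[]) overload that takes glyph codes instead of characters, and Graphics2D#drawGlyphVector draws the result, so a Unicode string is not strictly required once the glyph IDs are known. This is a sketch under the assumption that the embedded program can be turned into an AWT Font at all (Font.createFont exists for TrueType and Type 1 data, but not every embedded format is covered). The logical Dialog font and glyph IDs 1 to 3 are placeholders, and the class name is hypothetical.

```java
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.font.GlyphVector;
import java.awt.image.BufferedImage;

// Sketch: draw by glyph ID instead of by Unicode string.
final class GlyphIdRendering {
    // Builds a GlyphVector from glyph codes (no Unicode involved) and draws it.
    static GlyphVector drawGlyphs(Graphics2D g, Font font, int[] glyphIds, float x, float y) {
        GlyphVector gv = font.createGlyphVector(g.getFontRenderContext(), glyphIds);
        g.drawGlyphVector(gv, x, y);
        return gv;
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(200, 50, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        try {
            // Placeholder: a logical AWT font stands in for the embedded font program,
            // so glyph IDs 1..3 are arbitrary glyphs of that font.
            Font font = new Font(Font.DIALOG, Font.PLAIN, 24);
            GlyphVector gv = drawGlyphs(g, font, new int[] {1, 2, 3}, 10f, 35f);
            System.out.println(gv.getNumGlyphs()); // 3
        } finally {
            g.dispose();
        }
    }
}
```

This still goes through AWT, so it only removes the Unicode round trip; it does not help with fonts AWT cannot load.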
I don't need to render the text in the preflight component; I only check
that the glyph is present and that its width is consistent.
Bypassing the AWT font would be great, but it is a huge amount of work.
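A sketch of the two checks Eric describes: glyph presence and width consistency. The width declared in the CIDFont's /W array is in glyph space (1/1000 of text space), while the font program measures advances in its own units per em, so the program's advance is scaled by 1000 / unitsPerEm before comparing. The maps, the tolerance parameter and all names are hypothetical. As a simplification, a CID missing from /W falls back to 1000, the default value of /DW; a real check would read /DW from the font dictionary.

```java
import java.util.Map;

// Hypothetical sketch of the two preflight checks: glyph presence and width consistency.
final class WidthCheck {
    enum Result { OK, MISSING_GLYPH, WIDTH_MISMATCH }

    // Scales the program's advance (font units) to glyph space (1/1000 em) and compares.
    static boolean consistent(double declaredWidth, int advance, int unitsPerEm, double tolerance) {
        double programWidth = advance * 1000.0 / unitsPerEm;
        return Math.abs(programWidth - declaredWidth) <= tolerance;
    }

    static Result validate(int cid, Map<Integer, Integer> cidToGid, Map<Integer, Integer> advances,
                           Map<Integer, Double> declaredWidths, int unitsPerEm, double tolerance) {
        Integer gid = cidToGid.get(cid);
        if (gid == null || !advances.containsKey(gid)) return Result.MISSING_GLYPH;
        // Simplification: a CID missing from /W gets 1000, the default value of /DW.
        double declared = declaredWidths.getOrDefault(cid, 1000.0);
        return consistent(declared, advances.get(gid), unitsPerEm, tolerance)
                ? Result.OK : Result.WIDTH_MISMATCH;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cidToGid = Map.of(1, 58);
        // Assumed advance in a 2048 units/em font, chosen to show the scaling step.
        Map<Integer, Integer> advances = Map.of(58, 1933);
        System.out.println(validate(1, cidToGid, advances, Map.of(1, 944.0), 2048, 1.0)); // OK
        System.out.println(validate(1, cidToGid, advances, Map.of(1, 600.0), 2048, 1.0)); // WIDTH_MISMATCH
        System.out.println(validate(2, cidToGid, advances, Map.of(), 2048, 1.0));         // MISSING_GLYPH
    }
}
```

Neither check needs an AWT font or a Unicode value: both only use the CID, the CID-to-GID mapping and the font program's metrics.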
Yes, but we need to do that, because some of the needed fonts aren't supported
or the support is buggy, see PDFBOX-490.
What is your point of view about these two points?
We can probably find a workaround for your issue, but I need some more
details on how the preflight code works (see above).
BR,
Eric
[1]
https://issues.apache.org/jira/browse/PDFBOX-1236
BR
Andreas Lehmkühler
BR
Eric
BR
Andreas Lehmkühler