Hi,

On 02.07.2013 11:54, Hai Nguyen FUB wrote:
Dear Andreas,

I have another question. For some documents, when converting them into
images, I receive warnings like the following:

<snapshot>
...
11:42:14,895 WARN  [PDSimpleFont] Changing font on <e> from <Arial> to the
default font
11:42:14,900 WARN  [PDSimpleFont] Changing font on <u> from <Arial> to the
default font
11:42:14,901 WARN  [PDSimpleFont] Changing font on <t> from <Arial> to the
default font
11:42:14,901 WARN  [PDSimpleFont] Changing font on <s> from <Arial> to the
default font
11:42:14,902 WARN  [PDSimpleFont] Changing font on <c> from <Arial> to the
default font
11:42:14,903 WARN  [PDSimpleFont] Changing font on <h> from <Arial> to the
default font
11:42:14,903 WARN  [PDSimpleFont] Changing font on <e> from <Arial> to the
default font
11:42:14,906 WARN  [PDSimpleFont] Changing font on <o> from <Arial> to the
default font
11:42:14,907 WARN  [PDSimpleFont] Changing font on <r> from <Arial> to the
default font
...
</snapshot>

Those warnings could be deactivated in the logging.property file, I guess.
The images were still created, but they display wrong characters; please
see the comparison in the attached image file.

How can I solve this? I have looked around in the documentation and googled
a lot, but could not find a solution.
This is a known behaviour of PDFBox. Because the embedded font doesn't work for
some reason, an alternative font is used. In some cases that works, but in most
cases it doesn't. There is no solution yet. Most likely the issue is related to
PDFBOX-490 [1].
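
As for the warnings themselves, they can be hidden, although that only quiets
the log and does not change the rendered characters. The following is a minimal
sketch, assuming that commons-logging hands the PDFBox messages to
java.util.logging and that the logger is named after the PDSimpleFont class;
the class name QuietFontWarnings and the method silenceFontWarnings are made up
for illustration. If log4j is the backend instead, the corresponding line in
its properties file would presumably be
log4j.logger.org.apache.pdfbox.pdmodel.font.PDSimpleFont=ERROR.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class QuietFontWarnings {
    // Assumption: the logger name equals the fully qualified class name
    // of PDSimpleFont in PDFBox 1.x.
    static final String FONT_LOGGER = "org.apache.pdfbox.pdmodel.font.PDSimpleFont";

    // Keep a strong reference: java.util.logging only holds loggers weakly,
    // so a level set on an unreferenced logger may be lost after garbage collection.
    private static final Logger LOGGER = Logger.getLogger(FONT_LOGGER);

    // Raise the threshold so that WARNING messages from this logger are dropped.
    static Logger silenceFontWarnings() {
        LOGGER.setLevel(Level.SEVERE);
        return LOGGER;
    }

    public static void main(String[] args) {
        Logger logger = silenceFontWarnings();
        System.out.println(logger.getName() + " -> " + logger.getLevel());
    }
}
```

Call silenceFontWarnings() once before converting the pages. Errors from the
same logger are still reported; only the warnings are suppressed.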

Is there a way to omit the character parsing, since my application only
converts the file to an image and does no OCR or the like? I have used the
loadNonSeq() method, but still got those wrong characters in the images.
No, the character handling is needed to render the text, and it has nothing to
do with the parser itself.

thanks in advance!

--hai

BR
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-490
