Dear Andreas,

Thanks for the fast reply!

--hai

On Tue, Jul 2, 2013 at 1:34 PM, Andreas Lehmkuehler <[email protected]> wrote:

> Hi,
>
>
> Am 02.07.2013 11:54, schrieb Hai Nguyen FUB:
>
>> Dear Andreas,
>>
>>
>> I have another question. For some documents, when converting them into
>> images, I receive warnings like the following:
>>
>> <snapshot>
>> ...
>> 11:42:14,895 WARN  [PDSimpleFont] Changing font on <e> from <Arial> to the default font
>> 11:42:14,900 WARN  [PDSimpleFont] Changing font on <u> from <Arial> to the default font
>> 11:42:14,901 WARN  [PDSimpleFont] Changing font on <t> from <Arial> to the default font
>> 11:42:14,901 WARN  [PDSimpleFont] Changing font on <s> from <Arial> to the default font
>> 11:42:14,902 WARN  [PDSimpleFont] Changing font on <c> from <Arial> to the default font
>> 11:42:14,903 WARN  [PDSimpleFont] Changing font on <h> from <Arial> to the default font
>> 11:42:14,903 WARN  [PDSimpleFont] Changing font on <e> from <Arial> to the default font
>> 11:42:14,906 WARN  [PDSimpleFont] Changing font on <o> from <Arial> to the default font
>> 11:42:14,907 WARN  [PDSimpleFont] Changing font on <r> from <Arial> to the default font
>> ...
>> </snapshot>
>>
>> These warnings could probably be deactivated in the logging.property file.
>> The images are still created, but they display wrong characters; please
>> see the comparison in the attached image file.
>>
>> How can I solve this? I have looked around in the documentation and
>> googled a lot, but could not find any solution.
>>
> This is a known behaviour of PDFBox. As the embedded font doesn't work for
> some reason, an alternative font is used. In some cases it works, but in
> most cases it doesn't. There is no solution yet. Most likely the issue is
> related to PDFBOX-490 [1].
>
>
>> Is there a way to omit the character parsing, since my application only
>> converts the file to an image and does no OCR or the like? I have used the
>> loadNonSeq() method, but still received those poor characters in the
>> images.
>>
> No, it is needed to render the text, and it has nothing to do with the
> parser itself.
>
>> Thanks in advance!
>>
>> --hai
>>
>
> BR
> Andreas Lehmkühler
>
> [1] https://issues.apache.org/jira/browse/PDFBOX-490
>
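
On the side question of silencing the warnings: a minimal sketch, assuming
PDFBox's commons-logging output ends up in java.util.logging and that the
logger is named after the class org.apache.pdfbox.pdmodel.font.PDSimpleFont.
Both are assumptions, and the class name QuietFontWarnings is only
illustrative. The log format quoted above looks more like log4j; in that case
the usual equivalent would be a line such as
log4j.logger.org.apache.pdfbox.pdmodel.font.PDSimpleFont=ERROR in the log4j
properties file. Either way this only hides the messages; it does not change
the font substitution or the rendered characters.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietFontWarnings {
    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a level set on an otherwise unreferenced logger can be lost.
    // The logger name is an assumption based on the PDFBox class name.
    private static final Logger FONT_LOGGER =
            Logger.getLogger("org.apache.pdfbox.pdmodel.font.PDSimpleFont");

    // Raise the threshold so WARNING messages from this logger are dropped.
    public static Logger silence() {
        FONT_LOGGER.setLevel(Level.SEVERE);
        return FONT_LOGGER;
    }

    public static void main(String[] args) {
        Logger logger = silence();
        System.out.println(logger.getName() + " -> " + logger.getLevel());
    }
}
```

Calling QuietFontWarnings.silence() once before the conversion starts should
be enough under these assumptions.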
