Dear Andreas,

I have another question. For some documents, when converting them into
images, I receive warnings like the following:

<snapshot>
...
11:42:14,895 WARN  [PDSimpleFont] Changing font on <e> from <Arial> to the
default font
11:42:14,900 WARN  [PDSimpleFont] Changing font on <u> from <Arial> to the
default font
11:42:14,901 WARN  [PDSimpleFont] Changing font on <t> from <Arial> to the
default font
11:42:14,901 WARN  [PDSimpleFont] Changing font on <s> from <Arial> to the
default font
11:42:14,902 WARN  [PDSimpleFont] Changing font on <c> from <Arial> to the
default font
11:42:14,903 WARN  [PDSimpleFont] Changing font on <h> from <Arial> to the
default font
11:42:14,903 WARN  [PDSimpleFont] Changing font on <e> from <Arial> to the
default font
11:42:14,906 WARN  [PDSimpleFont] Changing font on <o> from <Arial> to the
default font
11:42:14,907 WARN  [PDSimpleFont] Changing font on <r> from <Arial> to the
default font
...
</snapshot>

These warnings could be deactivated in the logging.property file, I guess.
The images are still created, but they display wrong characters; please
see the comparison in the attached image file.

How can I solve this? I have looked around in the documentation and googled
a lot, but could not find a solution.

Is there a way to skip the character parsing, since my application only
converts the file to an image and does no OCR or the like? I have used the
loadNonSeq() method, but still get the wrong characters in the images.

thanks in advance!

--hai


On Mon, Jul 1, 2013 at 6:57 PM, Hai Nguyen FUB <[email protected]> wrote:

> alright, thank you very much for the fast reply!!!
>
> --hai
>
>
> On Mon, Jul 1, 2013 at 6:52 PM, Andreas Lehmkuehler <[email protected]> wrote:
>
>> On 01.07.2013 18:30, Hai Nguyen FUB wrote:
>>
>>  Hi Andreas,
>>>
>>> thank you very much, it works!!!
>>>
>>> though I still get a warning like the following:
>>>
>>> 18:26:54,687 WARN  [NonSequentialPDFParser] PDF file
>>>
>>>> 'src\test\resources\pdf\**249scan.pdf' does not allow extracting
>>>> content.
>>>>
>>>>
>>> does this mean that the fonts or characters within the document
>>> are not extractable?
>>>
>> It is possible to define user access permissions for a pdf, such as
>>
>> - disallow/allow printing
>> - disallow/allow text extraction
>> - disallow/allow modifying the pdf
>> - ....
>>
>> In your case, it is not allowed to extract the content of the pdf as text.
>>
>>  thanks,
>>>
>>> --hai
>>> SNIP
>>>
>>
>> BR
>> Andreas Lehmkühler
>>
>
>
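On the access permissions mentioned in the quoted reply: in an encrypted PDF
these are stored as bit flags in an integer (the /P entry of the encryption
dictionary, per the PDF specification). The following is only a minimal
sketch of how such flags can be decoded. The class name, the helper method
and the sample value -17 are made up for illustration; it does not use
PDFBox and is not how PDFBox implements the check internally.

```java
// Illustrative only: decodes three of the permission bits defined by the
// PDF specification. Names and the sample value are hypothetical.
class PdfPermissions {
    // The specification numbers bits starting at 1 (lowest-order bit),
    // so bit n corresponds to the mask 1 << (n - 1).
    static final int PRINT = 1 << 2;   // bit 3: print the document
    static final int MODIFY = 1 << 3;  // bit 4: modify the contents
    static final int EXTRACT = 1 << 4; // bit 5: copy or extract text and graphics

    // Returns true if the given permission bit is set in the flags value.
    static boolean isAllowed(int flags, int mask) {
        return (flags & mask) != 0;
    }

    public static void main(String[] args) {
        // Hypothetical flags value: every bit set except bit 5.
        int flags = -17;
        System.out.println("print allowed:   " + isAllowed(flags, PRINT));
        System.out.println("modify allowed:  " + isAllowed(flags, MODIFY));
        System.out.println("extract allowed: " + isAllowed(flags, EXTRACT));
    }
}
```

With a value like this, printing and modifying are permitted while text
extraction is not, which matches the kind of situation the warning above
describes. In PDFBox itself the same information should be available through
the AccessPermission class (for example canExtractContent()), so there is
normally no need to decode the bits by hand.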
