Dear Andreas,

I have another question: for some documents, I received warnings like the following when converting them into images:
<snapshot>
...
11:42:14,895 WARN [PDSimpleFont] Changing font on <e> from <Arial> to the default font
11:42:14,900 WARN [PDSimpleFont] Changing font on <u> from <Arial> to the default font
11:42:14,901 WARN [PDSimpleFont] Changing font on <t> from <Arial> to the default font
11:42:14,901 WARN [PDSimpleFont] Changing font on <s> from <Arial> to the default font
11:42:14,902 WARN [PDSimpleFont] Changing font on <c> from <Arial> to the default font
11:42:14,903 WARN [PDSimpleFont] Changing font on <h> from <Arial> to the default font
11:42:14,903 WARN [PDSimpleFont] Changing font on <e> from <Arial> to the default font
11:42:14,906 WARN [PDSimpleFont] Changing font on <o> from <Arial> to the default font
11:42:14,907 WARN [PDSimpleFont] Changing font on <r> from <Arial> to the default font
...
</snapshot>

I guess those warnings could be deactivated in the logging.properties file. The images were still created, but they display wrong characters; please see the comparison in the attached image file.

How can I solve this? I have looked around in the documentation and googled a lot, but could not find any solution. Is there a way to skip the character parsing, since my application only converts the file to images and does no OCR or the like? I have used the loadNonSeq() method, but the images still show those broken characters.

Thanks in advance!

--hai

On Mon, Jul 1, 2013 at 6:57 PM, Hai Nguyen FUB <[email protected]> wrote:

> alright, thank you very much for the fast reply!!!
>
> --hai
>
>
> On Mon, Jul 1, 2013 at 6:52 PM, Andreas Lehmkuehler <[email protected]> wrote:
>
>> Am 01.07.2013 18:30, schrieb Hai Nguyen FUB:
>>
>>> Hi Andreas,
>>>
>>> thank you very much, it works!!!
>>>
>>> though I still get warning notifications as follows:
>>>
>>>> 18:26:54,687 WARN [NonSequentialPDFParser] PDF file
>>>> 'src\test\resources\pdf\**249scan.pdf' does not allow extracting
>>>> content.
>>>
>>> does this "extracting" mean that the fonts or characters within the
>>> document are not extractable?
>>>
>> It is possible to define user access permissions for a PDF, such as
>>
>> - disallow/allow printing
>> - disallow/allow text extraction
>> - disallow/allow modifying the PDF
>> - ....
>>
>> In your case, it is not allowed to extract the content of the PDF as text.
>>
>>> thanks,
>>>
>>> --hai
>>> SNIP
>>>
>>
>> BR
>> Andreas Lehmkühler
>
>
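On the access permissions Andreas mentions: in an encrypted PDF they are stored as a 32-bit integer (the P entry of the encryption dictionary). Per the PDF specification, counting bits from 1 at the low-order end, bit 3 allows printing, bit 4 modifying the contents, bit 5 copying or otherwise extracting text and graphics, and bit 6 adding or modifying annotations. PDFBox wraps these in its AccessPermission class; the sketch below instead decodes the raw value by hand in plain Java, to show which bit a "does not allow extracting content" message is presumably about. The class name and the two sample values are illustrative only, not read from 249scan.pdf.

```java
public final class PdfPermissionBits {
    // Bit positions (1 = least significant bit) of the /P entry, per the PDF spec.
    static final int PRINT = 3;
    static final int MODIFY = 4;
    static final int EXTRACT = 5;
    static final int MODIFY_ANNOTATIONS = 6;

    /** True if the given permission bit is set (i.e. the operation is allowed). */
    static boolean isAllowed(int p, int bitPosition) {
        return (p & (1 << (bitPosition - 1))) != 0;
    }

    /** Human-readable summary of the four permissions decoded above. */
    static String describe(int p) {
        return "print=" + isAllowed(p, PRINT)
                + ", modify=" + isAllowed(p, MODIFY)
                + ", extract=" + isAllowed(p, EXTRACT)
                + ", annotate=" + isAllowed(p, MODIFY_ANNOTATIONS);
    }

    public static void main(String[] args) {
        // -4 is 0xFFFFFFFC: every permission bit set, i.e. everything allowed.
        System.out.println(describe(-4));
        // -20 is 0xFFFFFFEC: same as -4 but with bit 5 (0x10) cleared.
        System.out.println(describe(-20));
    }
}
```

With -20 only extraction is denied while printing stays allowed; this fits what Hai reported, where the conversion to images worked despite the extraction warning.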

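On switching off the PDSimpleFont warnings: PDFBox 1.x logs through Apache Commons Logging, so the right place depends on the backend in use. The "WARN [PDSimpleFont]" format above looks like log4j, where a line such as log4j.logger.org.apache.pdfbox.pdmodel.font.PDSimpleFont=ERROR in log4j.properties would do it; with java.util.logging, the logging.properties equivalent is org.apache.pdfbox.pdmodel.font.PDSimpleFont.level = SEVERE. A minimal programmatic sketch for the java.util.logging case follows, assuming Commons Logging is routed there and that the logger is named after the fully qualified class name of PDSimpleFont; the class QuietPdfFontWarnings is hypothetical. Note that this only hides the message: the font substitution it reports, and therefore the wrong glyphs in the images, stays the same.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietPdfFontWarnings {
    // Assumption: PDFBox's logger for PDSimpleFont uses this fully qualified class name.
    static final String FONT_LOGGER_NAME = "org.apache.pdfbox.pdmodel.font.PDSimpleFont";

    // Keep a strong reference: java.util.logging holds loggers weakly, so a level
    // set on an otherwise unreferenced logger could be lost after garbage collection.
    private static final Logger FONT_LOGGER = Logger.getLogger(FONT_LOGGER_NAME);

    /** Lets only SEVERE messages through, hiding WARNING-level font fallback notices. */
    public static void silenceFontFallbackWarnings() {
        FONT_LOGGER.setLevel(Level.SEVERE);
    }

    public static void main(String[] args) {
        silenceFontFallbackWarnings();
        System.out.println(FONT_LOGGER.getLevel()); // prints SEVERE
    }
}
```

Call silenceFontFallbackWarnings() once at startup, before loading and rendering documents.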