Hi,

On 02.07.2013 11:55, Hai Nguyen FUB wrote:
Sorry, I have forgotten the attached image file
FTR: the attachment didn't make it through due to some restriction on the mailing list.
thanks, --hai
BR Andreas Lehmkühler
On Tue, Jul 2, 2013 at 11:54 AM, Hai Nguyen FUB <[email protected]> wrote:

Dear Andreas,

I have another question. For some documents, when converting them into images, I receive warnings like the following:

<snapshot>
...
11:42:14,895 WARN [PDSimpleFont] Changing font on <e> from <Arial> to the default font
11:42:14,900 WARN [PDSimpleFont] Changing font on <u> from <Arial> to the default font
11:42:14,901 WARN [PDSimpleFont] Changing font on <t> from <Arial> to the default font
11:42:14,901 WARN [PDSimpleFont] Changing font on <s> from <Arial> to the default font
11:42:14,902 WARN [PDSimpleFont] Changing font on <c> from <Arial> to the default font
11:42:14,903 WARN [PDSimpleFont] Changing font on <h> from <Arial> to the default font
11:42:14,903 WARN [PDSimpleFont] Changing font on <e> from <Arial> to the default font
11:42:14,906 WARN [PDSimpleFont] Changing font on <o> from <Arial> to the default font
11:42:14,907 WARN [PDSimpleFont] Changing font on <r> from <Arial> to the default font
...
</snapshot>

Those warnings could be deactivated in the logging.property file, I guess. The images were still created, but they display wrong characters; please see the comparison in the attached image file.

How can I solve this? I have looked around in the documentation and googled a lot, but could not find any solution. Is there a way to omit the character parsing, since my application only converts the file to an image and does no OCR or the like? I have used the loadNonSeq() method, but still received those poor characters in the images.

thanks in advance!
--hai

On Mon, Jul 1, 2013 at 6:57 PM, Hai Nguyen FUB <[email protected]> wrote:

alright, thank you very much for the fast reply!!!
--hai

On Mon, Jul 1, 2013 at 6:52 PM, Andreas Lehmkuehler <[email protected]> wrote:

On 01.07.2013 18:30, Hai Nguyen FUB wrote:

Hi Andreas,

thank you very much, it works!!! Though I still get warning notifications like the following:

18:26:54,687 WARN [NonSequentialPDFParser] PDF file 'src\test\resources\pdf\__249scan.pdf' does not allow extracting content.

Does this "extracting" mean that the fonts or characters within the document are not extractable?

It is possible to define user access permissions for a PDF, such as
- disallow/allow printing
- disallow/allow text extraction
- disallow/allow modifying the PDF
- ....

In your case, it is not allowed to extract the content of the PDF as text.

thanks,
--hai

SNIP

BR
Andreas Lehmkühler
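
For readers of the archive: the access permissions mentioned above are stored as bit flags in the /P entry of the PDF's encryption dictionary. The following is only a minimal sketch of how such flags could be decoded, assuming the bit positions from ISO 32000-1 (bit 3 = print, bit 4 = modify, bit 5 = copy/extract). The class `PdfPermissionBits` and its methods are hypothetical names made up for illustration and are not PDFBox API. In PDFBox itself the same information should be available through the `AccessPermission` class (e.g. `canExtractContent()`).

```java
// Sketch only: decodes permission flags as stored in the /P entry of a
// PDF encryption dictionary (bit positions per ISO 32000-1).
// Class and method names are hypothetical, not part of PDFBox.
final class PdfPermissionBits {
    // Bit positions in the spec are 1-based, so "bit 3" is 1 << 2.
    static final int PRINT = 1 << 2;   // bit 3: print the document
    static final int MODIFY = 1 << 3;  // bit 4: modify the contents
    static final int EXTRACT = 1 << 4; // bit 5: copy or extract text and graphics

    private PdfPermissionBits() {}

    static boolean canPrint(int p) { return (p & PRINT) != 0; }
    static boolean canModify(int p) { return (p & MODIFY) != 0; }
    static boolean canExtractContent(int p) { return (p & EXTRACT) != 0; }

    // Human-readable summary of the three flags discussed in this thread.
    static String describe(int p) {
        return "print=" + canPrint(p)
                + ", modify=" + canModify(p)
                + ", extract=" + canExtractContent(p);
    }

    public static void main(String[] args) {
        // Hypothetical value: printing and modifying allowed, extraction bit cleared.
        int p = PRINT | MODIFY;
        System.out.println(describe(p));
    }
}
```

A file whose extraction bit is cleared, as in the hypothetical value used in `main`, would be one that "does not allow extracting content" in the sense of the parser warning quoted above.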
