Hi,

On 02.07.2013 11:55, Hai Nguyen FUB wrote:
Sorry, I forgot to attach the image file
FTR: the attachment didn't make it through due to a restriction on the mailing list.

thanks,

--hai


BR
Andreas Lehmkühler

On Tue, Jul 2, 2013 at 11:54 AM, Hai Nguyen FUB <[email protected]> wrote:

    Dear Andreas,

    I have another question. For some documents, when converting them into
    images, I receive warnings like the following:

    <snapshot>
    ...
    11:42:14,895 WARN  [PDSimpleFont] Changing font on <e> from <Arial> to the default font
    11:42:14,900 WARN  [PDSimpleFont] Changing font on <u> from <Arial> to the default font
    11:42:14,901 WARN  [PDSimpleFont] Changing font on <t> from <Arial> to the default font
    11:42:14,901 WARN  [PDSimpleFont] Changing font on <s> from <Arial> to the default font
    11:42:14,902 WARN  [PDSimpleFont] Changing font on <c> from <Arial> to the default font
    11:42:14,903 WARN  [PDSimpleFont] Changing font on <h> from <Arial> to the default font
    11:42:14,903 WARN  [PDSimpleFont] Changing font on <e> from <Arial> to the default font
    11:42:14,906 WARN  [PDSimpleFont] Changing font on <o> from <Arial> to the default font
    11:42:14,907 WARN  [PDSimpleFont] Changing font on <r> from <Arial> to the default font
    ...
    </snapshot>

    Those warnings could be deactivated in the logging.property file, I guess.
    The images were still created, but they display wrong characters; please
    see the comparison in the attached image file.
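
A minimal sketch of silencing these warnings in code rather than in a properties file. It assumes the logging backend in use is java.util.logging (the timestamp format above suggests another backend such as log4j may be active, in which case this has no effect), and the package prefix in the logger name is an assumption, since the warnings only show the short class name PDSimpleFont. Hiding the warnings does not change how the glyphs are rendered.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietFontWarnings {
    // Keep a strong reference: java.util.logging holds loggers only weakly,
    // so a level set on an unreferenced logger can be lost after garbage collection.
    // The package prefix is an assumption; the log output only shows "PDSimpleFont".
    public static final Logger FONT_LOGGER =
            Logger.getLogger("org.apache.pdfbox.pdmodel.font.PDSimpleFont");

    public static void silenceFontWarnings() {
        // SEVERE still lets errors through but hides WARNING-level messages.
        FONT_LOGGER.setLevel(Level.SEVERE);
    }

    public static void main(String[] args) {
        silenceFontWarnings();
        System.out.println(FONT_LOGGER.getName() + " -> " + FONT_LOGGER.getLevel());
    }
}
```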

    How can I solve this? I have looked around in the documentation and googled
    a lot, but could not find any solution.

    Is there a way to omit the character parsing, since my application only
    converts the file to an image, with no OCR or the like? I have used the
    loadNonSeq() method, but still got those wrong characters in the images.

    thanks in advance!

    --hai



    On Mon, Jul 1, 2013 at 6:57 PM, Hai Nguyen FUB <[email protected]> wrote:

        alright, thank you very much for the fast reply!!!

        --hai


        On Mon, Jul 1, 2013 at 6:52 PM, Andreas Lehmkuehler <[email protected]> wrote:

            On 01.07.2013 18:30, Hai Nguyen FUB wrote:

                Hi Andreas,

                thank you very much, it works!!!

                though I still get warnings like the following:

                18:26:54,687 WARN  [NonSequentialPDFParser] PDF file 'src\test\resources\pdf\__249scan.pdf' does not allow extracting content.


                Does this mean that the fonts or characters within the
                document are not extractable?

            It is possible to define user access permissions for a pdf, such as

            - disallow/allow printing
            - disallow/allow text extraction
            - disallow/allow modifying the pdf
            - ....

            In your case, it is not allowed to extract the content of the pdf as
            text.
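
As an illustration of how such permissions are encoded, here is a minimal sketch that decodes the permission integer stored in an encrypted PDF's encryption dictionary. The bit positions follow the PDF specification's user access permissions table (bit 3 print, bit 4 modify, bit 5 copy/extract); the class name, the helper and the sample value are hypothetical and not part of PDFBox.

```java
public class PermissionBits {
    // Bit positions are 1-based in the PDF specification,
    // so bit n corresponds to the value 1 << (n - 1).
    public static final int PRINT = 1 << 2;    // bit 3: print the document
    public static final int MODIFY = 1 << 3;   // bit 4: modify the contents
    public static final int EXTRACT = 1 << 4;  // bit 5: copy or extract text and graphics

    // Returns true when the given permission bit is set in p.
    public static boolean allows(int p, int flag) {
        return (p & flag) != 0;
    }

    public static void main(String[] args) {
        // Hypothetical permission value: everything allowed except extraction.
        int p = -1 & ~EXTRACT;
        System.out.println("print:   " + allows(p, PRINT));
        System.out.println("modify:  " + allows(p, MODIFY));
        System.out.println("extract: " + allows(p, EXTRACT));
    }
}
```

A cleared extract bit is the kind of setting that would lead to a "does not allow extracting content" warning, while rendering pages to images is a separate operation.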

                thanks,

                --hai
                SNIP


            BR
            Andreas Lehmkühler




