Hi Andreas,

Thank you very much, it works!

However, I still get a warning like the following:

18:26:54,687 WARN  [NonSequentialPDFParser] PDF file
'src\test\resources\pdf\249scan.pdf' does not allow extracting content.

Does this warning mean that the fonts or characters within the document
are not extractable?

thanks,

--hai
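
A note on the warning above, offered as a sketch rather than a definitive answer: it most likely refers to the document's access permissions, not to its fonts. In an encrypted PDF, the /P entry of the encryption dictionary is a 32-bit flag word, and bit 5 (counting from 1 at the low-order end) controls copying or extracting text and graphics. The parser appears to log the warning when that bit is cleared, which would fit the fact that the image conversion itself works. The class and method names below are hypothetical and not part of PDFBox; only the bit position is taken from the PDF specification.

```java
// Hypothetical helper, not part of PDFBox: decodes the "extract content"
// flag from the integer /P entry of a PDF encryption dictionary.
final class PermissionBits {
    // Bit 5 (1-based, bit 1 is the low-order bit) per the PDF specification.
    static final int EXTRACT_BIT = 5;

    private PermissionBits() { }

    // Returns true if the given 1-based bit is set in the permission word.
    static boolean isBitSet(int permissions, int bit) {
        return (permissions & (1 << (bit - 1))) != 0;
    }

    // Returns true if the permission word allows extracting text and graphics.
    static boolean canExtractContent(int permissions) {
        return isBitSet(permissions, EXTRACT_BIT);
    }

    public static void main(String[] args) {
        int allAllowed = -1; // every permission bit set
        int noExtract = allAllowed & ~(1 << (EXTRACT_BIT - 1)); // bit 5 cleared
        System.out.println(canExtractContent(allAllowed));
        System.out.println(canExtractContent(noExtract));
    }
}
```

If this reading is right, a scanned document with bit 5 cleared would trigger the warning regardless of which fonts it contains.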

On Mon, Jul 1, 2013 at 6:15 PM, Andreas Lehmkuehler <[email protected]> wrote:

> Hi,
>
> On 01.07.2013 17:06, Hai Nguyen FUB wrote:
>
>> Dear PDFBox developers,
>>
>> My name is Hai and I am a Java developer at the Freie Universität Berlin.
>>
>> I am currently working on a project that converts PDF files
>> to images. I have looked around and found the PDFBox library to be a good
>> PDF handling tool.
>>
>> After working with this tool for a while, I got stuck on a problem: whenever
>> I try to convert a handwritten PDF file (that is, a handwritten document
>> that was scanned and exported to PDF; I do not have the original image
>> files), I receive the following error:
>>
>> 16:54:08,965 ERROR [FlateFilter] FlateFilter: stop reading corrupt stream
>> due to a DataFormatException
>>
>>
>> Could you give me a hint on how to solve it?
>>
> Without a sample PDF at hand I'm just guessing. Try the non-sequential
> parser by using loadNonSeq() instead of load() to load the PDF.
>
>> My code snippet is as follows:
>>
>> PDDocument document = PDDocument.load(new File("src/test/resources/pdf/249scan.pdf"));
>>
>> @SuppressWarnings("unchecked")
>> List<PDPage> pages = document.getDocumentCatalog().getAllPages();
>>
>> PDPage page = pages.get(0);
>> BufferedImage bi = page.convertToImage();
>> ImageIO.write(bi, "png", new File("src/test/resources/pdf/test.png"));
>>
>>
>>
>> Thank you in advance & Best regards
>>
>> --
>> Hai Nguyen
>>
>> Freie Universität Berlin
>> FB Mathematik u. Informatik
>> AG Intelligente Systeme und
>> Robotik <http://inf.fu-berlin.de/groups/ag-ki/index.html>
>>
>> Arnimallee 7, Raum 111
>> D-14195 Berlin
>>
>> Tel-1: +49 / 30 838 75114 ( Arnimallee - FUB)
>> Tel-2: +49 / 30 838 75148 (Takustr - FUB)
>> Tel-3: +49 / 30 2093 6381 (Office upstairs - HUB)
>> Tel-4: +49 / 30 2093 6393 (Lab downstairs - HUB)
>> Fax:    +49 / 30 838 75059
>> __________________________
>>
>
> BR
> Andreas Lehmkühler
>
