Hi Andreas, thank you very much, it works!!!

Though I still get warnings like the following:

18:26:54,687 WARN  [NonSequentialPDFParser] PDF file 'src\test\resources\pdf\249scan.pdf' does not allow extracting content.

Does "extracting" here mean that the fonts or characters within the document cannot be extracted?

thanks,
--hai


On Mon, Jul 1, 2013 at 6:15 PM, Andreas Lehmkuehler <[email protected]> wrote:

> Hi,
>
> On 01.07.2013 17:06, Hai Nguyen FUB wrote:
>
>> Dear PDFBox developers,
>>
>> My name is Hai and I am a Java developer at the Freie Universität Berlin.
>>
>> I am currently working on a project that converts PDF files to images. I have looked around and found the PDFBox library to be a good PDF handling tool.
>>
>> After working with this tool for a while, I got stuck on a problem: whenever I tried to convert a handwritten PDF file (handwritten documents that were scanned and exported to PDF; I do not have the original image files), I received the following error:
>>
>> 16:54:08,965 ERROR [FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
>>
>> Could you give me a hint on how to solve it?
>>
>
> Without having a sample PDF at hand I'm just guessing. Try the non-sequential parser by using loadNonSeq() instead of load() to load the PDF.
>
>> My code snippet is the following:
>>
>> PDDocument document = PDDocument.load(new File("src/test/resources/pdf/249scan.pdf"));
>>
>> @SuppressWarnings("unchecked")
>> List<PDPage> pages = document.getDocumentCatalog().getAllPages();
>> PDPage page = pages.get(0);
>> BufferedImage bi = page.convertToImage();
>> ImageIO.write(bi, "png", new File("src/test/resources/pdf/test.png"));
>>
>> Thank you in advance & best regards
>>
>> --
>> Hai Nguyen
>>
>> Freie Universität Berlin
>> FB Mathematik u. Informatik
>> AG Intelligente Systeme und Robotik <http://inf.fu-berlin.de/groups/ag-ki/index.html>
>>
>> Arnimallee 7, Raum 111
>> D-14195 Berlin
>>
>> Tel-1: +49 / 30 838 75114 (Arnimallee - FUB)
>> Tel-2: +49 / 30 838 75148 (Takustr - FUB)
>> Tel-3: +49 / 30 2093 6381 (Office upstairs - HUB)
>> Tel-4: +49 / 30 2093 6393 (Lab downstairs - HUB)
>> Fax: +49 / 30 838 75059
>> __________________________
>
> BR
> Andreas Lehmkühler
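On Hai's follow-up question about the NonSequentialPDFParser warning: it most likely concerns the document's access permissions rather than its fonts. An encrypted PDF stores a /P integer in its encryption dictionary, and bit 5 of that value (numbered from 1 at the low-order bit, per the PDF specification) controls copying or otherwise extracting text and graphics. As far as I know, PDFBox's AccessPermission.canExtractContent() reports that same bit. The sketch below decodes it by hand in plain Java; the PermissionBits class and its methods are hypothetical names for illustration, and the sample values -4 and -20 are made up, not read from 249scan.pdf.

```java
/**
 * Minimal decoder for the /P (permissions) integer of a PDF encryption
 * dictionary. Bit positions follow the PDF specification, where bit 1 is
 * the low-order bit. PermissionBits is a hypothetical helper, not PDFBox API.
 */
final class PermissionBits {
    // 1-based bit positions as numbered in the PDF specification
    static final int PRINT = 3;
    static final int EXTRACT = 5; // copy or otherwise extract text and graphics

    private PermissionBits() {}

    /** Returns true if the given 1-based bit is set in the /P value. */
    static boolean isSet(int p, int bit) {
        return (p & (1 << (bit - 1))) != 0;
    }

    static boolean canPrint(int p) {
        return isSet(p, PRINT);
    }

    static boolean canExtractContent(int p) {
        return isSet(p, EXTRACT);
    }

    public static void main(String[] args) {
        // -4 (0xFFFFFFFC) sets every permission bit; -20 additionally clears bit 5
        int[] samples = {-4, -20};
        for (int p : samples) {
            System.out.printf("P=%d print=%b extract=%b%n",
                    p, canPrint(p), canExtractContent(p));
        }
    }
}
```

Running main prints `P=-4 print=true extract=true` and `P=-20 print=true extract=false`. A cleared extract bit is a separate matter from rendering, which would be consistent with the PNG conversion working despite the warning.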
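On the original FlateFilter error: streams with /FlateDecode are zlib-compressed, and the JDK's java.util.zip.Inflater signals corrupt input by throwing a DataFormatException. The log message suggests that decoding stops at that point and keeps whatever was already decoded. The sketch below illustrates that lenient strategy with the standard library only; it is not PDFBox's FlateFilter code, and the LenientInflate class and its method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Sketch of a lenient Flate (zlib) decoder: on corrupt input it stops and
 * returns the bytes decoded so far instead of throwing. LenientInflate is a
 * hypothetical helper, not PDFBox's FlateFilter.
 */
final class LenientInflate {
    private LenientInflate() {}

    static byte[] inflateLeniently(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0) {
                    break; // no progress: input truncated or a preset dictionary is required
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            // Corrupt data: stop reading and keep what was decoded up to this point
            System.err.println("stop reading corrupt stream: " + e.getMessage());
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Compresses data with zlib, the format used by /FlateDecode streams. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up content-stream-like text, used only as sample input
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append(i).append(" 0 m ").append(i).append(" 100 l S\n");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.US_ASCII);
        byte[] compressed = deflate(original);

        byte[] full = inflateLeniently(compressed);
        byte[] truncated = inflateLeniently(Arrays.copyOf(compressed, compressed.length / 2));
        byte[] garbage = inflateLeniently(new byte[] {1, 2, 3, 4, 5}); // bad zlib header

        System.out.println("original=" + original.length + " full=" + full.length
                + " truncated=" + truncated.length + " garbage=" + garbage.length);
    }
}
```

In this sketch the caller cannot tell a complete result from a partial one; a production decoder would likely also report that condition, for example through a log message like the one in the thread.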
