Hi,
On 01.07.2013 17:06, Hai Nguyen FUB wrote:
Dear Pdfbox-developers,
My name is Hai and I am a Java developer at the Freie Universität Berlin.
I am currently working on a project that deals with converting PDF files
to images. I have looked around and found the PDFBox library to be a good
PDF handling tool.
After working with this tool for a while, I got stuck on a problem: whenever I
tried to convert a handwritten PDF file (that is, handwritten documents that were
scanned and exported to PDF; I do not have the original image files),
I received the following error:
16:54:08,965 ERROR [FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
Could you give me a hint on how to solve it?
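Some background on the log line: PDF streams using the /FlateDecode filter are zlib-compressed, and java.util.zip.DataFormatException is what java.util.zip.Inflater throws when it meets malformed zlib data. That PDFBox's FlateFilter decodes through Inflater is an assumption based on the exception name. The stand-alone sketch below (the class FlateCorruptionDemo and its methods are hypothetical, not PDFBox API) damages the zlib header of an otherwise valid stream to trigger the same exception, and decodes leniently by keeping whatever was produced before the failure, which is what "stop reading corrupt stream" appears to describe.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class; uses only java.util.zip, no PDFBox.
public class FlateCorruptionDemo {

    /** Decoded bytes plus a flag telling whether zlib reported corrupt data. */
    static final class Result {
        final byte[] data;
        final boolean corrupt;

        Result(byte[] data, boolean corrupt) {
            this.data = data;
            this.corrupt = corrupt;
        }
    }

    /** Compresses data with zlib framing, the format /FlateDecode streams use. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /**
     * Inflates until the stream ends, the input runs out, or zlib reports
     * corrupt data; in the last case the chunks decoded so far are kept.
     */
    static Result inflateLenient(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        boolean corrupt = false;
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                out.write(buf, 0, n);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated stream: nothing more can be decoded
                }
            }
        } catch (DataFormatException e) {
            corrupt = true; // e.g. "incorrect header check"
        } finally {
            inflater.end();
        }
        return new Result(out.toByteArray(), corrupt);
    }

    public static void main(String[] args) {
        byte[] original = new byte[2000];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) (i % 64);
        }
        byte[] compressed = deflate(original);

        Result intact = inflateLenient(compressed);
        System.out.println("intact:  " + intact.data.length + " bytes, corrupt=" + intact.corrupt);

        byte[] damaged = compressed.clone();
        damaged[0] = 0; // break the 2-byte zlib header
        Result broken = inflateLenient(damaged);
        System.out.println("damaged: " + broken.data.length + " bytes, corrupt=" + broken.corrupt);
    }
}
```

Running main should report that the intact stream decodes to all 2000 bytes, while the damaged one is flagged as corrupt with no output. If the damage sits further into a stream and zlib detects it, the same loop keeps the chunks decoded before that point.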
Without a sample PDF at hand I'm just guessing. Try the non-sequential
parser by loading the PDF with loadNonSeq() instead of load().
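A minimal sketch of the suggested switch, assuming PDFBox 1.8.x (where PDDocument.loadNonSeq(File, RandomAccess) is available) and reusing the file paths from Hai's code; the class name NonSeqRender is hypothetical. The second argument of loadNonSeq is a scratch file, passed as null here on the assumption that none is needed.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Hypothetical class name; assumes PDFBox 1.8.x on the classpath.
public class NonSeqRender {
    public static void main(String[] args) throws IOException {
        File pdf = new File("src/test/resources/pdf/249scan.pdf");
        // Non-sequential parser instead of PDDocument.load(); the second
        // argument (a scratch file) is left null in this sketch.
        PDDocument document = PDDocument.loadNonSeq(pdf, null);
        try {
            @SuppressWarnings("unchecked") // getAllPages() returns a raw List in 1.x
            List<PDPage> pages = document.getDocumentCatalog().getAllPages();
            // Render the first page and save it as PNG.
            BufferedImage image = pages.get(0).convertToImage();
            ImageIO.write(image, "png", new File("src/test/resources/pdf/test.png"));
        } finally {
            document.close(); // release the parsed document
        }
    }
}
```

Only the loading call differs from the load() version; if the non-sequential parser copes better with the damaged streams, the rendering code can stay as it is.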
My code snippet is as follows:
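import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

PDDocument document = PDDocument.load(new File("src/test/resources/pdf/249scan.pdf"));
@SuppressWarnings("unchecked") // getAllPages() returns a raw List in PDFBox 1.x
List<PDPage> pages = document.getDocumentCatalog().getAllPages();
PDPage page = pages.get(0);
BufferedImage bi = page.convertToImage(); // render the first page
ImageIO.write(bi, "png", new File("src/test/resources/pdf/test.png"));
document.close(); // release the document's resources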
Thank you in advance and best regards,
--
Hai Nguyen
Freie Universität Berlin
FB Mathematik u. Informatik
AG Intelligente Systeme und Robotik <http://inf.fu-berlin.de/groups/ag-ki/index.html>
Arnimallee 7, Room 111
D-14195 Berlin
Tel-1: +49 / 30 838 75114 (Arnimallee - FUB)
Tel-2: +49 / 30 838 75148 (Takustr - FUB)
Tel-3: +49 / 30 2093 6381 (Office upstairs - HUB)
Tel-4: +49 / 30 2093 6393 (Lab downstairs - HUB)
Fax: +49 / 30 838 75059
__________________________
Best regards,
Andreas Lehmkühler