[ https://issues.apache.org/jira/browse/PDFBOX-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olivier Jaquemet updated PDFBOX-186:
------------------------------------
Attachment: PDF-corrupted.pdf
I submitted the original bug report on SourceForge.
The original corrupted PDF file is attached to this issue, and here is
the Java code to reproduce the bug:
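{code}
// Needs java.io.File, java.io.IOException, java.io.StringWriter,
// plus PDDocument and PDFTextStripper from PDFBox.
public static void testPDFBOX186() throws IOException {
    File corruptedFile = new File("PDF-corrupted.pdf");
    PDDocument pdfDocument = PDDocument.load(corruptedFile);
    try {
        StringWriter writer = new StringWriter();
        PDFTextStripper stripper = new PDFTextStripper();
        // The NullPointerException is thrown from this call.
        stripper.writeText(pdfDocument, writer);
    } finally {
        pdfDocument.close();
    }
}
{code}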
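For context, the trace in the quoted report below ends in PDPageNode.getAllKids. One plausible cause, not verified against the PDFBox source, is a page-tree node whose kids array is missing or contains a dangling reference in the damaged file. The sketch below shows a null-safe traversal of such a tree. The PageTreeSketch and Node classes are hypothetical stand-ins written for illustration; they are not PDFBox's PDPageNode and this is not a proposed patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of PDFBox.
final class PageTreeSketch {

    // Simplified page-tree node. A null kids list on a non-page node
    // models a missing kids array in a corrupted file.
    static final class Node {
        final String name;
        final boolean isPage;
        final List<Node> kids;

        Node(String name, boolean isPage, List<Node> kids) {
            this.name = name;
            this.isPage = isPage;
            this.kids = kids;
        }
    }

    // Returns the names of all reachable pages in depth-first order.
    static List<String> collectPages(Node root) {
        List<String> pages = new ArrayList<>();
        collect(root, pages);
        return pages;
    }

    private static void collect(Node node, List<String> pages) {
        if (node == null) {
            return; // dangling reference: skip instead of dereferencing
        }
        if (node.isPage) {
            pages.add(node.name);
            return;
        }
        if (node.kids == null) {
            return; // intermediate node without kids: nothing to visit
        }
        for (Node kid : node.kids) {
            collect(kid, pages);
        }
    }
}
```

Skipping damaged nodes lets the remaining pages be extracted, at the cost of silently dropping content; a real fix might prefer to log or report the damaged node.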
> NullPointerException in getAllKids with corrupted pdf
> -----------------------------------------------------
>
> Key: PDFBOX-186
> URL: https://issues.apache.org/jira/browse/PDFBOX-186
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Priority: Minor
> Attachments: PDF-corrupted.pdf, PwC-Tech-Forecast-Spring-2009.pdf
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1532246
> Originally submitted by ojaquemet on 2006-08-01 01:15.
> java.lang.NullPointerException
> at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
> at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
> at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
> at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
> at [...]
> Tested with PDFBox-0.7.2-log4j.jar and PDFBox-0.7.3-dev-20060731.jar.
> The corrupted PDF is too big (7 MB) to attach here; it can be downloaded from:
> http://olivier.jaquemet.free.fr/PDF-corrupted.pdf
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO
> I get this message too. How do you parse big PDFs?