[jira] Updated: (PDFBOX-186) NullPointerException in getAllKids with corrupted pdf

Olivier Jaquemet (JIRA) Thu, 08 Apr 2010 01:04:01 -0700

     [ 
https://issues.apache.org/jira/browse/PDFBOX-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Olivier Jaquemet updated PDFBOX-186:
------------------------------------

    Attachment: PDF-corrupted.pdf

I submitted the original bug report on SourceForge in 2006.

The original corrupted PDF file is attached to this issue. Here is the Java
code to reproduce the bug:

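{code}
  public static void testPDFBOX186() throws IOException {
    File corruptedFile = new File("PDF-corrupted.pdf");
    PDDocument pdfDocument = PDDocument.load(corruptedFile);
    try {
      StringWriter writer = new StringWriter();
      PDFTextStripper stripper = new PDFTextStripper();
      // The NullPointerException is thrown from inside writeText
      // (see the getAllKids frames in the stack trace below).
      stripper.writeText(pdfDocument, writer);
    } finally {
      // Release the document even when extraction fails.
      pdfDocument.close();
    }
  }
{code}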
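
The stack trace quoted below shows getAllKids calling itself while walking the page tree. As a rough illustration only, here is a minimal sketch of how such a recursive walk can fail, assuming the corrupted file contains an intermediate page node whose list of kids cannot be read (the report does not confirm this cause). PageTreeWalk, Node, collectUnguarded and collectGuarded are hypothetical names and are not PDFBox classes or methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a recursive page-tree walk; not PDFBox code.
public class PageTreeWalk {

    public static final class Node {
        final String name;
        final boolean isPage;   // true for a leaf page, false for an intermediate node
        final List<Node> kids;  // null models a missing or unreadable kids list

        private Node(String name, boolean isPage, List<Node> kids) {
            this.name = name;
            this.isPage = isPage;
            this.kids = kids;
        }

        public static Node page(String name) {
            return new Node(name, true, null);
        }

        public static Node pages(String name, List<Node> kids) {
            return new Node(name, false, kids);
        }
    }

    // Unguarded walk: iterating over a null kids list throws NullPointerException.
    public static void collectUnguarded(Node node, List<String> out) {
        for (Node kid : node.kids) {
            if (kid.isPage) {
                out.add(kid.name);
            } else {
                collectUnguarded(kid, out);
            }
        }
    }

    // Guarded walk: skips null nodes and intermediate nodes without a kids list.
    public static void collectGuarded(Node node, List<String> out) {
        if (node == null || node.kids == null) {
            return;
        }
        for (Node kid : node.kids) {
            if (kid == null) {
                continue;
            }
            if (kid.isPage) {
                out.add(kid.name);
            } else {
                collectGuarded(kid, out);
            }
        }
    }

    public static void main(String[] args) {
        Node broken = Node.pages("broken", null);
        Node root = Node.pages("root",
                Arrays.asList(Node.page("p1"), broken, Node.page("p2")));

        List<String> found = new ArrayList<>();
        collectGuarded(root, found);
        System.out.println("guarded walk found: " + found);

        try {
            collectUnguarded(root, new ArrayList<>());
        } catch (NullPointerException e) {
            System.out.println("unguarded walk failed on the broken node");
        }
    }
}
```

The guarded variant silently drops the unreadable branch, so text from any pages under it would be lost; whether that is preferable to failing with a clearer error is a design decision for the library.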


> NullPointerException in getAllKids with corrupted pdf
> -----------------------------------------------------
>
>                 Key: PDFBOX-186
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-186
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Priority: Minor
>         Attachments: PDF-corrupted.pdf, PwC-Tech-Forecast-Spring-2009.pdf
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1532246
> Originally submitted by ojaquemet on 2006-08-01 01:15.
> java.lang.NullPointerException
>  at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
>  at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
>  at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
>  at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
>  at [...]
> Tested with PDFBox-0.7.2-log4j.jar and
> PDFBox-0.7.3-dev-20060731.jar.
> The corrupted PDF is too big (7MB) to be
> attached here; it is available at:
> http://olivier.jaquemet.free.fr/PDF-corrupted.pdf
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO 
> I get this message too.  How do you parse big PDFs?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
