[ 
https://issues.apache.org/jira/browse/PDFBOX-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876706#action_12876706
 ] 

Nicholas Blair commented on PDFBOX-606:
---------------------------------------

Your suggestion about the source of the file is a good one: a number of our 
clients use the Mac OS X client to upload content into our repository, so it 
is quite possible that we would receive a malformed "PDF" this way.

I put together a unit test in my environment using the file you uploaded, and 
I ran the same test against another file that I generated myself in the same 
way. In neither case do I encounter the infinite loop. Instead, I get an 
IOException:

java.io.IOException: Error: End-of-File, expected line
        at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1141)
        at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:294)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:162)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:847)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:814)
        
The same exception and message occur with both the 0.8.0 build I was using 
when I originally filed the ticket and the 1.1.0 release available for 
download today.
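
For anyone who wants a unit test that fails rather than hangs if the loop from 
the original report does reappear, here is a minimal sketch using only the JDK. 
ParseGuard and runWithTimeout are hypothetical names and are not part of 
PDFBox; the assumption is that the Callable would wrap the 
PDDocument.load(inputStream) call, which is replaced by stand-ins below so the 
example runs on its own.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of PDFBox: runs a task on a worker thread
// and gives up after a fixed time instead of waiting forever.
public class ParseGuard {

    public static <T> T runWithTimeout(Callable<T> task, long timeoutMillis)
            throws Exception {
        // Daemon thread, so a task that ignores interruption cannot keep
        // the JVM alive after the test has finished.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-guard");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // Rethrow the task's own exception (for example an IOException)
            // so the caller sees the original failure.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } catch (TimeoutException e) {
            // Best effort only: interruption will not stop a loop that
            // never checks for it.
            future.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a parse that fails fast, as in the trace above.
        try {
            runWithTimeout(() -> {
                throw new IOException("Error: End-of-File, expected line");
            }, 1000);
        } catch (IOException e) {
            System.out.println("failed fast: " + e.getMessage());
        }

        // Stand-in for a parse that never returns.
        try {
            runWithTimeout(() -> {
                while (true) {
                    Thread.sleep(10);
                }
            }, 200);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

With this in place, a test can tell the two outcomes apart: an IOException 
means the parser rejected the file, while a TimeoutException means it never 
returned.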


 

> infinite loop encountered in PushBackInputStream.read
> -----------------------------------------------------
>
>                 Key: PDFBOX-606
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-606
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 0.8.0-incubator
>            Reporter: Nicholas Blair
>         Attachments: ._pellochmar10.pdf
>
>
> While processing customer content for Lucene index using PDFBox, encountered 
> an infinite loop in PDDocument.load, stack trace:
> java.io.FileInputStream.readBytes(Native Method)
> java.io.FileInputStream.read(FileInputStream.java:199)
> java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
> java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>    - locked java.io.bufferedinputstr...@f5ef5d
> java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>    - locked java.io.bufferedinputstr...@15b9c29
> java.io.FilterInputStream.read(FilterInputStream.java:66)
> java.io.PushbackInputStream.read(PushbackInputStream.java:122)
> org.apache.pdfbox.io.PushBackInputStream.read(PushBackInputStream.java:84)
> org.apache.pdfbox.pdfparser.BaseParser.skipSpaces(BaseParser.java:1190)
> org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:188)
> org.apache.pdfbox.pdfparser.PDFParser.parseTrailer(PDFParser.java:767)
> org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:456)
> org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:179)
> org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:841)
> org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:808)
> edu.wisc.mywebspace.search.pdf.PdfDocumentContentParser.parse(PdfDocumentContentParser.java:47)
> Calling code looks like:
> document = PDDocument.load(inputStream);

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
