[ https://issues.apache.org/jira/browse/PDFBOX-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846118#comment-13846118 ]
Tilman Hausherr commented on PDFBOX-1769:
-----------------------------------------
This is a regression, please reopen. I get the following errors when running the
file cloud.pdf from PDFBOX-869, and I don't get them when I roll back
NonSequentialPDFParser.java to the state before the three changes:
12.12.2013 08:10:27.558 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1835 - Can't find the object xref at offset 77303
12.12.2013 08:10:27.602 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -107 for object 3 0 obj
12.12.2013 08:10:27.603 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -112 for object 6 0 obj
12.12.2013 08:10:27.603 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -110 for object 8 0 obj
12.12.2013 08:10:27.603 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -110 for object 10 0 obj
12.12.2013 08:10:27.604 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -111 for object 11 0 obj
12.12.2013 08:10:27.604 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -111 for object 12 0 obj
12.12.2013 08:10:27.604 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -111 for object 13 0 obj
12.12.2013 08:10:27.604 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -111 for object 14 0 obj
12.12.2013 08:10:27.605 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -111 for object 15 0 obj
12.12.2013 08:10:27.605 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -111 for object 17 0 obj
12.12.2013 08:10:27.605 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -111 for object 16 0 obj
12.12.2013 08:10:27.606 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -110 for object 19 0 obj
12.12.2013 08:10:27.606 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 21 0 obj
12.12.2013 08:10:27.606 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -107 for object 20 0 obj
12.12.2013 08:10:27.607 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -109 for object 38 0 obj
12.12.2013 08:10:27.611 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -109 for object 39 0 obj
12.12.2013 08:10:27.611 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -109 for object 36 0 obj
12.12.2013 08:10:27.611 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -109 for object 37 0 obj
12.12.2013 08:10:27.612 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 51 0 obj
12.12.2013 08:10:27.612 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 50 0 obj
12.12.2013 08:10:27.612 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 49 0 obj
12.12.2013 08:10:27.613 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 55 0 obj
12.12.2013 08:10:27.613 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 54 0 obj
12.12.2013 08:10:27.613 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 53 0 obj
12.12.2013 08:10:27.614 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 52 0 obj
12.12.2013 08:10:27.614 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 58 0 obj
12.12.2013 08:10:27.614 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 57 0 obj
12.12.2013 08:10:27.614 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 56 0 obj
12.12.2013 08:10:27.615 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 62 0 obj
12.12.2013 08:10:27.615 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -107 for object 61 0 obj
12.12.2013 08:10:27.615 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 68 0 obj
12.12.2013 08:10:27.616 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 69 0 obj
12.12.2013 08:10:27.616 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 70 0 obj
12.12.2013 08:10:27.616 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 71 0 obj
12.12.2013 08:10:27.616 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 65 0 obj
12.12.2013 08:10:27.617 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 66 0 obj
12.12.2013 08:10:27.617 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 67 0 obj
12.12.2013 08:10:27.617 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -107 for object 76 0 obj
12.12.2013 08:10:27.618 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -107 for object 78 0 obj
12.12.2013 08:10:27.618 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 72 0 obj
12.12.2013 08:10:27.618 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 73 0 obj
12.12.2013 08:10:27.618 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 74 0 obj
12.12.2013 08:10:27.619 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -108 for object 75 0 obj
12.12.2013 08:10:27.619 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -107 for object 80 0 obj
12.12.2013 08:10:27.619 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -107 for object 83 0 obj
12.12.2013 08:10:27.620 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -112 for object 89 0 obj
12.12.2013 08:10:27.620 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -112 for object 90 0 obj
12.12.2013 08:10:27.621 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -113 for object 103 0 obj
12.12.2013 08:10:27.621 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1774 - Invalid object offset -113 for object 104 0 obj
12.12.2013 08:10:27.719 ERROR [main] org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1587 - The end of the stream doesn't point to the correct offset, using workaround to read the stream
java.io.IOException: Push back buffer is full
at java.io.PushbackInputStream.unread(PushbackInputStream.java:232)
at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:147)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.readUntilEndStream(NonSequentialPDFParser.java:1689)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseCOSStream(NonSequentialPDFParser.java:1551)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1233)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1159)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseDictObjects(NonSequentialPDFParser.java:1133)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:470)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:731)
at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1139)
at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1122)
at pdfboxpageimageextraction.ExtractImages.doPdf(ExtractImages.java:139)
at pdfboxpageimageextraction.ExtractImages.main(ExtractImages.java:79)
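The last exception is thrown by the JDK's java.io.PushbackInputStream: unread() fails with "Push back buffer is full" when asked to push back more bytes than its buffer has room for. That suggests, though it is not confirmed here, that readUntilEndStream tried to unread more bytes than the PushBackInputStream's buffer holds. The minimal JDK-only example below reproduces that behavior. It is not PDFBox code; the class name, the helper tryUnread and the 4-byte buffer size are hypothetical choices for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

public class PushbackDemo {

    /**
     * Reads bytesToUnread bytes from a stream whose pushback buffer holds
     * bufferSize bytes, then tries to push all of them back.
     * Returns null on success, or the IOException message on failure.
     */
    static String tryUnread(int bufferSize, int bytesToUnread) {
        byte[] data = new byte[bytesToUnread];
        try (PushbackInputStream in =
                 new PushbackInputStream(new ByteArrayInputStream(data), bufferSize)) {
            byte[] chunk = new byte[bytesToUnread];
            int n = in.read(chunk, 0, chunk.length);
            // unread() throws if n exceeds the free space in the pushback buffer
            in.unread(chunk, 0, n);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryUnread(4, 4)); // fits in the buffer
        System.out.println(tryUnread(4, 8)); // overflows the buffer
    }
}
```

Unreading up to the buffer size succeeds; anything larger fails regardless of how much data the underlying stream has, so a parser that reads ahead by a variable amount has to size its pushback buffer for the worst case.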
> Fix crash on invalid xref
> -------------------------
>
> Key: PDFBOX-1769
> URL: https://issues.apache.org/jira/browse/PDFBOX-1769
> Project: PDFBox
> Issue Type: Wish
> Components: Parsing
> Affects Versions: 1.8.2
> Reporter: William Palmer
> Assignee: Andreas Lehmkühler
> Fix For: 1.8.4, 2.0.0
>
>
> Need to search for a correct xref start address
> Example file:
> http://digitalcorpora.org/corp/nps/files/govdocs1/020/020747.pdf
> Exception in thread "main" java.io.IOException: Error: Expected an integer type, actual='ref'
> at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1622)
> Using the code:
> PDFTextStripper ts = new PDFTextStripper();
> PrintWriter out = new PrintWriter(new FileWriter(new File(pFile + ".txt")));
> // scratch file used by the non-sequential parser as a buffer
> RandomAccess scratchFile = new RandomAccessFile(File.createTempFile("pdfbox-", ".tmp"), "rw");
> PDDocument doc = PDDocument.loadNonSeq(new File(pFile), scratchFile);
> ts.setForceParsing(true);
> ts.writeText(doc, out);
> // release the output writer and the document
> out.close();
> doc.close();
> Related: PDFBOX-1757
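The issue description asks the parser to search for a correct xref start address when the startxref value points to the wrong place. The error "actual='ref'" suggests the offset landed inside the keyword rather than at its start. A minimal sketch of that idea follows. It is not PDFBox's actual fix and makes several assumptions: the whole file is in memory as a byte[], only classic "xref" tables are handled (not PDF 1.5 cross-reference streams), and the class XrefLocator, its methods, the sample data and the search window are all hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class XrefLocator {

    private static final byte[] XREF = "xref".getBytes(StandardCharsets.US_ASCII);

    /** True if the keyword "xref" starts at pos and is not the tail of "startxref". */
    static boolean isXrefKeywordAt(byte[] pdf, long pos) {
        if (pos < 0 || pos > pdf.length - XREF.length) {
            return false; // negative or past-the-end offsets can never match
        }
        int p = (int) pos;
        for (int i = 0; i < XREF.length; i++) {
            if (pdf[p + i] != XREF[i]) {
                return false;
            }
        }
        // a letter right before "xref" means this is the end of "startxref"
        return p == 0 || !Character.isLetter((char) (pdf[p - 1] & 0xFF));
    }

    /**
     * Returns the offset of the "xref" keyword nearest to claimedOffset,
     * searching at most window bytes in each direction, or -1 if none is found.
     */
    static long findXrefStart(byte[] pdf, long claimedOffset, int window) {
        for (int d = 0; d <= window; d++) {
            if (isXrefKeywordAt(pdf, claimedOffset - d)) {
                return claimedOffset - d;
            }
            if (isXrefKeywordAt(pdf, claimedOffset + d)) {
                return claimedOffset + d;
            }
        }
        return -1;
    }

    /** A tiny hand-made PDF fragment whose startxref value (10) is off by one. */
    static byte[] samplePdf() {
        String s = "%PDF-1.4\n"                      // 9 bytes, so "xref" starts at offset 9
                 + "xref\n0 1\n0000000000 65535 f \n"
                 + "trailer\n<< /Size 1 >>\n"
                 + "startxref\n10\n%%EOF\n";
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] pdf = samplePdf();
        System.out.println(findXrefStart(pdf, 10, 64)); // 9
    }
}
```

Searching outward from the claimed offset prefers the nearest match, which is the likely candidate when the stored offset is only slightly wrong. A negative offset such as those in the log above simply finds nothing within a small window, so a fuller recovery would still need a fallback, for example scanning the whole file.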
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)