[ https://issues.apache.org/jira/browse/PDFBOX-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16697832#comment-16697832 ]
Tilman Hausherr edited comment on PDFBOX-4385 at 11/24/18 2:07 PM:
-------------------------------------------------------------------
Of course the PDF is invalid. 18446744073430152624 is not a valid page object
number, and it indicates that your client's creator software has a bug. Parsing
on demand is a strategy which we don't support yet (although it would have its
advantages). It means parsing only what we need, and the part with the bad
object number is in the structure tree, which isn't needed unless you're blind
(very simplified, there are other uses too). The structure tree isn't used by
PDFBox for what we usually do (rendering, text extraction, signing, etc.),
although a basic API exists.
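
To illustrate why that number cannot be a valid object number, here is a small
sketch using only the JDK. It is not PDFBox code, and the class and method
names are made up for illustration. It shows that the literal does not fit into
a signed 64-bit long, which would be consistent with the parser reporting it as
a COSFloat instead of an integer. It also shows that the literal equals
2^64 - 279398992. That a negative value was written out as unsigned by the
creator software is only a guess and is not confirmed anywhere in this issue.

```java
import java.math.BigInteger;

final class ObjectNumberCheck {
    /** The literal from the exception message in this issue. */
    static final String REPORTED = "18446744073430152624";

    /** Returns true if the decimal string parses as a signed 64-bit long. */
    static boolean fitsInLong(String digits) {
        try {
            Long.parseLong(digits);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Reinterprets an unsigned 64-bit decimal string as a signed long. */
    static long asSigned64(String digits) {
        return Long.parseUnsignedLong(digits);
    }

    public static void main(String[] args) {
        BigInteger reported = new BigInteger(REPORTED);
        BigInteger longMax = BigInteger.valueOf(Long.MAX_VALUE);

        System.out.println(fitsInLong(REPORTED));            // false
        System.out.println(reported.compareTo(longMax) > 0); // true
        System.out.println(asSigned64(REPORTED));            // -279398992
    }
}
```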
> IOException "expected number, actual=COSFloat{18446744073430152624}" when
> loading PDF
> --------------------------------------------------------------------------------------
>
> Key: PDFBOX-4385
> URL: https://issues.apache.org/jira/browse/PDFBOX-4385
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 2.0.12
> Environment: Mac OS 10.14.1
> Reporter: Kasper Schnack
> Priority: Major
>
> On a PDF document, which opens fine with Adobe Reader and Preview on Mac OS,
> the PDDocument.load() method throws the following:
> java.io.IOException: expected number, actual=COSFloat{18446744073430152624} at offset 33182
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:166)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:279)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:212)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:862)
>     at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:905)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:874)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:794)
>     at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:754)
>     at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:185)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:220)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1160)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1057)
> Sorry the material is sensitive so I can't attach it :(
>
> However if I cat the file it looks like this around the offset:
> 48 0 obj
> << /Type /StructElem /S /P /P 30 0 R /Pg 2 0 R /K 15 >>
> endobj
> 49 0 obj
> << /Type /StructElem /S /P /P 30 0 R /Pg 2 0 R /K 16 >>
> endobj
> 50 0 obj
> << /Type /StructElem /S /P /P 30 0 R /Pg 2 0 R /K 17 >>
> endobj
> 51 0 obj
> << /Type /StructElem /S /P /P 30 0 R /Pg 2 0 R /K 18 >>
> endobj
> 52 0 obj
> << /Type /StructElem /S /P /P 30 0 R /Pg 18446744073430152624 0 R /K [ 99 0 R 100 0 R ] >>
> endobj
> 99 0 obj
> << /Type /StructElem /S /Span /P 52 0 R /Pg 2 0 R /K 19 >>
> endobj
> 100 0 obj
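
The manual inspection above (looking at the raw file around the reported
offset) could be automated roughly as follows. This is a naive sketch with only
the JDK, not part of PDFBox: SuspiciousRefScanner and findOutOfRange are
hypothetical names, and maxObjectNumber is a bound the caller has to choose. It
simply looks for "N G R" patterns in text and does not understand streams,
strings or comments, so it can produce false hits on real files.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SuspiciousRefScanner {
    // Matches an indirect reference such as "30 0 R":
    // object number, generation number, then the keyword R.
    private static final Pattern REF =
            Pattern.compile("(\\d+)\\s+(\\d+)\\s+R\\b");

    /**
     * Returns the object numbers of all references whose object number is
     * larger than maxObjectNumber (an assumed, caller-chosen bound).
     */
    static List<String> findOutOfRange(String pdfText, long maxObjectNumber) {
        BigInteger max = BigInteger.valueOf(maxObjectNumber);
        List<String> hits = new ArrayList<>();
        Matcher m = REF.matcher(pdfText);
        while (m.find()) {
            // BigInteger is used because the number may not fit in a long.
            if (new BigInteger(m.group(1)).compareTo(max) > 0) {
                hits.add(m.group(1));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Object 52 as quoted in the issue description.
        String sample = "52 0 obj\n"
                + "<< /Type /StructElem /S /P /P 30 0 R"
                + " /Pg 18446744073430152624 0 R /K [ 99 0 R 100 0 R ] >>\n"
                + "endobj\n";
        System.out.println(findOutOfRange(sample, 1000L));
    }
}
```

For a real file the text could be obtained with
new String(Files.readAllBytes(path), StandardCharsets.ISO_8859_1), so that
every byte maps to exactly one character.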