[ https://issues.apache.org/jira/browse/PDFBOX-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261226#comment-16261226 ]

Tilman Hausherr commented on PDFBOX-4019:
-----------------------------------------

The problem from the mail archive couldn't be fixed because we didn't have the PDF,
and the stack trace alone wasn't enough.

I clicked on the link... PDF.js also has trouble displaying it:
{noformat}
Unable to get page 3 to initialize viewer 
Object { name: "UnknownErrorException", message: "page dictionary kid reference 
points to wrong type of object", details: "FormatError: page dictionary kid 
reference points to wrong type of object", stack: 
"UnknownErrorExceptionClosure@resource://pdf.js/build/pdf.js:441:37\n@resource://pdf.js/build/pdf.js:435:38\n__w_pdfjs_require__@resource://pdf.js/build/pdf.js:45:12\n@resource://pdf.js/build/pdf.js:5010:23\n__w_pdfjs_require__@resource://pdf.js/build/pdf.js:45:12\n@resource://pdf.js/build/pdf.js:88:18\n@resource://pdf.js/build/pdf.js:26:18\nwebpackUniversalModuleDefinition@resource://pdf.js/build/pdf.js:24:59\n@resource://pdf.js/build/pdf.js:16:11\n"
 }
viewer.js:4030:13
Unable to get page 4 to initialize viewer 
Object { name: "UnknownErrorException", message: "page dictionary kid reference 
points to wrong type of object", details: "FormatError: page dictionary kid 
reference points to wrong type of object", stack: 
"UnknownErrorExceptionClosure@resource://pdf.js/build/pdf.js:441:37\n@resource://pdf.js/build/pdf.js:435:38\n__w_pdfjs_require__@resource://pdf.js/build/pdf.js:45:12\n@resource://pdf.js/build/pdf.js:5010:23\n__w_pdfjs_require__@resource://pdf.js/build/pdf.js:45:12\n@resource://pdf.js/build/pdf.js:88:18\n@resource://pdf.js/build/pdf.js:26:18\nwebpackUniversalModuleDefinition@resource://pdf.js/build/pdf.js:24:59\n@resource://pdf.js/build/pdf.js:16:11\n"
 }
viewer.js:4030:13
Unable to get page for page view 
Object { name: "UnknownErrorException", message: "page dictionary kid reference 
points to wrong type of object", details: "FormatError: page dictionary kid 
reference points to wrong type of object", stack: 
"UnknownErrorExceptionClosure@resource://pdf.js/build/pdf.js:441:37\n@resource://pdf.js/build/pdf.js:435:38\n__w_pdfjs_require__@resource://pdf.js/build/pdf.js:45:12\n@resource://pdf.js/build/pdf.js:5010:23\n__w_pdfjs_require__@resource://pdf.js/build/pdf.js:45:12\n@resource://pdf.js/build/pdf.js:88:18\n@resource://pdf.js/build/pdf.js:26:18\nwebpackUniversalModuleDefinition@resource://pdf.js/build/pdf.js:24:59\n@resource://pdf.js/build/pdf.js:16:11\n"
 }
viewer.js:4375:7
Unable to get page for page view 
Object { name: "UnknownErrorException", message: "page dictionary kid reference 
points to wrong type of object", details: "FormatError: page dictionary kid 
reference points to wrong type of object", stack: 
"UnknownErrorExceptionClosure@resource://pdf.js/build/pdf.js:441:37\n@resource://pdf.js/build/pdf.js:435:38\n__w_pdfjs_require__@resource://pdf.js/build/pdf.js:45:12\n@resource://pdf.js/build/pdf.js:5010:23\n__w_pdfjs_require__@resource://pdf.js/build/pdf.js:45:12\n@resource://pdf.js/build/pdf.js:88:18\n@resource://pdf.js/build/pdf.js:26:18\nwebpackUniversalModuleDefinition@resource://pdf.js/build/pdf.js:24:59\n@resource://pdf.js/build/pdf.js:16:11\n"
 }
{noformat}

At the very least, we should throw a checked exception instead of the unchecked
IllegalStateException that is thrown now.
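A minimal sketch of what such a check could look like, assuming a page-tree node is modelled as a plain `Map<String, Object>` whose "Type" entry holds the `/Type` name as a `String`. This is not PDFBox's `COSDictionary` API, and `PageTypeCheck`/`requirePage` are hypothetical names. The only point illustrated is that a wrong `/Type` surfaces as an `IOException`, which callers of the loading and text-extraction code already have to handle, rather than an unchecked `IllegalStateException`.

```java
import java.io.IOException;
import java.util.Map;

// Hypothetical sketch, not PDFBox's PDPageTree code: a page-tree node is modelled
// as a plain Map whose "Type" entry holds the /Type name as a String.
final class PageTypeCheck {

    private PageTypeCheck() {
    }

    /**
     * Checks that a leaf of the page tree has /Type /Page.
     * A missing /Type is tolerated (an assumption of this sketch); any other
     * value is reported through a checked IOException, so callers must handle it.
     */
    static void requirePage(Map<String, Object> node) throws IOException {
        Object type = node.get("Type");
        if (type == null) {
            return; // lenient: treat an untyped leaf as a page
        }
        if (!"Page".equals(type)) {
            throw new IOException("Expected 'Page' but found " + type + " in page tree");
        }
    }
}
```

Tolerating a missing `/Type` is a further assumption of this sketch, not something stated in this thread.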

> Expected 'Page' but found COSName{Font} in PDPageTree
> -----------------------------------------------------
>
>                 Key: PDFBOX-4019
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4019
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel, Text extraction
>    Affects Versions: 2.0.8
>         Environment: Debian 9 / macOS (not OS related)
>            Reporter: Nicolas M
>         Attachments: Sterlite Technologies.pdf
>
>
> Hello,
> I have a PDF document that produces the following stack trace:
> {code:java}
> INFO: OpenType Layout tables used in font FreeSans are not implemented in PDFBox and will be ignored
> Exception in thread "Thread-1" java.lang.IllegalStateException: Expected 'Page' but found COSName{Font}
>       at org.apache.pdfbox.pdmodel.PDPageTree.sanitizeType(PDPageTree.java:227)
>       at org.apache.pdfbox.pdmodel.PDPageTree.access$300(PDPageTree.java:38)
>       at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:189)
>       at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:153)
>       at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:314)
>       at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
>       at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:227)
> {code}
> I found a similar problem here 
> https://mail-archives.apache.org/mod_mbox/pdfbox-users/201610.mbox/%[email protected]%3E
> I understand that the problem comes from the PDF itself, but given that
> some readers recover from it, are there any plans to add recovery methods
> to PDFBox too?
> Thanks
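One possible recovery strategy, sketched under the same hypothetical `Map`-based node model (`LenientPageTree` and `collectPages` are illustrative names, not PDFBox API): walk `/Kids` recursively, keep leaf nodes typed `/Page`, descend into `/Pages` nodes, and skip anything else, such as the `/Font` dictionary in this file, while recording a warning. An identity set stops the walk from processing the same node twice, which a damaged file could also trigger.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a lenient page-tree walk; not PDFBox code.
// Nodes are plain Maps: "Type" -> "Pages" | "Page" | other, "Kids" -> List of child nodes.
final class LenientPageTree {

    private LenientPageTree() {
    }

    /** Collects leaf /Page nodes in document order, skipping kids of the wrong type. */
    static List<Map<String, Object>> collectPages(Map<String, Object> root, List<String> warnings) {
        List<Map<String, Object>> pages = new ArrayList<>();
        Set<Map<String, Object>> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(root, pages, warnings, visited);
        return pages;
    }

    @SuppressWarnings("unchecked")
    private static void walk(Map<String, Object> node, List<Map<String, Object>> pages,
                             List<String> warnings, Set<Map<String, Object>> visited) {
        if (!visited.add(node)) {
            warnings.add("Node reached twice in page tree, skipped");
            return;
        }
        Object type = node.get("Type");
        if ("Page".equals(type)) {
            pages.add(node); // a real leaf page
        } else if ("Pages".equals(type)) {
            Object kids = node.get("Kids");
            if (kids instanceof List) {
                for (Object kid : (List<Object>) kids) {
                    if (kid instanceof Map) {
                        walk((Map<String, Object>) kid, pages, warnings, visited);
                    } else {
                        warnings.add("Non-dictionary kid skipped");
                    }
                }
            }
        } else {
            // e.g. a /Font dictionary referenced from /Kids, as in the attached file
            warnings.add("Expected 'Page' or 'Pages' but found " + type + ", node skipped");
        }
    }
}
```

Whether skipping is better than failing is a design choice: dropping a node shifts the numbering of every later page, so a real implementation would probably need to report what it dropped, which the `warnings` list stands in for here.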



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)