[
https://issues.apache.org/jira/browse/PDFBOX-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16260941#comment-16260941
]
Nicolas M commented on PDFBOX-4019:
-----------------------------------
??About half of our work is being lenient with bad PDFs.??
Yes, I read this in the changelogs ;) What I meant to ask was: "Is there a plan
to fix the issue mentioned in the mail archive?" (I assumed it was the same
issue as mine.)
The PDF comes from here:
http://www.tele.net.in/index.php?option=com_k2&view=item&format=pdf&id=21355:sterlite-technologies-announces-results-for-2016-17-records-31-per-cent-jump-in-pat&Itemid=61
I think the problem is very rare (and weird).
> Expected 'Page' but found COSName{Font} in PDPageTree
> -----------------------------------------------------
>
> Key: PDFBOX-4019
> URL: https://issues.apache.org/jira/browse/PDFBOX-4019
> Project: PDFBox
> Issue Type: Improvement
> Components: PDModel, Text extraction
> Affects Versions: 2.0.8
> Environment: Debian 9 / macOS (not OS related)
> Reporter: Nicolas M
> Attachments: Sterlite Technologies.pdf
>
>
> Hello,
> I have a PDF document that produces the following stack trace:
> {code:java}
> INFO: OpenType Layout tables used in font FreeSans are not implemented in PDFBox and will be ignored
> Exception in thread "Thread-1" java.lang.IllegalStateException: Expected 'Page' but found COSName{Font}
>     at org.apache.pdfbox.pdmodel.PDPageTree.sanitizeType(PDPageTree.java:227)
>     at org.apache.pdfbox.pdmodel.PDPageTree.access$300(PDPageTree.java:38)
>     at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:189)
>     at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:153)
>     at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:314)
>     at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
>     at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:227)
> {code}
> I found a similar problem here
> https://mail-archives.apache.org/mod_mbox/pdfbox-users/201610.mbox/%[email protected]%3E
> So, I understand that the problem comes from the PDF itself, but given that
> some readers recover from it, is there any plan to add some recovery methods
> to PDFBox too?
> Thanks
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)