[ 
https://issues.apache.org/jira/browse/PDFBOX-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642972#comment-13642972
 ] 

Maruan Sahyoun commented on PDFBOX-1407:
----------------------------------------

The original exception occurred because the 'classic' parser invoked by 
PDDocument.load() parses a PDF sequentially from top to bottom. As a result 
it may encounter references to PDF objects which are no longer valid but are 
still present in the PDF file. The 'non sequential parser' invoked by 
PDDocument.loadNonSeq() parses PDFs in line with the PDF specification, that 
is, it uses the Xref entries to determine which PDF objects are valid. 
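
To illustrate the difference, here is a minimal, self-contained sketch. It is 
a toy model and not PDFBox code: the class XrefSketch, its methods and the 
sample data are all hypothetical and only mimic the two strategies described 
above. The "file" holds two stored copies of object 2, of which only the 
later one is listed in the cross-reference table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical, not PDFBox code) of the two parsing strategies. */
class XrefSketch {

    /** One object as stored at some position in the imaginary file. */
    static final class StoredObject {
        final int objectNumber;
        final String content;

        StoredObject(int objectNumber, String content) {
            this.objectNumber = objectNumber;
            this.content = content;
        }
    }

    /** Sequential strategy: visit every stored object in file order, valid or not. */
    static List<String> scanSequentially(List<StoredObject> file) {
        List<String> seen = new ArrayList<>();
        for (StoredObject stored : file) {
            seen.add(stored.content);
        }
        return seen;
    }

    /** Xref strategy: visit only the positions the cross-reference table points at. */
    static List<String> readViaXref(List<StoredObject> file, Map<Integer, Integer> xref) {
        List<String> seen = new ArrayList<>();
        for (int position : xref.values()) {
            seen.add(file.get(position).content);
        }
        return seen;
    }

    /** Sample file: object 2 was rewritten, but its old copy is still in the file. */
    static List<StoredObject> sampleFile() {
        List<StoredObject> file = new ArrayList<>();
        file.add(new StoredObject(1, "catalog"));              // position 0
        file.add(new StoredObject(2, "page v1 (superseded)")); // position 1, stale
        file.add(new StoredObject(2, "page v2"));              // position 2, current
        return file;
    }

    /** Sample xref: maps object number to the position of its valid copy. */
    static Map<Integer, Integer> sampleXref() {
        Map<Integer, Integer> xref = new LinkedHashMap<>();
        xref.put(1, 0);
        xref.put(2, 2);
        return xref;
    }

    public static void main(String[] args) {
        List<StoredObject> file = sampleFile();
        System.out.println("sequential: " + scanSequentially(file));
        System.out.println("via xref:   " + readViaXref(file, sampleXref()));
    }
}
```

In this sketch the sequential scan also returns the superseded copy of 
object 2, while the xref-driven read never touches it. A real parser is of 
course far more involved; the sketch only shows why stale objects can 
surface when a file is read from top to bottom.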

At the time of writing both parsers coexist, as some applications depend on 
the 'classic' parser. This might change in the next major release.

In addition, PDFBOX-1560 addresses the infrastructure for the PDFBox website, 
which will then form the basis for enhancing the documentation.
                
> ClassCastException: COSObject cannot be cast to COSName
> -------------------------------------------------------
>
>                 Key: PDFBOX-1407
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1407
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.7.0
>            Reporter: Lau Brino
>            Assignee: Andreas Lehmkühler
>
> Parsing PDF file
> java.lang.ClassCastException: org.apache.pdfbox.cos.COSObject cannot be cast to org.apache.pdfbox.cos.COSName
>         at org.apache.pdfbox.cos.COSDocument.getObjectsByType(COSDocument.java:264)
>         at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:571)
>         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:225)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1090)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1055)
>         at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:110)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
