[jira] [Closed] (PDFBOX-2186) java.io.IOException: Catalog cannot be found

JIRA Fri, 19 Sep 2014 07:41:57 -0700

     [ 
https://issues.apache.org/jira/browse/PDFBOX-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Andreas Lehmkühler closed PDFBOX-2186.
--------------------------------------

> java.io.IOException: Catalog cannot be found
> --------------------------------------------
>
>                 Key: PDFBOX-2186
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2186
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.6, 1.8.7, 2.0.0
>            Reporter: Tilman Hausherr
>            Assignee: Tilman Hausherr
>             Fix For: 1.8.7, 2.0.0
>
>         Attachments: PDFBOX-2186.pdf
>
>
> I get this with the attached file:
> {code}
> Jul 04, 2014 5:41:00 PM org.apache.pdfbox.pdfparser.PDFParser parseXrefTable
> Warnung: invalid xref line: 0000000000 65535    f
> Jul 04, 2014 5:41:00 PM org.apache.pdfbox.pdfparser.PDFParser parseXrefTable
> Warnung: Count in xref table is 0 at offset 334372
> Jul 04, 2014 5:41:00 PM org.apache.pdfbox.pdfparser.NonSequentialPDFParser 
> initi
> alParse
> Warnung: Expected trailer object at position 334373, keep trying
> Exception in thread "main" java.io.IOException: Catalog cannot be found
>         at org.apache.pdfbox.cos.COSDocument.getCatalog(COSDocument.java:522)
>         at 
> org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSe
> quentialPDFParser.java:482)
>         at 
> org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentia
> lPDFParser.java:757)
>         at 
> org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1157)
>         at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:197)
>         at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:89)
> {code}
> The cause is a TAB in an xref line. The solution is to search for a backslash 
> s regex instead of a space only.
> I'm not touching the preflight parser (who has the same code line) because I 
> assume that he should not be lenient.
> Source of the file:
> http://digitalcorpora.org/corp/nps/files/govdocs1/zipfiles/002.zip  file 959



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Closed] (PDFBOX-2186) java.io.IOException: Catalog cannot be found

Reply via email to