Tilman Hausherr created PDFBOX-2186:
---------------------------------------

             Summary: java.io.IOException: Catalog cannot be found
                 Key: PDFBOX-2186
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2186
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 1.8.6, 1.8.7, 2.0.0
            Reporter: Tilman Hausherr
            Assignee: Tilman Hausherr
             Fix For: 1.8.7, 2.0.0


I get this with the attached file:
{code}
Jul 04, 2014 5:41:00 PM org.apache.pdfbox.pdfparser.PDFParser parseXrefTable
Warnung: invalid xref line: 0000000000 65535    f
Jul 04, 2014 5:41:00 PM org.apache.pdfbox.pdfparser.PDFParser parseXrefTable
Warnung: Count in xref table is 0 at offset 334372
Jul 04, 2014 5:41:00 PM org.apache.pdfbox.pdfparser.NonSequentialPDFParser initi
alParse
Warnung: Expected trailer object at position 334373, keep trying
Exception in thread "main" java.io.IOException: Catalog cannot be found
        at org.apache.pdfbox.cos.COSDocument.getCatalog(COSDocument.java:522)
        at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSe
quentialPDFParser.java:482)
        at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentia
lPDFParser.java:757)
        at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1157)

        at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:197)
        at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:89)

{code}

The cause is a TAB in an xref line. The solution is to search for a backslash s 
regex instead of a space only.

I'm not touching the preflight parser (who has the same code line) because I 
assume that he should not be lenient.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to