[ https://issues.apache.org/jira/browse/PDFBOX-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr resolved PDFBOX-3887.
-------------------------------------
    Resolution: Fixed
      Assignee: Tilman Hausherr
 Fix Version/s: 3.0.0
                2.0.8

> Getting a "DataFormatException: invalid distance too far back" exception for the attached file
> ----------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-3887
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3887
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.7
>         Environment: Windows 10 64-bit, Ubuntu 14.04 64-bit.
> java version "1.8.0_141"
> Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
> Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)
>            Reporter: Harun Reşit Zafer
>            Assignee: Tilman Hausherr
>              Labels: extraction, parsing
>             Fix For: 2.0.8, 3.0.0
>
>         Attachments: non-contract_00025.pdf
>
>
> PdfBox throws the following exception:
> {code:java}
> Caused by: java.io.IOException: java.util.zip.DataFormatException: invalid distance too far back
>       at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:82)
>       at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
>       at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
>       at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:55)
>       at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:847)
>       at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:753)
>       at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:678)
>       at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:638)
>       at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:236)
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:271)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:984)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:940)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:888)
>       at com.diligen.parser.pdf.PdfBoxHelper.getDocumentWithLineSegments(PdfBoxHelper.java:131)
>       ... 7 more
> Caused by: java.util.zip.DataFormatException: invalid distance too far back
>       at java.util.zip.Inflater.inflateBytes(Native Method)
>       at java.util.zip.Inflater.inflate(Inflater.java:259)
>       at java.util.zip.Inflater.inflate(Inflater.java:280)
>       at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:107)
>       at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:73)
>       ... 20 more
> {code}
> If there is no quick solution for this bug, is there a workaround? Can I somehow catch the exception and take some action?
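
On the reporter's question about catching the exception: the trace shows the `DataFormatException` arriving wrapped in an `IOException`, which is in turn the cause of an application-level exception. A minimal sketch of one way to detect that case on affected versions is to walk the cause chain. The class name `FlateErrorCheck` and the method `isCorruptFlateData` are hypothetical names chosen for illustration and are not part of PDFBox. The sketch uses only the JDK and simulates the nesting rather than loading a real PDF.

```java
import java.io.IOException;
import java.util.zip.DataFormatException;

// Hypothetical helper, not part of PDFBox: detects a corrupt-deflate failure
// anywhere in an exception's cause chain.
class FlateErrorCheck {

    /** Returns true if t or any of its causes is a DataFormatException. */
    static boolean isCorruptFlateData(Throwable t) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++) {
            if (t instanceof DataFormatException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulate the nesting seen in the stack trace:
        // application exception -> IOException -> DataFormatException.
        IOException wrapped =
                new IOException(new DataFormatException("invalid distance too far back"));
        RuntimeException outer = new RuntimeException("load failed", wrapped);

        System.out.println(isCorruptFlateData(outer));                     // true
        System.out.println(isCorruptFlateData(new IOException("other")));  // false
    }
}
```

In application code the check would presumably sit in a `catch (IOException e)` block around the `PDDocument.load(...)` call from the trace; what to do when it returns true (skip the file, log it, try another parser) is left to the caller.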