[ 
https://issues.apache.org/jira/browse/PDFBOX-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-3887.
-------------------------------------
       Resolution: Fixed
         Assignee: Tilman Hausherr
    Fix Version/s: 3.0.0
                   2.0.8

> Getting a "DataFormatException: invalid distance too far back" exception for the attached file
> ----------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-3887
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3887
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.7
>         Environment: Windows 10 64-bit, Ubuntu 14.04 64-bit. 
> java version "1.8.0_141" 
> Java(TM) SE Runtime Environment (build 1.8.0_141-b15) 
> Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)
>            Reporter: Harun Reşit Zafer
>            Assignee: Tilman Hausherr
>              Labels: extraction, parsing
>             Fix For: 2.0.8, 3.0.0
>
>         Attachments: non-contract_00025.pdf
>
>
> PDFBox throws the following exception:
> {code:java}
> Caused by: java.io.IOException: java.util.zip.DataFormatException: invalid distance too far back
>       at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:82)
>       at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
>       at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
>       at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:55)
>       at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:847)
>       at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:753)
>       at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:678)
>       at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:638)
>       at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:236)
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:271)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:984)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:940)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:888)
>       at com.diligen.parser.pdf.PdfBoxHelper.getDocumentWithLineSegments(PdfBoxHelper.java:131)
>       ... 7 more
> Caused by: java.util.zip.DataFormatException: invalid distance too far back
>       at java.util.zip.Inflater.inflateBytes(Native Method)
>       at java.util.zip.Inflater.inflate(Inflater.java:259)
>       at java.util.zip.Inflater.inflate(Inflater.java:280)
>       at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:107)
>       at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:73)
>       ... 20 more
> {code}
> If there is no quick solution for this bug, is there a workaround? Can I 
> somehow catch the exception and take some action?
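
On the workaround question: the trace shows the DataFormatException arriving wrapped in an IOException, so one option on affected versions is to catch the IOException and inspect its cause chain. The sketch below is only an illustration of that pattern using java.util.zip directly; it is not PDFBox code and not the fix applied for this issue. The class name FlateErrorDemo and its helper methods are hypothetical, and the invalid bytes stand in for the attached PDF, so the zlib message they trigger will differ from the one in the report.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class; none of these names come from PDFBox.
class FlateErrorDemo {

    /** Returns true if t or any of its causes is a DataFormatException. */
    static boolean causedByCorruptFlateData(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof DataFormatException) {
                return true;
            }
        }
        return false;
    }

    /** Compresses raw bytes into a zlib stream (used only to build valid test input). */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Inflates a zlib stream; a DataFormatException is rethrown wrapped in an IOException. */
    static byte[] inflateStrict(byte[] compressed) throws IOException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // Stop instead of spinning if the input is truncated.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break;
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IOException(e);
        } finally {
            inflater.end();
        }
    }

    /** Caller-side handling: tell corrupt compressed data apart from other I/O failures. */
    static String describe(byte[] data) {
        try {
            return "ok: " + inflateStrict(data).length + " bytes";
        } catch (IOException e) {
            if (causedByCorruptFlateData(e)) {
                return "corrupt flate stream, skipped";
            }
            return "other I/O failure";
        }
    }

    public static void main(String[] args) {
        byte[] good = deflate("hello".getBytes(StandardCharsets.UTF_8));
        byte[] bad = {0x00, 0x01, 0x02, 0x03}; // not a valid zlib stream
        System.out.println(describe(good));
        System.out.println(describe(bad));
    }
}
```

In application code the same cause-chain check could be applied to the IOException thrown by PDDocument.load, for example to log the file and move on to the next document. Whether any content can be salvaged from the damaged file is a separate question that this sketch does not address.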



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
