David Medinets created PDFBOX-3595:
--------------------------------------

             Summary: For a PDF - Loading from URL works. Loading from BAIS does not.
                 Key: PDFBOX-3595
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3595
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 2.0.3, 1.8.12
         Environment: Windows
            Reporter: David Medinets
            Priority: Minor


I've found several PDF files at 
https://www.supremecourt.gov/opinions/boundvolumes.aspx that throw an exception 
when using PDDocument.load with a ByteArrayInputStream but do not throw an 
exception when the same PDF is loaded using a URL.

v1.8.12 is the last version in which the load method takes a URL object. I 
mention it here in case that reference point of 'working' code helps diagnose 
this issue.
 
Below is the complete program that shows the two approaches. The first works. 
The second does not.

```
package com.affy.wildtuna.adrivers;

import java.io.ByteArrayInputStream;
import java.net.URL;
import org.apache.commons.io.IOUtils;
import org.apache.pdfbox.pdmodel.PDDocument;

public class ShowInvalidDistancesSetException {

    public static void main(final String[] args) throws Exception {
        String url = "https://www.supremecourt.gov/opinions/boundvolumes/545bv.pdf";
        PDDocument doc01 = PDDocument.load(new URL(url));
        doc01.close();
        System.out.println("Loading from URL works.");
        
        String contents = IOUtils.toString(new URL(url).openStream());
        try (ByteArrayInputStream bais = new ByteArrayInputStream(contents.getBytes())) {
            PDDocument doc = PDDocument.load(bais);
            doc.close();
        }
    }
}
```

Here is the program's output:

```
WARNING: Specified stream length 6845 is wrong. Fall back to reading stream until 'endstream'.
Loading from URL works.
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Exception in thread "main" java.io.IOException
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:301)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:221)
        at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
        at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:64)
        at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:574)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:225)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
        at com.affy.wildtuna.adrivers.ShowInvalidDistancesSetException.main(ShowInvalidDistancesSetException.java:18)
Caused by: java.util.zip.DataFormatException: invalid distances set
        at java.util.zip.Inflater.inflateBytes(Native Method)
        at java.util.zip.Inflater.inflate(Inflater.java:259)
        at java.util.zip.Inflater.inflate(Inflater.java:280)
        at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:169)
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:98)
        ... 9 more
```
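
One thing that may be worth ruling out before looking at the parser: the second approach passes the PDF through a String (IOUtils.toString followed by getBytes), and decoding binary data as text is generally lossy. The sketch below is not part of the program above and does not use PDFBox. It only shows, with a hypothetical class name and a made-up byte array, that a byte sequence which is not valid text does not survive the round trip. It assumes UTF-8 for illustration, whereas the program above uses the platform default charset, so the exact corruption on Windows may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; names and sample bytes are illustrative only.
class BinaryRoundTripDemo {

    // Decode bytes as text and encode them again, mirroring the
    // toString(...) / getBytes() pattern used in the second approach.
    // UTF-8 is an assumption; the original code uses the default charset.
    static byte[] roundTripThroughString(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up sample: bytes such as 0x9C and 0xFF are not valid UTF-8 on
        // their own, much like the compressed streams inside a PDF.
        byte[] raw = {0x78, (byte) 0x9C, (byte) 0xED, (byte) 0xFF, 0x00, 0x41};
        byte[] viaString = roundTripThroughString(raw);

        System.out.println("original:   " + Arrays.toString(raw));
        System.out.println("via String: " + Arrays.toString(viaString));
        System.out.println("identical:  " + Arrays.equals(raw, viaString));
    }
}
```

If this is what happens here, the Flate-compressed streams would reach PDFBox already damaged, which would be consistent with the DataFormatException in the trace. Reading the bytes directly, for example with IOUtils.toByteArray(new URL(url).openStream()), and wrapping that array in the ByteArrayInputStream would be the variant to try. I have not verified this against the file above.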


