David Medinets created PDFBOX-3595:
--------------------------------------
Summary: For a PDF - Loading from URL works. Loading from BAIS
does not.
Key: PDFBOX-3595
URL: https://issues.apache.org/jira/browse/PDFBOX-3595
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 2.0.3, 1.8.12
Environment: Windows
Reporter: David Medinets
Priority: Minor
I've found several PDF files at
https://www.supremecourt.gov/opinions/boundvolumes.aspx that cause
PDDocument.load to throw an exception when given a ByteArrayInputStream, but
load without an exception when the same PDF is read from a URL.

v1.8.12 is the last version in which the load method takes a URL object. I
mention it here in case that reference point of 'working' code helps diagnose
this issue.

Below is the complete program that shows the two approaches. The first works.
The second does not.
```
package com.affy.wildtuna.adrivers;

import java.io.ByteArrayInputStream;
import java.net.URL;
import org.apache.commons.io.IOUtils;
import org.apache.pdfbox.pdmodel.PDDocument;

public class ShowInvalidDistancesSetException {
    public static void main(final String[] args) throws Exception {
        String url = "https://www.supremecourt.gov/opinions/boundvolumes/545bv.pdf";

        PDDocument doc01 = PDDocument.load(new URL(url));
        doc01.close();
        System.out.println("Loading from URL works.");

        String contents = IOUtils.toString(new URL(url).openStream());
        try (ByteArrayInputStream bais = new ByteArrayInputStream(contents.getBytes())) {
            PDDocument doc = PDDocument.load(bais);
            doc.close();
        }
    }
}
```
Here is the program's output:
```
WARNING: Specified stream length 6845 is wrong. Fall back to reading stream until 'endstream'.
Loading from URL works.
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Exception in thread "main" java.io.IOException
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:301)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:221)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
    at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:64)
    at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:574)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:225)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
    at com.affy.wildtuna.adrivers.ShowInvalidDistancesSetException.main(ShowInvalidDistancesSetException.java:18)
Caused by: java.util.zip.DataFormatException: invalid distances set
    at java.util.zip.Inflater.inflateBytes(Native Method)
    at java.util.zip.Inflater.inflate(Inflater.java:259)
    at java.util.zip.Inflater.inflate(Inflater.java:280)
    at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:169)
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:98)
    ... 9 more
```
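
One possible explanation, not confirmed anywhere in this report: the second approach reads the PDF into a String with IOUtils.toString and converts it back with getBytes(), both using the platform default charset. A PDF contains binary (Flate-compressed) streams, and a bytes-to-String-to-bytes round trip is not guaranteed to preserve arbitrary bytes. The sketch below uses only the JDK to illustrate that effect. The class name, the helper names, the sample bytes, and the choice of UTF-8 (picked to make the demonstration deterministic) are all assumptions for illustration and are not taken from the program above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: compares a byte-for-byte copy of a stream with a
// copy that passes through a String on the way.
public class BinaryRoundTrip {

    // Copies the stream into a byte array without any charset decoding.
    static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Decodes the bytes as text and encodes them again. UTF-8 is an
    // assumption here; the report's program used the platform default.
    static byte[] viaString(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample: a few bytes that are not valid UTF-8 text,
        // standing in for compressed stream data.
        byte[] raw = {(byte) 0x78, (byte) 0x9C, (byte) 0xFF, (byte) 0x80, 0x00, 0x41};

        byte[] direct = readAllBytes(new ByteArrayInputStream(raw));
        byte[] throughString = viaString(raw);

        System.out.println("direct copy preserved bytes: " + Arrays.equals(raw, direct));
        System.out.println("String round trip preserved bytes: " + Arrays.equals(raw, throughString));
    }
}
```

If this is what happens here, reading the download with IOUtils.toByteArray(InputStream) from commons-io and passing that array to the ByteArrayInputStream would avoid the String conversion entirely.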