ERROR org.apache.pdfbox.filter.FlateFilter - Stop reading corrupt stream
-------------------------------------------------------------------------
Key: PDFBOX-872
URL: https://issues.apache.org/jira/browse/PDFBOX-872
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 1.2.1
Environment: Windows XP [Version 5.1.2600]
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) Client VM (build 17.1-b03, mixed mode, sharing)
Reporter: Vladimir
Priority: Critical
This report:
http://www2.goldmansachs.com/our-firm/press/press-releases/current/pdfs/2010-q2-earnings.pdf
With this code:
public static String getTransformed(InputStream inputStream) {
    PDDocument pdDocument = null;
    String document = null;
    try {
        PDFParser parser = new PDFParser(inputStream);
        parser.parse();
        pdDocument = parser.getPDDocument();
        PDFText2HTML pdf2html = new PDFText2HTML("UTF-8");
        document = pdf2html.getText(pdDocument);
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (pdDocument != null) {
            try {
                pdDocument.getDocument().close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return document;
}
returns:
17:01:15,609 [main] ERROR org.apache.pdfbox.filter.FlateFilter - Stop reading corrupt stream
null
java.io.IOException: Error: Expected an integer type, actual=''
    at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1310)
    at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:81)
    at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:449)
    at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1112)
    at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:591)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:246)
    at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:184)
    at com.selerityfinancial.wwwscraper.utils.PDFUtil.getTransformed(PDFUtil.java:25)
    at com.selerityfinancial.wwwscraper.utils.PDFUtil.main(PDFUtil.java:55)
The same file opens normally in Foxit PDF.
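
For context on the first log line, here is a minimal sketch using only java.util.zip, not PDFBox. It assumes the "Stop reading corrupt stream" message means the Flate (zlib/deflate) data ended or broke before decoding finished; that reading of FlateFilter is an assumption, and the class and method names below are hypothetical. The sketch shows that a decoder which stops at the first error hands back only part of the content. If something similar happens to the object stream here, the parser may be reading incomplete data, which could explain actual='' in readInt.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical demo class; it is not part of PDFBox.
class TruncatedFlateDemo {

    // Compress data with zlib/deflate, the encoding behind PDF's FlateDecode filter.
    static byte[] deflate(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(data);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    // Inflate as much as possible; on a broken stream, log and keep what was decoded.
    static byte[] inflateLeniently(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            // Truncated or damaged input surfaces here as an IOException subclass.
            System.err.println("Stop reading corrupt stream: " + e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("line ").append(i).append('\n');
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = deflate(original);

        // Simulate a damaged stream by dropping the second half of the compressed bytes.
        byte[] truncated = Arrays.copyOf(compressed, compressed.length / 2);

        System.out.println("original bytes:            " + original.length);
        System.out.println("decoded from full stream:  " + inflateLeniently(compressed).length);
        System.out.println("decoded from broken stream: " + inflateLeniently(truncated).length);
    }
}
```

Running main prints the three lengths; the last one is smaller than the first because decoding stopped early. Whether the Goldman Sachs PDF is actually truncated, or PDFBox misreads a valid stream, cannot be determined from this report alone.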