Null from PDF
-------------
Key: PDFBOX-950
URL: https://issues.apache.org/jira/browse/PDFBOX-950
Project: PDFBox
Issue Type: Bug
Affects Versions: 1.4.0
Environment: Windows XP [5.1.2600]
java version "1.6.0_23"
Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
Java HotSpot(TM) Client VM (build 19.0-b09, mixed mode, sharing)
Reporter: Vladimir
Test file: http://www.uss.com/corp/investors/sec_filings/3Q-2010-Earnings-Release.pdf
The file opens correctly in Foxit Reader.
The following code returns null for it:
public static String getHtml(InputStream inputStream) {
    PDDocument pdDocument = null;
    String document = null;
    try {
        PDFParser parser = new PDFParser(inputStream);
        parser.parse();
        pdDocument = parser.getPDDocument();
        PDFText2HTML pdf2html = new PDFText2HTML(StringUtil.UTF_8());
        document = pdf2html.getText(pdDocument);
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (pdDocument != null) {
            try {
                pdDocument.getDocument().close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return document;
}
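
A note on narrowing this down: in the method above, `document` stays null whenever an IOException is thrown before the assignment, because the catch block only prints the stack trace. Whether that is what happens with this file is an assumption, not something the report confirms. The sketch below uses only the JDK and a hypothetical `Extractor` interface standing in for the PDFBox parse-and-extract step (the names `ExtractionDiagnosis`, `Extractor`, `swallowing` and `propagating` are invented for illustration). It contrasts the swallow-and-return-null pattern with letting the exception reach the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ExtractionDiagnosis {

    /** Hypothetical stand-in for the parse-and-extract step; not a PDFBox type. */
    interface Extractor {
        String extract(InputStream in) throws IOException;
    }

    /** Mirrors the reported pattern: an IOException is printed and null is returned. */
    static String swallowing(Extractor extractor, InputStream in) {
        String document = null;
        try {
            document = extractor.extract(in);
        } catch (IOException e) {
            e.printStackTrace(); // the failure is only visible on stderr
        }
        return document;
    }

    /** Alternative: the IOException reaches the caller instead of turning into null. */
    static String propagating(Extractor extractor, InputStream in) throws IOException {
        return extractor.extract(in);
    }

    public static void main(String[] args) {
        // Simulated failure; a real run would call the PDF library here.
        Extractor failing = in -> { throw new IOException("simulated parse failure"); };
        InputStream empty = new ByteArrayInputStream(new byte[0]);

        System.out.println("swallowing returned: " + swallowing(failing, empty));
        try {
            propagating(failing, empty);
        } catch (IOException e) {
            System.out.println("propagating threw: " + e.getMessage());
        }
    }
}
```

If the original method is changed to declare `throws IOException` in the same way, the stack trace seen by the caller should show which step fails for this file.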
Maven dependencies:
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>1.4.0</version>
</dependency>
<dependency>
    <groupId>org.bouncycastle</groupId>
    <artifactId>bcprov-jdk15</artifactId>
    <version>1.45</version>
</dependency>
<dependency>
    <groupId>org.bouncycastle</groupId>
    <artifactId>bcmail-jdk15</artifactId>
    <version>1.45</version>
</dependency>