https://issues.apache.org/bugzilla/show_bug.cgi?id=55733
Bug ID: 55733
Summary: NullPointerException when attempting to parse a Word document with no headers
Product: POI
Version: 3.9
Hardware: PC
Status: NEW
Severity: normal
Priority: P2
Component: XWPF
Assignee: [email protected]
Reporter: [email protected]
Created attachment 30990
--> https://issues.apache.org/bugzilla/attachment.cgi?id=30990&action=edit
Two Word test files without headers: one throws a NullPointerException, one doesn't
I was given a programmatically generated Word document that did not contain any
headers. MS Word is able to open it; however, I get a NullPointerException
when calling XWPFWordExtractor.getText() on it. Specifically:
java.lang.NullPointerException
    at org.apache.poi.xwpf.extractor.XWPFWordExtractor.extractHeaders(XWPFWordExtractor.java:162)
    at org.apache.poi.xwpf.extractor.XWPFWordExtractor.getText(XWPFWordExtractor.java:87)
    at Test.testPrintDoc(Test.java:16)
    at Test.main(Test.java:26)
Looking at the code, it appears that hfPolicy is passed as null to
XWPFWordExtractor.extractHeaders() from XWPFWordExtractor.getText():
    public String getText() {
        StringBuffer text = new StringBuffer();
        XWPFHeaderFooterPolicy hfPolicy = document.getHeaderFooterPolicy();

        // Start out with all headers
        extractHeaders(text, hfPolicy);
This suggests that, for this file, the document's headerFooterPolicy (returned by
XWPFDocument.getHeaderFooterPolicy()) is never set inside XWPFDocument, and that
this is the source of the null that propagates and causes the error.
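To illustrate the kind of null guard that would avoid this crash, here is a minimal self-contained sketch. The types HeaderFooterPolicySketch and NullSafeExtractorSketch are hypothetical stand-ins, not Apache POI classes, and this is not a claim about how POI itself fixes the bug; the point is only that extractHeaders() can treat a null policy as "no headers" instead of dereferencing it.

```java
// Hypothetical stand-ins, NOT Apache POI classes: they only model the shape of
// the problem (a header/footer policy that may be null for some documents).
import java.util.Arrays;
import java.util.List;

class HeaderFooterPolicySketch {
    final List<String> headerTexts;

    HeaderFooterPolicySketch(List<String> headerTexts) {
        this.headerTexts = headerTexts;
    }
}

public class NullSafeExtractorSketch {
    // Appends header text only when a policy exists; null is treated as "no headers".
    public static void extractHeaders(StringBuilder text, HeaderFooterPolicySketch hfPolicy) {
        if (hfPolicy == null) {
            return; // guard that avoids the NullPointerException
        }
        for (String header : hfPolicy.headerTexts) {
            text.append(header).append('\n');
        }
    }

    // Mirrors the order in the excerpt above: headers first, then the body text.
    public static String getText(HeaderFooterPolicySketch hfPolicy, String body) {
        StringBuilder text = new StringBuilder();
        extractHeaders(text, hfPolicy);
        text.append(body);
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(getText(null, "Body text"));
        System.out.println(getText(new HeaderFooterPolicySketch(Arrays.asList("Header")), "Body text"));
    }
}
```

The same guard would presumably be needed wherever footers are extracted from the same policy object.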
I'd chalk it up to an invalid Word document, but MS Word can open the file.
If you open it in Word and simply re-save it without making any changes, the
new file still has no headers, but it can be read by
XWPFWordExtractor.getText() without the NullPointerException.
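One way to compare the original and re-saved files is to look inside their word/document.xml parts. The sketch below is a diagnostic built on an assumption, not something this report establishes: it guesses that the difference may lie in whether the document carries section properties (a w:sectPr element). It uses only java.util.zip, is not part of POI, and assumes the conventional "w" namespace prefix; the class name DocxSectPrCheck is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class DocxSectPrCheck {
    // Returns true if word/document.xml contains a <w:sectPr element anywhere.
    // Assumption: the WordprocessingML namespace is bound to the "w" prefix.
    public static boolean hasSectPr(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("No word/document.xml in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // Read the whole part into memory, then do a plain text search.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                String xml = new String(buf.toByteArray(), StandardCharsets.UTF_8);
                return xml.contains("<w:sectPr");
            }
        }
    }

    // Prints the result for every .docx path given on the command line.
    public static void main(String[] args) throws IOException {
        for (String file : args) {
            System.out.println(file + " has w:sectPr: " + hasSectPr(file));
        }
    }
}
```

Running it as java DocxSectPrCheck noHeaders.docx noHeaders_resaved.docx would show whether the two attached files differ in this respect; a plain string search is crude, but it is enough for a quick comparison.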
Two example Word documents without headers are attached: one that throws the
error and one that doesn't. Here's the test code I used to print out the
contents of each file.
    import java.io.FileInputStream;

    import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;

    public class Test {
        public static void testPrintDoc(String file) throws Exception {
            FileInputStream fis = new FileInputStream(file);
            System.err.println("Reading " + file);
            try {
                XWPFDocument doc = new XWPFDocument(fis);
                XWPFWordExtractor textExtractor = new XWPFWordExtractor(doc);
                System.err.println(textExtractor.getText());
            } finally {
                fis.close();
            }
        }

        public static void main(String[] args) {
            try {
                Test.testPrintDoc("noHeaders.docx");
            } catch (Exception e) {
                e.printStackTrace();
            }
            try {
                Test.testPrintDoc("noHeaders_resaved.docx");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }