I ran into a problem with Tika extraction of PPT files today, and I
think I traced it back to some mistaken code in the HSLFExtractor:

    // Repeat the Notes header, if set
    if (hf != null && hf.isHeaderVisible() && hf.getHeaderText() != null) {
        xhtml.startElement("p", "class", "slide-note-header");
        xhtml.characters( hf.getFooterText() );   <----------
        xhtml.endElement("p");
    }

Shouldn't the marked line call hf.getHeaderText()? The guard only checks
that the header text is non-null, but the body then writes the footer
text. In my case the getFooterText() call returns null, which causes an
NPE in the XHTMLContentHandler.
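
To make the suggested change concrete, here is a minimal, self-contained sketch of the fix as I understand it. It is not the actual Tika source: NotesHeaderSketch, HeaderFooter, RecordingHandler and headerOnly are hypothetical stand-ins I made up so the example runs without Tika or POI on the classpath. The only point is that the body writes the same text the guard checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class NotesHeaderSketch {

    // Hypothetical stand-in for the "hf" object in the snippet above.
    interface HeaderFooter {
        boolean isHeaderVisible();
        String getHeaderText();
        String getFooterText();
    }

    // Hypothetical stand-in for XHTMLContentHandler. It records what it is
    // given and, like the failure described above, throws an NPE on null text.
    static class RecordingHandler {
        final List<String> events = new ArrayList<>();

        void startElement(String name, String attr, String value) {
            events.add("<" + name + " " + attr + "=\"" + value + "\">");
        }

        void characters(String text) {
            events.add(Objects.requireNonNull(text, "text"));
        }

        void endElement(String name) {
            events.add("</" + name + ">");
        }
    }

    // Suggested fix: write the header text, which the guard has already
    // verified to be non-null, instead of the footer text.
    static void writeNotesHeader(HeaderFooter hf, RecordingHandler xhtml) {
        if (hf != null && hf.isHeaderVisible() && hf.getHeaderText() != null) {
            xhtml.startElement("p", "class", "slide-note-header");
            xhtml.characters(hf.getHeaderText());
            xhtml.endElement("p");
        }
    }

    // Builds a test object with a visible header and no footer, which is the
    // combination that triggers the NPE in the original code.
    static HeaderFooter headerOnly(final String header) {
        return new HeaderFooter() {
            public boolean isHeaderVisible() { return true; }
            public String getHeaderText() { return header; }
            public String getFooterText() { return null; }
        };
    }

    public static void main(String[] args) {
        RecordingHandler xhtml = new RecordingHandler();
        writeNotesHeader(headerOnly("Speaker notes"), xhtml);
        for (String event : xhtml.events) {
            System.out.println(event);
        }
    }
}
```

With the original getFooterText() call in place of getHeaderText(), the same input would hit the null check in characters() and throw.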
Joe