Klemens Dickbauer created TIKA-2919: ---------------------------------------
Summary: NullPointerException when parsing PDF with OCR and ToXMLContentHandler
Key: TIKA-2919
URL: https://issues.apache.org/jira/browse/TIKA-2919
Project: Tika
Issue Type: Bug
Components: ocr
Affects Versions: 1.21
Reporter: Klemens Dickbauer

When a PDF document is parsed, the handler creates a structure (and fires the corresponding SAX events) in which each page is wrapped in two <div> elements. The outer of these has no parent element. As a result, when endElement is subsequently called for "html", currentElement is already null, and dereferencing currentElement.parent throws a NullPointerException:

{code:java}
// From class ToXMLContentHandler:
public void endElement(String uri, String localName, String qName) throws SAXException {
    if (this.inStartElement) {
        // Element had no content: close it as an empty tag
        this.write(" />");
        this.inStartElement = false;
    } else {
        this.write("</");
        this.write(qName);
        this.write('>');
    }
    this.namespaces.clear();
    // NPE is thrown here when currentElement is already null
    this.currentElement = this.currentElement.parent;
}
{code}
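To illustrate the failure mode, here is a minimal, self-contained sketch that models the parent-pointer bookkeeping used by the quoted endElement. It is not Tika's code: the class ElementTrackingSketch, its replay helper and the endElementChecked guard are hypothetical names, and the unbalanced event sequence is an assumed trigger, not the exact sequence the PDF/OCR path emits. It shows one plausible way the symptom arises: if the events for a page <div> are unbalanced, the stack of open elements runs out one level early, and the NPE surfaces at the final endElement for "html".

```java
// Hypothetical, simplified model of parent-pointer element tracking.
// This is NOT Tika's ToXMLContentHandler; all names are illustrative.
public class ElementTrackingSketch {

    // One node per open element; parent is the enclosing element (null at the root).
    static final class ElementInfo {
        final ElementInfo parent;

        ElementInfo(ElementInfo parent) {
            this.parent = parent;
        }
    }

    private ElementInfo currentElement; // null until the first startElement

    void startElement(String qName) {
        currentElement = new ElementInfo(currentElement);
    }

    // Mirrors the quoted code: pops unconditionally via currentElement.parent.
    void endElement(String qName) {
        currentElement = currentElement.parent; // NPE if no element is open
    }

    // Hypothetical defensive variant: fails with a descriptive error instead of an NPE.
    void endElementChecked(String qName) {
        if (currentElement == null) {
            throw new IllegalStateException(
                "endElement(" + qName + ") without a matching startElement");
        }
        currentElement = currentElement.parent;
    }

    // Replays events ("name" = start, "/name" = end); returns "ok" or the exception's simple name.
    static String replay(boolean checked, String... events) {
        ElementTrackingSketch h = new ElementTrackingSketch();
        try {
            for (String e : events) {
                if (e.startsWith("/")) {
                    String name = e.substring(1);
                    if (checked) {
                        h.endElementChecked(name);
                    } else {
                        h.endElement(name);
                    }
                } else {
                    h.startElement(e);
                }
            }
            return "ok";
        } catch (RuntimeException ex) {
            return ex.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Balanced events: prints "ok"
        System.out.println(replay(false, "html", "body", "div", "/div", "/body", "/html"));
        // One extra "/div": the stack is empty before "/html", prints "NullPointerException"
        System.out.println(replay(false, "html", "body", "div", "/div", "/div", "/body", "/html"));
        // Same events with the guard: prints "IllegalStateException"
        System.out.println(replay(true, "html", "body", "div", "/div", "/div", "/body", "/html"));
    }
}
```

The guarded variant only replaces the NPE with a clearer error message; the underlying problem is whatever emits the unbalanced <div> events on the OCR path.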