[ 
https://issues.apache.org/jira/browse/TIKA-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17159014#comment-17159014
 ] 

Andreas Weber edited comment on TIKA-3088 at 7/16/20, 8:37 AM:
---------------------------------------------------------------

This bug seems to be the consequence of a change made in tika-1.15 that is 
still present in newer versions, up to the current version 1.24.1.

org.apache.tika.parser.odf.OpenDocumentContentParser

{{@Override}}
{{public void endElement()}}
{{  ....}}
{{  if (completelyFiltered == 0) {}}
{{    ...}}
{{    } else if ("annotation".equals(localName) || "note".equals(localName) ||}}
{{               "notes".equals(localName)) {}}
{{      closeStyleTags();}}
{{      handler.endElement("", localName, localName);    // empty namespace uri - why?}}
{{    }}}
{{    ...}}

The problem here is the empty namespace string "", which sends the call down 
a different code path in 
com.sun.org.apache.xml.internal.serializer.ToHTMLStream.endElement():

{{    if (null != namespaceURI && namespaceURI.length() > 0) {    // this condition is not true anymore}}
{{      super.endElement(namespaceURI, localName, name);}}
{{      return;}}
{{    }}}

 

So the fix seems to be as easy as passing the correct namespace URI string in 
OpenDocumentContentParser.{{endElement()}}, e.g.:

{{        handler.endElement(XHTMLContentHandler.XHTML, localName, localName);}}
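
To illustrate why the namespace passed to endElement() matters, here is a minimal sketch that does not depend on Tika or on JDK-internal classes. NamespaceBalanceChecker is a hypothetical helper, not part of Tika; it only records end tags whose namespace URI differs from the one used for the matching start tag. It assumes the start tag was emitted in the XHTML namespace (the value of XHTMLContentHandler.XHTML), which is what the analysis above suggests but is not verified here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper (not part of Tika): flags end tags closed in a different namespace. */
class NamespaceBalanceChecker extends DefaultHandler {
    // Same value as XHTMLContentHandler.XHTML in Tika
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Namespace URIs of the currently open elements, innermost first
    private final Deque<String> openUris = new ArrayDeque<>();
    private final List<String> mismatches = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openUris.push(uri);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String expected = openUris.isEmpty() ? null : openUris.pop();
        if (!uri.equals(expected)) {
            mismatches.add(localName + ": opened in '" + expected + "', closed in '" + uri + "'");
        }
    }

    List<String> mismatches() {
        return mismatches;
    }

    public static void main(String[] args) {
        NamespaceBalanceChecker checker = new NamespaceBalanceChecker();
        Attributes none = new AttributesImpl();

        // Assumption: the start tag was emitted in the XHTML namespace
        checker.startElement(XHTML, "p", "p", none);
        checker.endElement("", "p", "p");      // empty namespace, as in the current code

        checker.startElement(XHTML, "p", "p", none);
        checker.endElement(XHTML, "p", "p");   // matching namespace, as in the proposed fix

        System.out.println(checker.mismatches());
    }
}
```

Only the first start/end pair is reported as a mismatch. A serializer that picks its code path based on the namespace URI, as ToHTMLStream does, would treat that start and end differently, which is consistent with the failure described above.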

 

 



> java.lang.NullPointerException when converting Open Office presentation 
> (.odp) to html
> --------------------------------------------------------------------------------------
>
>                 Key: TIKA-3088
>                 URL: https://issues.apache.org/jira/browse/TIKA-3088
>             Project: Tika
>          Issue Type: Bug
>          Components: app
>    Affects Versions: 1.23
>            Reporter: Vladimir Kotik
>            Priority: Major
>         Attachments: 1.odp, 11.odp, 7.odp
>
>
> The attempt to convert an odp file to html format ends with this:
> D:\>java -jar tika-app-1.23.jar --html D:\testdata\Presentations\11.odp
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.odf.OpenDocumentParser@710f4dc7
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>         at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
>         at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
>         at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
> *Caused by: java.lang.NullPointerException*
>         at com.sun.org.apache.xml.internal.serializer.ToHTMLStream.endElement(Unknown Source)
>         at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endElement(Unknown Source)
>         at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>         at org.apache.tika.sax.ExpandedTitleContentHandler.endElement(ExpandedTitleContentHandler.java:70)
>         at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>         at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
>         at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>         at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>         at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>         at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:274)
>         at org.apache.tika.sax.XHTMLContentHandler.endElement(XHTMLContentHandler.java:271)
>         at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>         at org.apache.tika.parser.odf.OpenDocumentContentParser$OpenDocumentElementMappingContentHandler.endElement(OpenDocumentContentParser.java:425)
>         at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>         at org.apache.tika.parser.odf.NSNormalizerContentHandler.endElement(NSNormalizerContentHandler.java:75)
>         at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
>         at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
>         at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
>         at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>         at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>         at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
>         at javax.xml.parsers.SAXParser.parse(Unknown Source)
>         at org.apache.tika.utils.XMLReaderUtils.parseSAX(XMLReaderUtils.java:491)
>         at org.apache.tika.parser.odf.OpenDocumentContentParser.parseInternal(OpenDocumentContentParser.java:599)
>         at org.apache.tika.parser.odf.OpenDocumentParser.handleZipEntry(OpenDocumentParser.java:220)
>         at org.apache.tika.parser.odf.OpenDocumentParser.handleZipFile(OpenDocumentParser.java:204)
>         at org.apache.tika.parser.odf.OpenDocumentParser.parse(OpenDocumentParser.java:157)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         ... 5 more
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
