Jorge Spinsanti created TIKA-2225:
-------------------------------------

             Summary: Parsing DOCX file fails due to NullPointerException in POI code
                 Key: TIKA-2225
                 URL: https://issues.apache.org/jira/browse/TIKA-2225
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.14
            Reporter: Jorge Spinsanti


I'm trying to extract text from a DOCX file, but parsing fails with an
exception caused by a NullPointerException in POI code. Stack trace:

{code}
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@4f5692fe
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        ... 16 more
Caused by: java.lang.NullPointerException
        at org.apache.poi.hwpf.usermodel.Picture.getRawContent(Picture.java:422)
        at org.apache.poi.hwpf.usermodel.Picture.fillImageContent(Picture.java:131)
        at org.apache.poi.hwpf.usermodel.Picture.getContent(Picture.java:286)
        at org.apache.tika.parser.microsoft.WordExtractor.handlePictureCharacterRun(WordExtractor.java:609)
        at org.apache.tika.parser.microsoft.WordExtractor.handleSpecialCharacterRuns(WordExtractor.java:517)
        at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:346)
        at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:273)
        at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:179)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:169)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:130)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
