Jorge Spinsanti created TIKA-2225:
-------------------------------------
Summary: Parsing DOCX file fails due to NullPointerException in POI code
Key: TIKA-2225
URL: https://issues.apache.org/jira/browse/TIKA-2225
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.14
Reporter: Jorge Spinsanti
I'm trying to extract text from a DOCX file, but parsing fails with a
NullPointerException in POI code. Stack trace:
{code}
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@4f5692fe
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
... 16 more
Caused by: java.lang.NullPointerException
at org.apache.poi.hwpf.usermodel.Picture.getRawContent(Picture.java:422)
at org.apache.poi.hwpf.usermodel.Picture.fillImageContent(Picture.java:131)
at org.apache.poi.hwpf.usermodel.Picture.getContent(Picture.java:286)
at org.apache.tika.parser.microsoft.WordExtractor.handlePictureCharacterRun(WordExtractor.java:609)
at org.apache.tika.parser.microsoft.WordExtractor.handleSpecialCharacterRuns(WordExtractor.java:517)
at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:346)
at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:273)
at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:179)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:169)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:130)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
{code}