Richa Garg created TIKA-2014:
--------------------------------
Summary: Unable to parse doc file
Key: TIKA-2014
URL: https://issues.apache.org/jira/browse/TIKA-2014
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.12, 1.13
Environment: Ubuntu 14.04
Reporter: Richa Garg
Priority: Critical
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@65a3ca0
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at com.headhonchos.dal.operations.TikaParsing.process(TikaParsing.java:59)
    .......
Caused by: java.lang.UnsupportedOperationException: Non-extended character Pascal strings are not supported right now. Please, contact POI developers for update.
    at org.apache.poi.hwpf.model.Sttb.fillFields(Sttb.java:82)
    at org.apache.poi.hwpf.model.Sttb.<init>(Sttb.java:61)
    at org.apache.poi.hwpf.model.SttbUtils.readSttbSavedBy(SttbUtils.java:52)
    at org.apache.poi.hwpf.model.SavedByTable.<init>(SavedByTable.java:53)
    at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:361)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:81)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:201)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:172)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 34 more
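
In the trace above, the failure that matters is the UnsupportedOperationException raised in POI, which reaches the caller wrapped in a TikaException. For callers that want to detect this case (for example, to log or skip the affected file), a minimal sketch of walking the cause chain with plain JDK calls might look like the following. The names RootCauseDemo and rootCause are hypothetical, and a plain Exception stands in for TikaException so the snippet has no Tika dependency.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class RootCauseDemo {

    // Follows getCause() to the innermost exception.
    // The identity set stops the loop if the cause chain is cyclic.
    static Throwable rootCause(Throwable t) {
        Set<Throwable> seen =
                Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-ins for the exceptions in the report (assumption: a plain
        // Exception plays the role of the wrapping TikaException).
        Throwable poiFailure = new UnsupportedOperationException(
                "Non-extended character Pascal strings are not supported right now.");
        Throwable wrapped = new Exception("Unexpected RuntimeException from parser", poiFailure);

        Throwable root = rootCause(wrapped);
        if (root instanceof UnsupportedOperationException) {
            // A caller could log the file and move on here instead of failing the batch.
            System.out.println("Unsupported document structure: " + root.getMessage());
        }
    }
}
```

This only classifies the failure; it does not make the document parseable.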
--
This message was sent by Atlassian JIRA (v6.3.4#6332)