Johannes Wirkkala Westlund created TIKA-3709:
------------------------------------------------

             Summary: RuntimeException when parsing word (.doc) document
                 Key: TIKA-3709
                 URL: https://issues.apache.org/jira/browse/TIKA-3709
             Project: Tika
          Issue Type: Bug
    Affects Versions: 2.3.0
            Reporter: Johannes Wirkkala Westlund
         Attachments: Avtalsvillkor (1).doc

Hi,

I have a Word file that throws the following error when I try to parse it with 
Tika:


{code:java}
Caused by: java.lang.IllegalArgumentException: This paragraph is not the first one in the table
    at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:810)
    at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:272)
    at org.apache.tika.parser.microsoft.WordExtractor.handleHeaderFooter(WordExtractor.java:255)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:210)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:216)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:173)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:289)
    ... 5 more
{code}

I have attached the document with this issue.
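
In case it helps with triage: the trace above starts at a "Caused by:" line, so the IllegalArgumentException reaches the caller wrapped in another exception. Below is a minimal sketch, using only the JDK, of how calling code could look for that underlying cause. The class name RootCauseFinder, the helper findCause, and the RuntimeException wrapper used in main are hypothetical and for illustration only. They are not part of Tika or POI.

```java
import java.util.Optional;

public class RootCauseFinder {

    /**
     * Walks the cause chain of t and returns the first throwable that is an
     * instance of the requested type, or an empty Optional if there is none.
     */
    public static <T extends Throwable> Optional<T> findCause(Throwable t, Class<T> type) {
        // Bound the walk so a (rare) cyclic cause chain cannot loop forever.
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < 64; cur = cur.getCause(), depth++) {
            if (type.isInstance(cur)) {
                return Optional.of(type.cast(cur));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for the reported failure (assumption: the outer wrapper is a
        // RuntimeException, as the issue summary suggests).
        Throwable inner = new IllegalArgumentException(
                "This paragraph is not the first one in the table");
        Throwable outer = new RuntimeException(inner);

        Optional<IllegalArgumentException> cause =
                findCause(outer, IllegalArgumentException.class);
        System.out.println(cause.map(Throwable::getMessage).orElse("not found"));
    }
}
```

In real use, the throwable passed to findCause would be whatever the parse call throws for the attached document, rather than the hand-built exception in main.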



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
