[
https://issues.apache.org/jira/browse/TIKA-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-2899.
-------------------------------
Resolution: Fixed
Assignee: Tim Allison
Fix Version/s: 1.22
I added a stack that tracks the p, li, ol and ul elements written to the XML
handler. It ensures that start and end elements stay matched in the output
even if the RTF is corrupt.
I am not convinced that the attached file has any problems, but the change will
ensure matched elements in the output.
If there are any objections to this fix, please let me know, and I can revert.
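For readers who want to picture the approach, here is a minimal sketch of a stack that keeps p, li, ol and ul elements balanced. It is not the code committed for this issue: the class name BalancedElementWriter and its methods are hypothetical, and it writes to a StringBuilder rather than to Tika's real XHTML handler. It assumes that a stray end tag is dropped and that an end tag for an outer element first closes anything still open inside it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

/** Hypothetical sketch only; not Tika's actual implementation. */
class BalancedElementWriter {
    // Only these element names are tracked, as described in the message above.
    private static final Set<String> TRACKED = Set.of("p", "li", "ol", "ul");

    private final Deque<String> open = new ArrayDeque<>();
    private final StringBuilder out = new StringBuilder();

    void startElement(String name) {
        if (TRACKED.contains(name)) {
            open.push(name); // remember that this element still needs an end tag
        }
        out.append('<').append(name).append('>');
    }

    void characters(String text) {
        out.append(text);
    }

    void endElement(String name) {
        if (!TRACKED.contains(name)) {
            // Untracked elements are passed through unchanged.
            out.append("</").append(name).append('>');
            return;
        }
        if (!open.contains(name)) {
            // Stray end tag with no matching start: drop it (an assumption of this sketch).
            return;
        }
        // Close everything opened after `name`, then `name` itself.
        String top;
        do {
            top = open.pop();
            out.append("</").append(top).append('>');
        } while (!top.equals(name));
    }

    void endDocument() {
        // Close whatever the (possibly corrupt) input left open.
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
    }

    String result() {
        return out.toString();
    }
}
```

With this sketch, an input that opens ul and li but only closes ul still produces a closing li before the closing ul, and an end tag for a p that was never opened is ignored.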
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from
> org.apache.tika.parser.rtf.RTFParser@375a26af
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: TIKA-2899
> URL: https://issues.apache.org/jira/browse/TIKA-2899
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.19
> Reporter: Pandurang
> Assignee: Tim Allison
> Priority: Critical
> Fix For: 1.22
>
> Attachments: ABC_PL_WI.rtf
>
>
> I am using Solr 8.0 through the SolrNet library to extract text from binary
> data, and we are getting the error below.
> It works fine for 99% of documents but fails for the remaining 1%.
> Caused by: org.apache.tika.exception.TikaException: Unexpected
> RuntimeException from org.apache.tika.parser.rtf.RTFParser@375a26af
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
> ... 41 more