Konstantin Gribov created TIKA-1597:
---------------------------------------
Summary: RTF with embedded image parsing produces div before html
Key: TIKA-1597
URL: https://issues.apache.org/jira/browse/TIKA-1597
Project: Tika
Issue Type: Bug
Affects Versions: 1.7
Environment: linux, oracle jdk 7u75
Reporter: Konstantin Gribov
On tika-1.8-rc1, running
{{java -jar tika-app/target/tika-app-1.8.jar -x 2.rtf}} returns
{noformat}
<?xml version="1.0" encoding="UTF-8"?><div
xmlns="http://www.w3.org/1999/xhtml">HOHcvanAHTI'Imoc
v8 Hanemnan npfiBOBafi "DRAW
</div>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<!-- tail omitted -->
{noformat}
Removing the image prevents this behavior ({{3.rtf}} doesn't contain the
embedded image).
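
A minimal sketch of how this symptom could be checked automatically: read the {{-x}} output and report the name of the first element, which should be {{html}} but is {{div}} in the output above. It uses only the JDK's StAX reader, not any Tika API. The class and method names ({{RootElementCheck}}, {{firstElementName}}) are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Tika: inspects XHTML text such as the
// output of "tika-app -x" and reports which element comes first.
final class RootElementCheck {

    /** Returns the local name of the first element, or null if there is none. */
    static String firstElementName(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                // The reader is pull-based, so we stop at the first start tag
                // and never reach the second (invalid) root element.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not parseable as XML", e);
        }
    }
}
```

For the output shown above, the first element name would be {{div}} rather than {{html}}, so a regression test could assert on that value once a fix is in place.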