Konstantin Gribov created TIKA-1597:
---------------------------------------

             Summary: RTF with embedded image parsing produces div before html
                 Key: TIKA-1597
                 URL: https://issues.apache.org/jira/browse/TIKA-1597
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.7
         Environment: linux, oracle jdk 7u75
            Reporter: Konstantin Gribov


Reproduced on tika-1.8-rc1.

{{java -jar tika-app/target/tika-app-1.8.jar -x 2.rtf}} returns
{noformat}
<?xml version="1.0" encoding="UTF-8"?><div xmlns="http://www.w3.org/1999/xhtml">HOHcvanAHTI'Imoc
v8 Hanemnan npfiBOBafi "DRAW

</div>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<!-- tail omitted -->
{noformat}
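
The output above has two top-level elements ({{div}} followed by {{html}}), so it is not a well-formed XML document. A minimal sketch of a check for this, using only the JDK's built-in XML parser, might look as follows. The class name {{WellFormedCheck}} and the shortened sample strings are illustrative assumptions, not part of Tika; in practice the string would be the captured output of the {{-x}} run.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not a Tika class): reports whether a string parses
// as a single well-formed XML document.
class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised, for example, when markup follows the root element.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ns = "http://www.w3.org/1999/xhtml";
        // Shortened stand-in for the reported output: a div before the html root.
        String reported = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<div xmlns=\"" + ns + "\">text</div>\n"
                + "<html xmlns=\"" + ns + "\"><head/></html>";
        // Shortened stand-in for the expected shape: a single html root.
        String expected = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<html xmlns=\"" + ns + "\"><head/><body><div>text</div></body></html>";

        System.out.println(isWellFormed(reported)); // false
        System.out.println(isWellFormed(expected)); // true
    }
}
```

A check like this only confirms the symptom (content emitted outside the root element); it says nothing about where in the RTF parser the stray {{div}} originates.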

Removing the image prevents this behavior ({{3.rtf}} doesn't contain an 
embedded image).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
