[ 
https://issues.apache.org/jira/browse/TIKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov closed TIKA-1597.
-----------------------------------

> RTF with embedded image parsing produces div before html
> --------------------------------------------------------
>
>                 Key: TIKA-1597
>                 URL: https://issues.apache.org/jira/browse/TIKA-1597
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.7
>         Environment: linux, oracle jdk 7u75
>            Reporter: Konstantin Gribov
>             Fix For: 1.8
>
>         Attachments: 2.rtf, 3.rtf
>
>
> On tika-1.8-rc1, {{java -jar tika-app/target/tika-app-1.8.jar -x 2.rtf}} returns:
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?><div 
> xmlns="http://www.w3.org/1999/xhtml">HOHcvanAHTI'Imoc
> v8 Hanemnan npfiBOBafi "DRAW
> </div>
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <!-- tail omitted -->
> {noformat}
> Removing the image prevents this behavior ({{3.rtf}} does not contain an
> embedded image).
> Update: Tesseract must be installed to reproduce this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
