[ 
https://issues.apache.org/jira/browse/TIKA-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned TIKA-777:
---------------------------------------

    Assignee: Michael McCandless
    
> RTF parser incorrectly applies fonts to complete group
> ------------------------------------------------------
>
>                 Key: TIKA-777
>                 URL: https://issues.apache.org/jira/browse/TIKA-777
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Arjohn Kampman
>            Assignee: Michael McCandless
>
> Tika's RTF parser processes the following rtf document incorrectly, applying 
> the wrong character encoding to the parsed characters:
> {\rtf1\ansi\ansicpg1252\fromtext \fbidis \deff0
> {\fonttbl
> {\f0\fswiss\fcharset0 Arial;}
> {\f1\fswiss\fcharset204 Arial;}
> }
> {\f1\fs20 \'d3\'e2\'e0\'e6\'e0\'e5\'ec\'fb\'e9 
> \'ea\'eb\'e8\'e5\'ed\'f2!\f0}\par
> }
> This document contains russian characters (\f1), but tika decodes these as 
> latin due to the \f0 directive at the end of the group. The RTF parser should 
> probably flush its pendingBytes buffer before processing directives such as 
> these.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to