[ https://issues.apache.org/jira/browse/TIKA-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless reassigned TIKA-777:
---------------------------------------
Assignee: Michael McCandless
> RTF parser incorrectly applies fonts to complete group
> ------------------------------------------------------
>
> Key: TIKA-777
> URL: https://issues.apache.org/jira/browse/TIKA-777
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.0
> Reporter: Arjohn Kampman
> Assignee: Michael McCandless
>
> Tika's RTF parser processes the following RTF document incorrectly, applying
> the wrong character encoding to the parsed characters:
> {\rtf1\ansi\ansicpg1252\fromtext \fbidis \deff0
> {\fonttbl
> {\f0\fswiss\fcharset0 Arial;}
> {\f1\fswiss\fcharset204 Arial;}
> }
> {\f1\fs20 \'d3\'e2\'e0\'e6\'e0\'e5\'ec\'fb\'e9
> \'ea\'eb\'e8\'e5\'ed\'f2!\f0}\par
> }
> This document contains Russian characters (font \f1, declared with \fcharset204,
> i.e. Cyrillic), but Tika decodes them as Latin (\fcharset0) because of the \f0
> control word at the end of the group. The RTF parser should probably flush its
> pendingBytes buffer before processing font-change control words such as \f0.
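A minimal sketch of the suggested fix follows, assuming a heavily simplified parser model. The class `PendingBytesDemo`, its methods, and the mapping of `\fcharset0` to windows-1252 and `\fcharset204` to windows-1251 are illustrative assumptions, not Tika's RTFParser API. It replays the first run of `\'hh` escapes from the sample group followed by the trailing `\f0`. Per-group font scoping is deliberately not modeled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of the pendingBytes buffering described in TIKA-777.
 * Not Tika's API: it only tracks a font -> charset table and a byte buffer.
 */
public class PendingBytesDemo {
    private final Map<Integer, Charset> fontCharsets = new HashMap<>();
    private final ByteArrayOutputStream pendingBytes = new ByteArrayOutputStream();
    private final StringBuilder out = new StringBuilder();
    private final boolean flushOnFontChange;
    private Charset current;

    PendingBytesDemo(boolean flushOnFontChange) {
        this.flushOnFontChange = flushOnFontChange;
        // Assumed mapping mirroring the sample \fonttbl:
        // \f0 \fcharset0 -> windows-1252, \f1 \fcharset204 -> windows-1251.
        fontCharsets.put(0, Charset.forName("windows-1252"));
        fontCharsets.put(1, Charset.forName("windows-1251"));
        current = fontCharsets.get(0); // \deff0
    }

    /** A \'hh escape: buffer the raw byte, since multi-byte text cannot be decoded byte by byte. */
    void hexByte(int b) {
        pendingBytes.write(b);
    }

    /** A \fN control word: optionally flush first, then switch the active charset. */
    void fontChange(int font) {
        if (flushOnFontChange) {
            flush(); // decode buffered bytes with the charset they were written in
        }
        current = fontCharsets.get(font);
    }

    /** Decode whatever is buffered using the charset currently in effect. */
    void flush() {
        if (pendingBytes.size() > 0) {
            out.append(new String(pendingBytes.toByteArray(), current));
            pendingBytes.reset();
        }
    }

    /** Replays {\f1 \'d3\'e2\'e0\'e6\'e0\'e5\'ec\'fb\'e9 ... \f0} from the sample. */
    static String parseSample(boolean flushOnFontChange) {
        PendingBytesDemo p = new PendingBytesDemo(flushOnFontChange);
        p.fontChange(1);
        int[] bytes = {0xd3, 0xe2, 0xe0, 0xe6, 0xe0, 0xe5, 0xec, 0xfb, 0xe9};
        for (int b : bytes) {
            p.hexByte(b);
        }
        p.fontChange(0); // the trailing \f0 just before the closing brace
        p.flush();       // end of group
        return p.out.toString();
    }

    public static void main(String[] args) {
        System.out.println("without flush: " + parseSample(false));
        System.out.println("with flush:    " + parseSample(true));
    }
}
```

Without the flush, the nine buffered bytes are decoded only after `\f0` has switched the charset, so they come out as the Latin string `Óâàæàåìûé`. With the flush, they are decoded while `\f1` is still active and yield the Cyrillic `Уважаемый`.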