[ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086380#comment-13086380 ]
Chris A. Mattmann commented on TIKA-683:
----------------------------------------
Guys, I see there is a patch from Cristian (looks like the code update) and one
from Mike (the test case). Do these two together resolve the issue? If so, I
can commit the code patch along with the test case update from Mike (+Robert)
and the sample files, but I wanted to check first. I have some free cycles, but
I am by no means a Unicode expert, nor an expert on non-European characters.
I'm just willing to help get these committed, and then let you experts tell me
whether it works or not :)
> RTF Parser issues with non european characters
> ----------------------------------------------
>
> Key: TIKA-683
> URL: https://issues.apache.org/jira/browse/TIKA-683
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.9
> Reporter: Nick Burch
> Assignee: Chris A. Mattmann
> Attachments: TIKA-683-unicode-testcase.patch, TIKA-683.patch,
> testRTFJapanese.rtf, testUnicodeUCNControlWordCharacterDoubling.rtf
>
>
> As reported on user@ in "non-West European languages support":
>
> http://mail-archives.apache.org/mod_mbox/tika-user/201107.mbox/%3cof0c0a3275.da7810e9-onc22578cc.0051eede-c22578cc.00525...@il.ibm.com%3E
> The RTF Parser seems to be doubling up some non-European characters.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira