[
https://issues.apache.org/jira/browse/LUCENE-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir resolved LUCENE-5986.
---------------------------------
Resolution: Not a Problem
This character only appears at the end of words.
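To illustrate why the guard proposed in the quoted description would defeat the folding, here is a minimal sketch. It is not the actual ArabicNormalizer code: the class name TehMarbutaFoldingSketch and both method names are hypothetical, and only the Teh Marbuta case is modeled. Since Teh Marbuta is word-final, skipping the last position leaves it unfolded, so a word spelled with Teh Marbuta no longer matches the variant spelled with a final Heh.

```java
import java.util.Arrays;

// Hypothetical illustration only; models a single case of the normalizer.
final class TehMarbutaFoldingSketch {
    static final char TEH_MARBUTA = '\u0629';
    static final char HEH = '\u0647';

    /** Folds every Teh Marbuta to Heh, regardless of position. */
    static void foldAlways(char[] s, int len) {
        for (int i = 0; i < len; i++) {
            if (s[i] == TEH_MARBUTA) {
                s[i] = HEH;
            }
        }
    }

    /** Variant with the guard proposed in the issue: the last character is skipped. */
    static void foldUnlessFinal(char[] s, int len) {
        for (int i = 0; i < len; i++) {
            if (s[i] == TEH_MARBUTA && i < (len - 1)) {
                s[i] = HEH;
            }
        }
    }

    public static void main(String[] args) {
        // "madrasa" (school), spelled with a final Teh Marbuta.
        char[] withTehMarbuta = {'\u0645', '\u062F', '\u0631', '\u0633', TEH_MARBUTA};
        // A variant spelling of the same word with a final Heh.
        char[] withHeh = {'\u0645', '\u062F', '\u0631', '\u0633', HEH};

        char[] a = withTehMarbuta.clone();
        foldAlways(a, a.length);
        System.out.println("unconditional fold matches Heh spelling: "
                + Arrays.equals(a, withHeh));

        char[] b = withTehMarbuta.clone();
        foldUnlessFinal(b, b.length);
        System.out.println("guarded fold matches Heh spelling: "
                + Arrays.equals(b, withHeh));
    }
}
```

With the guard in place the two spellings stay distinct after normalization, which is the opposite of what the folding is meant to achieve.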
> Incorrect character folding in Arabic
> -------------------------------------
>
> Key: LUCENE-5986
> URL: https://issues.apache.org/jira/browse/LUCENE-5986
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Jorge Cruanes
> Labels: easyfix
> Original Estimate: 5m
> Remaining Estimate: 5m
>
> The method {{normalize(char s[], int len)}} in the class
> {{org.apache.lucene.analysis.ar.ArabicNormalizer}} folds characters
> incorrectly in Arabic. The incorrect folding affects the letters Teh
> Marbuta (U+0629) and Heh (U+0647) at the end of a word (according to the
> study of El-Sherbiny et al., 2010, page 5).
> To fix this bug, add an if clause so that the folding is applied only if
> the Teh Marbuta is not at the end of the word. The suggested code for the
> new case is as follows:
> {quote}
> case TEH_MARBUTA:
>   if (i < (len - 1))
>     s[i] = HEH;
>   break;
> {quote}
> References:
> El-Sherbiny, A., Farah, M., Oueichek, I., Al-Zoman, A. (2010). Linguistic
> Guidelines for the Use of the Arabic Language in Internet Domains. Internet
> Society Request for Comments (RFC) 5564, pp. 1-11. Available at:
> http://tools.ietf.org/html/rfc5564