[ https://issues.apache.org/jira/browse/CODEC-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949303#comment-15949303 ]
Sebb commented on CODEC-199:
----------------------------
The problem is that the patch does not just fix the incorrect behaviour; it
also breaks existing good behaviour if the original mapping is used, whether
deliberately or accidentally (due to caching).
Changing the mapping provided by a user also has a side effect: it stops the
user from mapping H and W to 0.
> Bug in HW rule in Soundex
> -------------------------
>
> Key: CODEC-199
> URL: https://issues.apache.org/jira/browse/CODEC-199
> Project: Commons Codec
> Issue Type: Bug
> Affects Versions: 1.10
> Reporter: Yossi Tamari
> Fix For: 1.11
>
> Attachments: better.patch, soundex.patch
>
>
> The Soundex algorithm says that if two characters that map to the same code
> are separated by H or W, the second one is not encoded.
> However, in the implementation (in Soundex.getMappingCode() line 191), a
> character that is preceded by two characters that are each H or W is not
> encoded, regardless of what the last consonant was.
> Source: http://en.wikipedia.org/wiki/Soundex#American_Soundex
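
For reference, the HW rule described above can be sketched as follows. This is
a minimal standalone illustration of American Soundex as summarised in the
report, not the Commons Codec implementation or either attached patch; the
class name HwRuleSketch and its methods are hypothetical. H and W are skipped
without touching the last remembered code, while vowels reset it.

```java
import java.util.Locale;

// Hypothetical sketch of American Soundex; not the Commons Codec code.
final class HwRuleSketch {

    // Returns the Soundex digit for a consonant, or '0' for letters that
    // carry no code (vowels, Y, H and W).
    static char code(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V':
                return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z':
                return '2';
            case 'D': case 'T':
                return '3';
            case 'L':
                return '4';
            case 'M': case 'N':
                return '5';
            case 'R':
                return '6';
            default:
                return '0';
        }
    }

    static String encode(String name) {
        String s = name.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (int i = 0; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not an ASCII letter
            }
            char code = code(c);
            if (out.length() == 0) {
                out.append(c);   // the first letter is kept as-is
                lastCode = code; // but its code still suppresses a duplicate
                continue;
            }
            if (c == 'H' || c == 'W') {
                continue; // HW rule: transparent, lastCode is left unchanged
            }
            if (code != '0' && code != lastCode) {
                out.append(code);
            }
            lastCode = code; // a vowel sets lastCode to '0', separating duplicates
        }
        if (out.length() == 0) {
            return ""; // no letters in the input
        }
        while (out.length() < 4) {
            out.append('0'); // pad to the usual four characters
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Ashcraft")); // S and C separated by H: C is dropped
        System.out.println(encode("yhwdyt"));   // D follows H and W but must still be encoded
    }
}
```

In this sketch "Ashcraft" encodes to A261, because C has the same code as the
S before the H, and "yhwdyt" encodes to Y330, because D is not a duplicate of
any preceding consonant even though it follows both H and W.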