[
https://issues.apache.org/jira/browse/CODEC-250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630167#comment-16630167
]
Alex Volodko commented on CODEC-250:
------------------------------------
Regarding the ignored letters: yes, according to the algorithm they are simply
ignored. So for the implementation: if they are removed from the input string,
the result should be correct. I understand that the description of the
algorithm is quite short and generic, so it leaves some room for
interpretation. On the other hand, this algorithm is meant to compare words
phonetically: you don't pronounce special characters, and for the German
language (which is the main target of Cologne phonetics) I cannot come up with
a "non-letter" character that would change the pronunciation of a word.
Regarding the handling of ß (sharp s): yes, you are completely right. Strictly
according to the algorithm it should be handled as 'S', but both values, 'S'
and 'SS', produce the same output, so in the end it doesn't matter; on the
other hand, it also doesn't make the code more understandable.
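
To make both points concrete, here is a minimal sketch of steps 2 and 3 from
the simplified algorithm in the issue description, applied to the step 1 digits
given there. It is not the Commons Codec implementation: the class and method
names are hypothetical, and the "-" placeholder only mimics an ignored
character that is still present while duplicates are collapsed, which is one
way to arrive at the reported output.

```java
// Hypothetical helper, not part of Commons Codec.
final class CologneSteps {

    // Step 2: collapse runs of identical adjacent characters into one.
    public static String collapseAdjacentDuplicates(String codes) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < codes.length(); i++) {
            char c = codes.charAt(i);
            if (out.length() == 0 || out.charAt(out.length() - 1) != c) {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Step 3: remove every '0' except one at the very beginning.
    public static String removeInnerZeros(String codes) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < codes.length(); i++) {
            char c = codes.charAt(i);
            if (c != '0' || i == 0) {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Steps 2 and 3 applied to the digits produced by step 1.
    public static String finish(String step1Codes) {
        return removeInnerZeros(collapseAdjacentDuplicates(step1Codes));
    }

    public static void main(String[] args) {
        // "test-test" with the hyphen removed before encoding: step 1 gives 20822082.
        System.out.println(finish("20822082")); // prints 28282

        // Assumed failure mode: the hyphen still separates the two 2s during step 2.
        String collapsed = collapseAdjacentDuplicates("2082-2082");
        System.out.println(removeInnerZeros(collapsed.replace("-", ""))); // prints 282282

        // 'S' encodes to 8, so 'S' (8) and 'SS' (88) end up identical after step 2.
        System.out.println(finish("8").equals(finish("88"))); // prints true
    }
}
```

Removing the ignored characters before step 2 is what lets the two adjacent 2s
merge, and the same collapsing step is why 'S' and 'SS' cannot be told apart in
the output.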
> Wrong value calculated by Cologne Phonetic if a special character is placed
> between equal letters
> -------------------------------------------------------------------------------------------------
>
> Key: CODEC-250
> URL: https://issues.apache.org/jira/browse/CODEC-250
> Project: Commons Codec
> Issue Type: Bug
> Affects Versions: 1.5, 1.11
> Reporter: Alex Volodko
> Priority: Major
>
> The algorithm for Cologne phonetics is (simplified):
> # Encode letter by letter from left to right according to the conversion
> table.
> # Remove all digits occurring more than once next to each other.
> # Remove all code "0" except at the beginning.
> Characters which are not specified in conversion table (such as hyphens) are
> ignored. See https://en.wikipedia.org/wiki/Cologne_phonetics
> If the input is "test-test" the step results will be:
> # 20822082
> # 2082082
> # 28282
> The expected result for "test-test" is therefore 28282.
> The actual result for "test-test" is 282282 (note the extra 2 in the middle).
> This bug is caused by the fix from
> [https://github.com/apache/commons-codec/commit/72c8759a22c6552a2dfcdf61b29729f981752879]
> and has been present since 1.5.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)