[ 
https://issues.apache.org/jira/browse/CODEC-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niall Pemberton updated CODEC-84:
---------------------------------

    Attachment: CODEC-84-DoubleMetaphone-Alternate-bugs.patch

> Double Metaphone bugs in alternative encoding
> ---------------------------------------------
>
>                 Key: CODEC-84
>                 URL: https://issues.apache.org/jira/browse/CODEC-84
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.3
>            Reporter: Niall Pemberton
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: CODEC-84-DoubleMetaphone-Alternate-bugs.patch
>
>
> The new test case (CODEC-83) has highlighted a number of issues with the 
> "alternative" encoding in the Double Metaphone implementation:
> 1) Bug in the handleG method when "G" is followed by "IER" 
>  *  The alternative encoding of "Angier" results in "ANKR" rather than "ANJR"
>  *  The alternative encoding of "rogier" results in "RKR" rather than "RJR"
> The problem is in the handleG() method and is caused by the wrong length (4 
> instead of 3) being used in the contains() method:
> {code}
>  } else if (contains(value, index + 1, 4, "IER")) {
> {code}
> ...this should be
> {code}
>  } else if (contains(value, index + 1, 3, "IER")) {
> {code}
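
To illustrate why the length argument matters, here is a minimal sketch. It assumes contains() compares the substring of exactly "length" characters starting at "start" against each candidate, and returns false when that range runs past the end of the word. The ContainsLengthDemo class and its helper are hypothetical stand-ins written for this example, not the Commons Codec source.

```java
class ContainsLengthDemo {

    // Hypothetical stand-in for the contains() helper: assumes an exact
    // match against the substring [start, start + length).
    static boolean contains(String value, int start, int length, String... criteria) {
        if (start < 0 || start + length > value.length()) {
            return false; // requested range falls outside the word
        }
        String target = value.substring(start, start + length);
        for (String candidate : criteria) {
            if (target.equals(candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String value = "ANGIER";
        int index = value.indexOf('G'); // position of the "G" being handled

        // Length 4: a 4-character window can never equal the 3-character "IER".
        System.out.println(contains(value, index + 1, 4, "IER")); // false

        // Length 3: the window is exactly "IER".
        System.out.println(contains(value, index + 1, 3, "IER")); // true
    }
}
```

Under that assumption the length-4 call cannot match any three-letter candidate, so the "IER" branch is never taken.
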
> 2)  Bug in the handleL method
>  * The alternative encoding of "cabrillo" results in "KPRL " rather than "KPR"
> The problem is that the first thing this method does is append an "L" to both 
> the primary and alternative encodings. When the conditionL0() method returns 
> true, the "L" should not be appended to the alternative encoding:
> {code}
> result.append('L');
> if (charAt(value, index + 1) == 'L') {
>     if (conditionL0(value, index)) {
>         result.appendAlternate(' ');
>     }
>     index += 2;
> } else {
>     index++;
> }
> return index;
> {code}
> Suggest refactoring this to
> {code}
> if (charAt(value, index + 1) == 'L') {
>     if (conditionL0(value, index)) {
>         result.appendPrimary('L');
>     } else {
>         result.append('L');
>     }
>     index += 2;
> } else {
>     result.append('L');
>     index++;
> }
> return index;
> {code}
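
The difference between the two versions can be sketched in isolation. The Result class below is a hypothetical stand-in with separate primary and alternate buffers; it ignores any maximum code length, and it assumes both buffers already hold "KPR" when the double "L" in "cabrillo" is reached (taken from the values reported above). The boolean parameter stands in for the outcome of conditionL0().

```java
class HandleLDemo {

    // Hypothetical result holder with independent primary/alternate buffers.
    static class Result {
        final StringBuilder primary = new StringBuilder();
        final StringBuilder alternate = new StringBuilder();

        Result(String prefix) {
            primary.append(prefix);
            alternate.append(prefix);
        }

        void append(char c) {
            appendPrimary(c);
            appendAlternate(c);
        }

        void appendPrimary(char c) {
            primary.append(c);
        }

        void appendAlternate(char c) {
            alternate.append(c);
        }
    }

    // Original logic for a double "L": 'L' always goes to both encodings.
    static void originalDoubleL(Result result, boolean conditionL0) {
        result.append('L');
        if (conditionL0) {
            result.appendAlternate(' ');
        }
    }

    // Suggested logic: 'L' goes to the primary only when conditionL0 holds.
    static void suggestedDoubleL(Result result, boolean conditionL0) {
        if (conditionL0) {
            result.appendPrimary('L');
        } else {
            result.append('L');
        }
    }

    public static void main(String[] args) {
        Result before = new Result("KPR");
        originalDoubleL(before, true);
        System.out.println("[" + before.alternate + "]"); // [KPRL ]

        Result after = new Result("KPR");
        suggestedDoubleL(after, true);
        System.out.println("[" + after.alternate + "]"); // [KPR]
    }
}
```

In both versions the primary encoding ends up as "KPRL"; only the alternate changes.
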
> 3) Bug in the conditionL0() method for words ending in "AS" and "OS"
>  * The alternative encoding of "gallegos" results in "KLKS" rather than "KKS"
> The problem is caused by the wrong start position being used in the 
> contains() method, which means it's not checking the last two characters of 
> the word but checks the previous & current positions instead:
> {code}
>         } else if ((contains(value, index - 1, 2, "AS", "OS") || 
> {code}
> ...this should be
> {code}
>         } else if ((contains(value, value.length() - 2, 2, "AS", "OS") || 
> {code}
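
A small sketch of the two start positions, using "GALLEGOS" with index at the first "L" (as handleL() passes it to conditionL0()). The contains() helper here is a hypothetical stand-in that assumes an exact match on the substring of the given length; it is not the Commons Codec source.

```java
class ContainsStartDemo {

    // Hypothetical stand-in for the contains() helper: assumes an exact
    // match against the substring [start, start + length).
    static boolean contains(String value, int start, int length, String... criteria) {
        if (start < 0 || start + length > value.length()) {
            return false; // requested range falls outside the word
        }
        String target = value.substring(start, start + length);
        for (String candidate : criteria) {
            if (target.equals(candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String value = "GALLEGOS";
        int index = value.indexOf("LL"); // position of the first "L"

        // index - 1 inspects "AL" (previous and current character).
        System.out.println(contains(value, index - 1, 2, "AS", "OS")); // false

        // value.length() - 2 inspects the last two characters, "OS".
        System.out.println(contains(value, value.length() - 2, 2, "AS", "OS")); // true
    }
}
```
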
> I'll attach a patch for review

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
