[jira] [Comment Edited] (CODEC-317) ColognePhonetic: Duplicate code in some cases

DRUser123 (Jira) Tue, 20 Feb 2024 01:53:31 -0800


    [ 
https://issues.apache.org/jira/browse/CODEC-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818725#comment-17818725
 ]


DRUser123 edited comment on CODEC-317 at 2/20/24 9:52 AM:
----------------------------------------------------------

Hi [~ggregory],
thank you for your reply. The JUnit test is extremely simple:

 
{noformat}
@Test
public void testColognePhonetic() {
    ColognePhonetic colognePhonetic = new ColognePhonetic();
    String name = "Müller"; // Correct case
    String name2 = "Müleler"; // Incorrect case
    String name3 = "Mülhler"; // Incorrect case
    System.out.println(name + ": " + colognePhonetic.colognePhonetic(name));
    System.out.println(name2 + ": " + colognePhonetic.colognePhonetic(name2));
    System.out.println(name3 + ": " + colognePhonetic.colognePhonetic(name3));
}
{noformat}
 

I ran the test in debug mode and put a breakpoint on the function in question 
({*}ColognePhonetic$CologneOutputBuffer.put(code) line 275{*}). 

As I see it, the solution would be to move line 275 (the assignment 
lastCode = code) inside the if, so that the lastCode variable changes only 
when the code is actually inserted into the output. 
Essentially, the function would become the following:

 
{noformat}
public void put(final char code) {
    if (code != CHAR_IGNORE && lastCode != code && (code != '0' || length == 0)) {
        data[length] = code;
        length++;
        lastCode = code; // This line moved from outside the if to inside it
    }
}
{noformat}
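
To make the difference easier to check without a debugger, here is a minimal, self-contained sketch that feeds a sequence of codes through both variants of put. It does not use Commons Codec: the class name PutBufferSketch, the encode helper, the updateOnlyOnInsert flag, and the '\0' start value for lastCode are assumptions made for illustration only. The code sequences are the ones listed in the traces of the issue description.

```java
public class PutBufferSketch {

    // Code emitted for characters that must not reach the output (e.g. 'H').
    private static final char CHAR_IGNORE = '-';

    /**
     * Runs the codes through the put logic.
     * updateOnlyOnInsert == false: lastCode is updated on every call (current behavior).
     * updateOnlyOnInsert == true:  lastCode is updated only when the code is written (proposal).
     */
    static String encode(final char[] codes, final boolean updateOnlyOnInsert) {
        final char[] data = new char[codes.length];
        int length = 0;
        char lastCode = '\0'; // assumption: any start value that is not a real code

        for (final char code : codes) {
            boolean inserted = false;
            if (code != CHAR_IGNORE && lastCode != code && (code != '0' || length == 0)) {
                data[length] = code;
                length++;
                inserted = true;
            }
            if (!updateOnlyOnInsert || inserted) {
                lastCode = code;
            }
        }
        return new String(data, 0, length);
    }

    public static void main(final String[] args) {
        // Code sequences from the traces: Müller, Mülhler, Müleler.
        final String[] sequences = {"605507", "605-507", "6050507"};
        for (final String s : sequences) {
            System.out.println(s + ": current = " + encode(s.toCharArray(), false)
                    + ", proposed = " + encode(s.toCharArray(), true));
        }
        // 605507: current = 657, proposed = 657
        // 605-507: current = 6557, proposed = 657
        // 6050507: current = 6557, proposed = 657
    }
}
```

With the proposed variant, two equal codes separated only by skipped codes (an ignored 'H' or an intermediate vowel) collapse into one, which is the behavior requested in this issue.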
 

I hope this helps solve the issue!



> ColognePhonetic: Duplicate code in some cases
> ---------------------------------------------
>
>                 Key: CODEC-317
>                 URL: https://issues.apache.org/jira/browse/CODEC-317
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.15, 1.16.1
>            Reporter: DRUser123
>            Priority: Major
>
> h2. ColognePhonetic: Duplicate code in some cases
> When the character "H" or an intermediate vowel (not at the beginning of the 
> string) is processed, its code should not be added to the output. However, 
> lastCode is still set to that skipped code, so the next code is no longer 
> recognized as a duplicate of the last code actually written. 
> The code in question is 
> *ColognePhonetic$CologneOutputBuffer.put(code), line 275, in version 1.16.1 
> (also tested with 1.15).*
> {+}Example with Müller (correctly coded){+}:
> Char = 'M', code = 6, lastCode = null, output = '6'
> Char = 'U', code = 0, lastCode = 6, output = '6' (no intermediate zeros are 
> added)
> Char = 'L', code = 5, lastCode = 0, output = '65'   
> Char = 'L', code = 5, lastCode = 5, output = '65' (no duplicate codes are 
> added)
> Char = 'E', code = 0, lastCode = 5, output = '65' (no intermediate zeros are 
> added)
> Char = 'R', code = 7, lastCode = 0, output = '657' 
> {+}Example with Mülhler (incorrectly coded){+}:
> Char = 'M', code = 6, lastCode = null, output = '6'
> Char = 'U', code = 0, lastCode = 6, output = '6' (no intermediate zeros are 
> added)
> Char = 'L', code = 5, lastCode = 0, output = '65'   
> Char = 'H', code = -, lastCode = 5, output = '65' 
> Char = 'L', {*}code = 5, lastCode = -{*}, output = '655' ({*}Fails to 
> identify duplicate code{*})
> Char = 'E', code = 0, lastCode = 5, output = '655' (no intermediate zeros are 
> added)
> Char = 'R', code = 7, lastCode = 0, output = '6557' 
> {+}Example with Müleler (incorrectly coded){+}:
> Char = 'M', code = 6, lastCode = null, output = '6'
> Char = 'U', code = 0, lastCode = 6, output = '6' (no intermediate zeros are 
> added)
> Char = 'L', code = 5, lastCode = 0, output = '65'   
> Char = 'E', code = 0, lastCode = 5, output = '65' (no intermediate zeros are 
> added)
> Char = 'L', {*}code = 5, lastCode = 0{*}, output = '655' ({*}Fails to 
> identify duplicate code{*})
> Char = 'E', code = 0, lastCode = 5, output = '655' (no intermediate zeros are 
> added)
> Char = 'R', code = 7, lastCode = 0, output = '6557' 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
