[
https://issues.apache.org/jira/browse/CODEC-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gary D. Gregory updated CODEC-330:
----------------------------------
Affects Version/s: (was: 1.19.0)
> org.apache.commons.codec.language.DaitchMokotoffSoundex.cleanup(String) does
> not remove special characters (e.g., punctuation)
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: CODEC-330
> URL: https://issues.apache.org/jira/browse/CODEC-330
> Project: Commons Codec
> Issue Type: Bug
> Environment: JDK 8, MacOS
> Reporter: Dianshu Liao
> Priority: Major
> Attachments: Screenshot 2025-05-19 at 1.01.11 am.png
>
>
> File: org.apache.commons.codec.language.DaitchMokotoffSoundex
> Method: private String cleanup(String input)
> h1. Problem
> The private method "private String cleanup(final String input)" in
> DaitchMokotoffSoundex is intended to sanitize the input string before
> applying the actual phonetic transformation. However, the implementation
> does not remove special characters such as !, @, and #, or digits. These
> characters are preserved in the cleaned string, which can lead to incorrect
> or unexpected phonetic results.
>
> h1. Test Code
> package org.apache.commons.codec.language;
>
> import static org.junit.Assert.assertEquals;
>
> import java.lang.reflect.Method;
>
> import org.junit.Test;
>
> public class language_DaitchMokotoffSoundex_cleanup_Test {
>
>     // Declares "throws Exception" so that reflection failures fail the test
>     // instead of being swallowed by a catch block.
>     @Test(timeout = 4000)
>     public void testCleanup() throws Exception {
>         // Instantiate the class
>         DaitchMokotoffSoundex soundex = new DaitchMokotoffSoundex();
>         // Access the private method using reflection
>         Method cleanupMethod =
>                 DaitchMokotoffSoundex.class.getDeclaredMethod("cleanup", String.class);
>         cleanupMethod.setAccessible(true);
>
>         // Test input with whitespace
>         String input = " Hello World ";
>         String expectedOutput = "helloworld";
>         String actualOutput = (String) cleanupMethod.invoke(soundex, input);
>         assertEquals(expectedOutput, actualOutput);
>
>         // Test input with special characters
>         input = "Te$t!@#";
>         expectedOutput = "test";
>         actualOutput = (String) cleanupMethod.invoke(soundex, input);
>         assertEquals(expectedOutput, actualOutput);
>     }
> }
> h1. Expected Result
> All non-letter characters (e.g., !, @, #, digits) should be removed as part
> of the cleanup process to ensure reliable phonetic encoding.
> h1. Actual Result
>
> Special characters are preserved. For example, "Te$t!@#" -> "te$t!@#".
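
To make the difference between the reported and the requested behaviour concrete, here is a minimal, self-contained sketch. It is not the Commons Codec source: the class CleanupSketch and both method names are hypothetical, and the first method only mirrors the behaviour described in the report (whitespace removed, input lower-cased, everything else kept). Any other processing the real cleanup method may perform is not reproduced.

```java
// Hypothetical illustration only; not the Commons Codec implementation.
final class CleanupSketch {

    private CleanupSketch() {
    }

    // Mirrors the behaviour described in the report:
    // whitespace is dropped, letters are lower-cased, all other characters are kept.
    public static String cleanupWhitespaceOnly(final String input) {
        final StringBuilder sb = new StringBuilder(input.length());
        for (final char ch : input.toCharArray()) {
            if (!Character.isWhitespace(ch)) {
                sb.append(Character.toLowerCase(ch));
            }
        }
        return sb.toString();
    }

    // One possible reading of the expected result (an assumption):
    // keep letters only, so punctuation, symbols, and digits are dropped.
    public static String cleanupLettersOnly(final String input) {
        final StringBuilder sb = new StringBuilder(input.length());
        for (final char ch : input.toCharArray()) {
            if (Character.isLetter(ch)) {
                sb.append(Character.toLowerCase(ch));
            }
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        System.out.println(cleanupWhitespaceOnly("Te$t!@#"));    // te$t!@#
        System.out.println(cleanupLettersOnly("Te$t!@#"));       // test
        System.out.println(cleanupLettersOnly(" Hello World ")); // helloworld
    }
}
```

Character.isLetter also accepts non-ASCII letters, so accented characters would survive the stricter variant. Whether that is the desired behaviour depends on how the encoder treats such characters afterwards.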