[ https://issues.apache.org/jira/browse/CODEC-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071893#comment-13071893 ]
Gary D. Gregory commented on CODEC-125:
---------------------------------------
I hope Matthew can fix this, but here is what I have found so far.
I added this test:
{code:java}
@Ignore
@Test
public void testLongestEnglishSurname() throws EncoderException {
    BeiderMorseEncoder bmpm = new BeiderMorseEncoder();
    bmpm.setNameType(NameType.GENERIC);
    bmpm.setRuleType(RuleType.APPROX);
    bmpm.encode("MacGhilleseatheanaich");
}
{code}
It does indeed take forever (I killed it after a couple of minutes). When I
suspend the test in the debugger, it appears to spend its time in:
PhoneticEngine.normalizeLanguageAttributes(String, boolean) line: 266
{code:java}
private String normalizeLanguageAttributes(final String input, final boolean strip) {
    String text = input;
    Set<String> langs = new HashSet<String>();
    int bracketStart;
    while ((bracketStart = text.indexOf('[')) != -1) {
        int bracketEnd = text.indexOf(']', bracketStart);
        if (bracketEnd == -1) {
            throw new IllegalArgumentException("no closing square bracket in: " + text);
        }
        String body = text.substring(bracketStart + 1, bracketEnd);
        langs.addAll(Arrays.asList(body.split("[+]")));
        text = text.substring(0, bracketStart) + text.substring(bracketEnd + 1);
    }
    if (langs.isEmpty() || strip) {
        return text;
    } else if (langs.contains(Languages.ANY)) {
        return "[" + Languages.ANY + "]";
    } else {
        return text + "[" + join(langs, "+") + "]";
    }
}
{code}
The input String is 8,722,727 chars long!
No wonder it takes forever!
Matthew: Can this be?
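As a side note, the loop above rebuilds the whole String for every bracket group it removes, so its cost grows roughly with (input length x number of bracket groups). On an input this large, that alone could explain the time spent in this method. Below is a rough single-pass sketch, not a proposed patch: the class name NormalizeSketch and the constant ANY (standing in for Languages.ANY, assumed here to be "any") are made up for illustration, String.join replaces the private join helper, and a LinkedHashSet replaces the HashSet only so the output order is predictable. It should return the same results as the loop above, except that the exception message shows the original input rather than the partially stripped text. It does nothing about why the input grows to millions of chars in the first place.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

final class NormalizeSketch {
    // Assumption: stand-in for Languages.ANY from the patch under review.
    static final String ANY = "any";

    static String normalizeLanguageAttributes(final String input, final boolean strip) {
        final StringBuilder text = new StringBuilder(input.length());
        // LinkedHashSet instead of HashSet only to keep the joined order predictable.
        final Set<String> langs = new LinkedHashSet<String>();
        int pos = 0; // start of the not-yet-copied part of input
        int bracketStart;
        while ((bracketStart = input.indexOf('[', pos)) != -1) {
            final int bracketEnd = input.indexOf(']', bracketStart);
            if (bracketEnd == -1) {
                throw new IllegalArgumentException("no closing square bracket in: " + input);
            }
            // Copy the text before the bracket once, then skip the bracketed attribute.
            text.append(input, pos, bracketStart);
            final String body = input.substring(bracketStart + 1, bracketEnd);
            langs.addAll(Arrays.asList(body.split("[+]")));
            pos = bracketEnd + 1;
        }
        // Copy whatever follows the last bracket group.
        text.append(input, pos, input.length());

        if (langs.isEmpty() || strip) {
            return text.toString();
        } else if (langs.contains(ANY)) {
            return "[" + ANY + "]";
        } else {
            return text.append('[').append(String.join("+", langs)).append(']').toString();
        }
    }

    public static void main(String[] args) {
        // prints "abc[x+y+z]"
        System.out.println(normalizeLanguageAttributes("ab[x+y]c[y+z]", false));
    }
}
```

Each char of the input is copied once into the StringBuilder, so the work stays linear in the input length regardless of how many bracket groups there are.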
> Implement a Beider-Morse phonetic matching codec
> ------------------------------------------------
>
> Key: CODEC-125
> URL: https://issues.apache.org/jira/browse/CODEC-125
> Project: Commons Codec
> Issue Type: New Feature
> Reporter: Matthew Pocock
> Priority: Minor
> Attachments: bm-gg.diff, bmpm.patch, bmpm.patch, bmpm.patch,
> bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch
>
>
> I have implemented Beider Morse Phonetic Matching as a codec against the
> commons-codec svn trunk. I would like to contribute this to commons-codec.