[ https://issues.apache.org/jira/browse/CODEC-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027232#comment-14027232 ]
michael tobias commented on CODEC-187:
--------------------------------------
I believe I DID submit a bug report, but it would have made my job much easier
had I known the VERSION of the BMPM algorithm implemented in Commons Codec.
I believe it is also vitally important to have the algorithm version
DOCUMENTED. It is a BUG not to specify the version of an algorithm implemented
when that algorithm is not static but is subject to revision.
Here is an example showing the bug in the tokens returned.
Using the defaults nameType="GENERIC", ruleType="APPROX", languageSet="auto",
the name "Abram" should produce the following phonetic tokens:
abram abrom avram avrom obram obrom ovram ovrom Ybram Ybrom abran abron obran
obron
These results for the name Abram have been unchanged since BMPM version 2.0,
dated 18 June 2009.
But the BMPM in the Commons Codec returns:
abram abrom avram avrom obram obrom ovram ovrom abran abron obran obron
i.e. we are missing the 2 tokens:
Ybram Ybrom
If this one example is wrong, then how many others will be?
How much testing was done of the coding?
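
The comparison above can be sketched as a simple set difference. This is only
an illustration, assuming both token lists are available as whitespace-separated
strings; the class and method names (TokenDiff, parse, missing) are hypothetical
and are not part of Commons Codec. In a real check the "actual" string would
come from the Codec's encoder output rather than being hard-coded.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Commons Codec.
class TokenDiff {

    /** Splits a whitespace-separated token list into an insertion-ordered set. */
    static Set<String> parse(String tokens) {
        return new LinkedHashSet<>(Arrays.asList(tokens.trim().split("\\s+")));
    }

    /** Returns the tokens present in expected but absent from actual. */
    static Set<String> missing(String expected, String actual) {
        Set<String> result = parse(expected);
        result.removeAll(parse(actual));
        return result;
    }

    public static void main(String[] args) {
        // Token lists copied from the report above (name "Abram").
        String expected = "abram abrom avram avrom obram obrom ovram ovrom "
                + "Ybram Ybrom abran abron obran obron";
        String actual = "abram abrom avram avrom obram obrom ovram ovrom "
                + "abran abron obran obron";

        System.out.println(missing(expected, actual)); // prints [Ybram, Ybrom]
    }
}
```

Running the same comparison over a larger list of names, with reference output
taken from a known BMPM version, would show whether "Abram" is an isolated case.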
> Beider Morse Phonetic Matching producing incorrect tokens
> ---------------------------------------------------------
>
> Key: CODEC-187
> URL: https://issues.apache.org/jira/browse/CODEC-187
> Project: Commons Codec
> Issue Type: Bug
> Affects Versions: 1.9
> Reporter: michael tobias
> Priority: Minor
>
> I believe the Beider Morse Phonetic Matching algorithm was added in Commons
> Codec 1.6.
> The BMPM algorithm is an EVOLVING algorithm that is currently on version 3.02,
> though it had been static since version 3.01 dated 19 Dec 2011 (it was first
> available as open source as version 1.00 on 6 May 2009).
> I can see nothing in the Commons Codec docs to say which version of BMPM was
> implemented, so I am not sure whether the problem with the algorithm as coded
> in the Codec is simply an old version or whether there are more basic problems
> with the implementation.
> How do I determine the version of the algorithm that was implemented in the
> Commons Codec?
> How do we ensure that the algorithm is updated if/when the BMPM algorithm
> changes?
> How do we ensure that the algorithm as coded in the Commons Codec is accurate
> and working as expected?