[
https://issues.apache.org/jira/browse/CODEC-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196415#comment-14196415
]
Michael Tobias commented on CODEC-187:
--------------------------------------
Hi Thomas
The recent problems I detected were with the actual algorithm - not your
coding. I didn’t like some of the tokens I was seeing and have been discussing
this with Sasha Beider and Steve Morse.
They finally agreed to modify 2 of the rules tables and Steve has just loaded a
new version (3.04) to http://stevemorse.org/phoneticinfo.htm
How easy is it to update the tables?
If we can get this update processed, then I am happy to go live in Commons Codec
1.10.
Also, how are things with the D-M soundex implementation? How do we get that
added to Commons Codec?
Regards
Michael
> Beider Morse Phonetic Matching producing incorrect tokens
> ---------------------------------------------------------
>
> Key: CODEC-187
> URL: https://issues.apache.org/jira/browse/CODEC-187
> Project: Commons Codec
> Issue Type: Bug
> Affects Versions: 1.9
> Reporter: michael tobias
> Priority: Minor
> Fix For: 1.10
>
> Attachments: CODEC-187.patch, CODEC-187_ashkenazi_approx_any.patch,
> CODEC-187_ashkenazi_approx_any_v2.patch, CODEC_187_sync_with_v3.3.diff
>
>
> I believe the Beider Morse Phonetic Matching algorithm was added in Commons
> Codec 1.6.
> The BMPM algorithm is an EVOLVING algorithm that is currently on version 3.02,
> though it had been static since version 3.01 dated 19 Dec 2011 (it was first
> available as open source as version 1.00 on 6 May 2009).
> I can see nothing in the Commons Codec Docs to say which version of BMPM was
> implemented so I am not sure if the problem with the algorithm as coded in
> the Codec is simply an old version or whether there are more basic problems
> with the implementation.
> How do I determine the version of the algorithm that was implemented in the
> Commons Codec?
> How do we ensure that the algorithm is updated if/when the BMPM algorithm
> changes?
> How do we ensure that the algorithm as coded in the Commons Codec is accurate
> and working as expected?