[ https://issues.apache.org/jira/browse/CODEC-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030762#comment-14030762 ]
michael tobias commented on CODEC-187:
--------------------------------------
But the rules for Ashkenazi approx have not changed since 2009, so it was
clearly wrong from the first implementation.
Do you want me to re-open this issue or create a new one?
I am happy to spend time working with you to test the tokens produced by any
code update.
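As a rough sketch of how that token testing could be organised: keep a table of
names with the tokens the reference BMPM implementation gives, and diff it
against what the codec returns. Everything named here (TokenCheck, tokens,
mismatches, the sample data) is hypothetical. The encoder is passed in as a
plain function so the sketch runs without Commons Codec on the classpath; a real
run would wrap the codec's Beider-Morse encoder there. I am assuming tokens come
back separated by "|" and that their order does not matter.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

final class TokenCheck {

    // Split a "tok1|tok2|..." string into a sorted set so ordering is ignored.
    // The "|" separator is an assumption about the encoder's output format.
    static Set<String> tokens(String encoded) {
        Set<String> out = new TreeSet<>();
        for (String t : encoded.split("\\|")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Returns name -> description for every name whose token set differs
    // from the expected one. An empty map means everything matched.
    static Map<String, String> mismatches(Map<String, String> expected,
                                          Function<String, String> encoder) {
        Map<String, String> bad = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : expected.entrySet()) {
            Set<String> want = tokens(e.getValue());
            Set<String> got = tokens(encoder.apply(e.getKey()));
            if (!want.equals(got)) {
                bad.put(e.getKey(), "expected " + want + " but got " + got);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Made-up reference data, NOT real BMPM output.
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("nameA", "t1|t2");
        expected.put("nameB", "t3");

        // Stand-in encoder: right for nameA (different order), wrong for nameB.
        Function<String, String> encoder =
                n -> n.equals("nameA") ? "t2|t1" : "t3|t9";

        System.out.println(mismatches(expected, encoder));
    }
}
```

The expected table would be filled from the reference implementation's output
for the same rule set (for example Ashkenazi approx), so a mismatch points at
either the codec's code or its rule files.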
There is another issue, however. If we are fixing bugs in the BMPM algorithm
and also updating the rules to the latest version (which should not really
be very different from the version originally coded), then any indexes
generated using BMPM should really be re-created. Anybody updating their
Commons Codec will find that new indexing and queries produce different
tokens from those in existing indexes, so queries might not find existing
records.
Will the release of a new Commons Codec be accompanied by detailed information
advising that all existing indexes using BMPM be re-created?
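To illustrate the concern about existing indexes, here is a minimal sketch. The
two encoders are deliberately fake stand-ins for "codec before the fix" and
"codec after the fix" (they are not BMPM), and StaleIndexDemo, buildIndex and
search are hypothetical names. It only shows the mechanism: an index stores the
tokens produced at indexing time, so a query encoded with different rules can
miss records that are still there.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

final class StaleIndexDemo {

    // Build token -> names, using whatever encoder is current at indexing time.
    static Map<String, Set<String>> buildIndex(List<String> names,
                                               Function<String, String> encoder) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String name : names) {
            for (String tok : encoder.apply(name).split("\\|")) {
                index.computeIfAbsent(tok, k -> new TreeSet<>()).add(name);
            }
        }
        return index;
    }

    // Encode the query and collect every record sharing at least one token.
    static Set<String> search(Map<String, Set<String>> index, String query,
                              Function<String, String> encoder) {
        Set<String> hits = new TreeSet<>();
        for (String tok : encoder.apply(query).split("\\|")) {
            hits.addAll(index.getOrDefault(tok, Collections.<String>emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Fake encoders standing in for the old and the updated rules.
        Function<String, String> oldEncoder = n -> "old-" + n.toLowerCase();
        Function<String, String> newEncoder = n -> "new-" + n.toLowerCase();

        List<String> records = List.of("Tobias");

        // Index written before the upgrade, queried after it: no hit.
        Map<String, Set<String>> stale = buildIndex(records, oldEncoder);
        System.out.println("stale index:   " + search(stale, "Tobias", newEncoder));

        // Index re-created with the new encoder: the record is found again.
        Map<String, Set<String>> rebuilt = buildIndex(records, newEncoder);
        System.out.println("rebuilt index: " + search(rebuilt, "Tobias", newEncoder));
    }
}
```

With real BMPM the old and new token sets would probably overlap for many names,
so the failure would be partial rather than total, which makes it harder to
notice without an explicit warning.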
Michael
> Beider Morse Phonetic Matching producing incorrect tokens
> ---------------------------------------------------------
>
> Key: CODEC-187
> URL: https://issues.apache.org/jira/browse/CODEC-187
> Project: Commons Codec
> Issue Type: Bug
> Affects Versions: 1.9
> Reporter: michael tobias
> Priority: Minor
> Fix For: 1.10
>
> Attachments: CODEC-187.patch
>
>
> I believe the Beider Morse Phonetic Matching algorithm was added in Commons
> Codec 1.6.
> The BMPM algorithm is an EVOLVING algorithm that is currently on version 3.02,
> though it had been static since version 3.01 dated 19 Dec 2011 (it was first
> available as open source as version 1.00 on 6 May 2009).
> I can see nothing in the Commons Codec Docs to say which version of BMPM was
> implemented so I am not sure if the problem with the algorithm as coded in
> the Codec is simply an old version or whether there are more basic problems
> with the implementation.
> How do I determine the version of the algorithm that was implemented in the
> Commons Codec?
> How do we ensure that the algorithm is updated if/when the BMPM algorithm
> changes?
> How do we ensure that the algorithm as coded in the Commons Codec is accurate
> and working as expected?
--
This message was sent by Atlassian JIRA
(v6.2#6252)