[ https://issues.apache.org/jira/browse/CODEC-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027331#comment-14027331 ]
michael tobias commented on CODEC-187:
--------------------------------------
In case this helps with debugging, here is a debug trace from Steve Morse's
BMPM implementation of the algorithm for GENERIC, APPROX, autolanguage,
for the name "abram":
---------------------------------------------------------------------------------------------------------
applying language rules from (rulesany) to abram using languages 239840
char codes = [#61]a [#62]b [#72]r [#61]a [#6d]m
applying rule #248
pattern=a
lcontext=
rcontext=
subst=A
result=A
applying rule #249
pattern=b
lcontext=
rcontext=
subst=B
result=AB
applying rule #265
pattern=r
lcontext=
rcontext=
subst=r
result=ABr
applying rule #248
pattern=a
lcontext=
rcontext=
subst=A
result=ABrA
applying rule #27
pattern=m
lcontext=[aeiouy]
rcontext=
subst=(m|n[16448])
result=(ABrAm[239840]|ABrAn[64])
after language rules: (ABrAm[239840]|ABrAn[64])
applying final rules from (exactapproxcommon plus approxcommon) to ABrAm[239840]
no rules match for phonetic item 0 at position 0: A
no rules match for phonetic item 0 at position 1: AB
no rules match for phonetic item 0 at position 2: ABr
no rules match for phonetic item 0 at position 3: ABrA
no rules match for phonetic item 0 at position 4: ABrAm
applying final rules from (exactapproxcommon plus approxcommon) to ABrAn[64]
no rules match for phonetic item 1 at position 0: A
no rules match for phonetic item 1 at position 1: AB
no rules match for phonetic item 1 at position 2: ABr
no rules match for phonetic item 1 at position 3: ABrA
no rules match for phonetic item 1 at position 4: ABrAn
applying final rules from (approxany) to ABrAm[239840]
after applying final rule #60 to phonetic item #0 at position 0:
(a[239840]|o[239840]|Y[128]) pattern=A lcontext= rcontext= subst=(a|o|Y[128])
after applying final rule #3 to phonetic item #0 at position 1:
(ab[239840]|av[131072]|ob[239840]|ov[131072]|Yb[128]) pattern=B lcontext=
rcontext= subst=(b|v[131072])
no rules match for phonetic item 0 at position 2:
(ab[239840]|av[131072]|ob[239840]|ov[131072]|Yb[128])r
after applying final rule #56 to phonetic item #0 at position 3:
(abra[239840]|abro[239840]|avra[131072]|avro[131072]|obra[239840]|obro[239840]|ovra[131072]|ovro[131072]|Ybra[128]|Ybro[128])
pattern=A lcontext= rcontext=[fklmnprst]$ subst=(a|o)
no rules match for phonetic item 0 at position 4:
(abra[239840]|abro[239840]|avra[131072]|avro[131072]|obra[239840]|obro[239840]|ovra[131072]|ovro[131072]|Ybra[128]|Ybro[128])m
applying final rules from (approxany) to ABrAn[64]
after applying final rule #60 to phonetic item #1 at position 0:
(a[239840]|o[239840]|Y[128]) pattern=A lcontext= rcontext= subst=(a|o|Y[128])
after applying final rule #3 to phonetic item #1 at position 1:
(ab[239840]|av[131072]|ob[239840]|ov[131072]|Yb[128]) pattern=B lcontext=
rcontext= subst=(b|v[131072])
no rules match for phonetic item 1 at position 2:
(ab[239840]|av[131072]|ob[239840]|ov[131072]|Yb[128])r
after applying final rule #56 to phonetic item #1 at position 3:
(abra[239840]|abro[239840]|avra[131072]|avro[131072]|obra[239840]|obro[239840]|ovra[131072]|ovro[131072]|Ybra[128]|Ybro[128])
pattern=A lcontext= rcontext=[fklmnprst]$ subst=(a|o)
no rules match for phonetic item 1 at position 4:
(abra[239840]|abro[239840]|avra[131072]|avro[131072]|obra[239840]|obro[239840]|ovra[131072]|ovro[131072]|Ybra[128]|Ybro[128])n
(abram|abrom|avram|avrom|obram|obrom|ovram|ovrom|Ybram|Ybrom|abran|abron|obran|obron)
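
For anyone comparing another implementation against this trace, here is a minimal sketch of how the bracketed numbers appear to combine. It assumes they are language bitmasks that are intersected with a bitwise AND, and that an alternative is dropped once its mask reaches 0. The substitutions are copied by hand from the final rules #60, #3 and #56 shown above; no rule matching is implemented. The helper names (expand, final_subs) are made up for illustration and are not part of any BMPM code. The trace prints 239840 in the intermediate steps for item #1, so starting that item from its own mask of 64 is also an assumption, although it reproduces the last line of the trace.

```python
# Sketch only: reproduces the last line of the trace from the masks shown in it.
# Assumption: bracketed numbers are language bitmasks combined with bitwise AND.

ANY = None  # marks an alternative with no language restriction in the trace


def expand(start_mask, substitutions):
    """Concatenate one alternative per position, intersecting language masks.

    Alternatives whose combined mask becomes 0 are dropped.
    """
    items = [("", start_mask)]
    for alternatives in substitutions:
        next_items = []
        for text, mask in items:
            for piece, piece_mask in alternatives:
                combined = mask if piece_mask is ANY else mask & piece_mask
                if combined:
                    next_items.append((text + piece, combined))
        items = next_items
    return items


def final_subs(last_letter):
    """Per-position substitutions, copied by hand from the approxany steps."""
    return [
        [("a", ANY), ("o", ANY), ("Y", 128)],  # final rule #60 on the first A
        [("b", ANY), ("v", 131072)],           # final rule #3 on B
        [("r", ANY)],                          # no rule matched
        [("a", ANY), ("o", ANY)],              # final rule #56 on the second A
        [(last_letter, ANY)],                  # no rule matched
    ]


input_langs = 239840            # languages reported for "abram"
n_langs = input_langs & 16448   # rule #27: the "n" alternative carries [16448]

tokens = [t for t, _ in expand(input_langs, final_subs("m"))]
tokens += [t for t, _ in expand(n_langs, final_subs("n"))]
result = "(" + "|".join(tokens) + ")"

print(n_langs)  # 64, matching ABrAn[64] in the trace
print(result)
```

Under these assumptions the sketch prints the same 14 tokens, in the same order, as the last line of the trace. The "n" branch loses the av, ov and Y variants because 64 shares no bits with 131072 or 128.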
> Beider Morse Phonetic Matching producing incorrect tokens
> ---------------------------------------------------------
>
> Key: CODEC-187
> URL: https://issues.apache.org/jira/browse/CODEC-187
> Project: Commons Codec
> Issue Type: Bug
> Affects Versions: 1.9
> Reporter: michael tobias
> Priority: Minor
>
> I believe the Beider Morse Phonetic Matching algorithm was added in Commons
> Codec 1.6.
> The BMPM algorithm is an evolving algorithm that is currently at version 3.02,
> though it had been static since version 3.01, dated 19 Dec 2011 (it was first
> available as open source as version 1.00 on 6 May 2009).
> I can see nothing in the Commons Codec docs to say which version of BMPM was
> implemented, so I am not sure whether the problem with the algorithm as coded
> in the Codec is simply that it is an old version, or whether there are more
> basic problems with the implementation.
> How do I determine the version of the algorithm that was implemented in the
> Commons Codec?
> How do we ensure that the algorithm is updated if/when the BMPM algorithm
> changes?
> How do we ensure that the algorithm as coded in the Commons Codec is accurate
> and working as expected?
--
This message was sent by Atlassian JIRA
(v6.2#6252)