I recently joined this list as I have started to examine Apache Solr and am 
extremely interested in using soundex and phonetic tokens.

I have already pointed out some bugs in the current implementation of BMPM in 
the Commons Codec, and one of them has already been fixed.

Having checked the archived messages relating to the introduction of BMPM, I 
see that at the time it was also discussed whether to implement 
Daitch-Mokotoff soundex at the same time.  It looks like this was never taken 
up, but I am really interested in having this functionality.

Daitch-Mokotoff is a much simpler algorithm than BMPM (though it can 
'branch' and produce multiple tokens for the same word). It uses a rules table 
plus a handful of additional instructions. The algorithm is in the public 
domain and there are various implementations available (including a few 
apparently written in Java, but I am not convinced they are correct). If it is 
felt necessary, I can get written permission from Gary Mokotoff and Randy 
Daitch to allow the algorithm to be used.
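
To illustrate what I mean by a rules table that can 'branch', here is a 
minimal Java sketch. It is only an illustration under stated assumptions: the 
class name, method name and the handful of rules are hypothetical and are NOT 
the full Daitch-Mokotoff table. It also leaves out the position-dependent 
codes and the handling of adjacent duplicate codes. It only shows how one 
pattern with two alternative codes doubles the number of tokens, with each 
token padded to six digits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class BranchingEncoderSketch {

    // Hypothetical, heavily reduced rules table: pattern -> alternative codes.
    // Longer patterns are listed first so they are matched before shorter ones.
    // This is an illustrative subset, not the real Daitch-Mokotoff table.
    private static final Map<String, String[]> RULES = new LinkedHashMap<>();
    static {
        RULES.put("CH", new String[] {"5", "4"}); // two alternatives: branches
        RULES.put("K", new String[] {"5"});
        RULES.put("D", new String[] {"3"});
        RULES.put("T", new String[] {"3"});
        RULES.put("M", new String[] {"6"});
        RULES.put("N", new String[] {"6"});
        RULES.put("L", new String[] {"8"});
        RULES.put("R", new String[] {"9"});
    }

    /** Returns every token produced for the word, one per branch. */
    public static Set<String> encode(String word) {
        String input = word.toUpperCase(Locale.ROOT);
        List<StringBuilder> branches = new ArrayList<>();
        branches.add(new StringBuilder());

        int i = 0;
        while (i < input.length()) {
            String matched = null;
            for (String pattern : RULES.keySet()) {
                if (input.startsWith(pattern, i)) {
                    matched = pattern;
                    break;
                }
            }
            if (matched == null) {
                i++; // letters without a rule are skipped in this sketch
                continue;
            }
            // Every existing branch is extended once per alternative code.
            List<StringBuilder> next = new ArrayList<>();
            for (StringBuilder branch : branches) {
                for (String code : RULES.get(matched)) {
                    next.add(new StringBuilder(branch).append(code));
                }
            }
            branches = next;
            i += matched.length();
        }

        Set<String> tokens = new LinkedHashSet<>();
        for (StringBuilder branch : branches) {
            tokens.add(toSixDigits(branch.toString()));
        }
        return tokens;
    }

    // Truncates to six digits or right-pads with zeros.
    private static String toSixDigits(String code) {
        StringBuilder sb = new StringBuilder(code);
        while (sb.length() < 6) {
            sb.append('0');
        }
        return sb.substring(0, 6);
    }

    public static void main(String[] args) {
        System.out.println(encode("Chalk")); // prints [585000, 485000]
        System.out.println(encode("Mark"));  // prints [695000]
    }
}
```

With this toy table, "Chalk" yields two tokens because "CH" has two 
alternative codes, while "Mark" yields one. A real port would need the 
complete table and the additional instructions agreed with Gary Mokotoff.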

I am currently discussing some changes to the algorithm with Gary Mokotoff and 
hope to have them agreed shortly.

At that point I will probably have a simple PHP implementation (not my code, 
but permission to adapt it will be granted), which I would be interested in 
having ported to Java for inclusion in the Commons Codec.

Is anybody on this list interested in assisting with this and porting an agreed 
PHP implementation to Java?  I will be happy to test all output until we are 
satisfied it is fully functional.

Thanks

Michael



