I recently joined this list because I have started looking at Apache Solr, and I am extremely interested in using Soundex and other phonetic tokens.
I have already pointed out some bugs in the current implementation of Beider-Morse Phonetic Matching (BMPM) in Commons Codec, and one of them has already been fixed. Checking the archived messages about the introduction of BMPM, I see that implementing Daitch-Mokotoff Soundex at the same time was also discussed. It looks like this was never taken up, but I am very interested in having this functionality.

Daitch-Mokotoff is a much simpler algorithm than BMPM, although it can 'branch' and produce multiple tokens for the same word. It uses a rules table together with a very small number of additional instructions. The algorithm is in the public domain, and various implementations are available (including a few apparently written in Java, though I am not convinced they are correct). If it is considered necessary, I can obtain written permission from Gary Mokotoff and Randy Daitch for the algorithm to be used.

I am currently discussing some changes to the algorithm with Gary Mokotoff and hope to have them agreed shortly. At that point I will probably have a simple PHP implementation (not my code, but permission to adapt it will be granted), which I would like to see ported to Java for inclusion in Commons Codec. Is anybody on this list interested in helping to port an agreed PHP implementation to Java? I will be happy to test all output until we are satisfied that it is fully functional.

Thanks,
Michael
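The Java sketch below illustrates the "rules table plus branching" idea. It is not a port of any existing implementation and not the complete algorithm. The assumptions are as follows:

* The table holds only a small, hypothetical subset of letters. The codes are intended to mirror the published Daitch-Mokotoff table for those letters, but should be checked against it.
* Letters missing from the table are simply skipped.
* The special cases of the full algorithm are omitted.

The class name `DaitchMokotoffSketch` and its `encode` method are illustrative only and are not part of Commons Codec.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only: a rules-table phonetic encoder that can branch,
// in the style of Daitch-Mokotoff. The table is a SMALL hypothetical subset,
// not the complete published rules; special cases are omitted.
final class DaitchMokotoffSketch {

    // One row of the rules table: a letter pattern and its code at the start
    // of the word, before a vowel, and in any other position.
    // "" means "not coded"; "a|b" means the pattern is ambiguous and branches.
    record Rule(String pattern, String atStart, String beforeVowel, String other) {}

    // Multi-letter patterns come first so the longest match wins.
    static final List<Rule> RULES = List.of(
            new Rule("SCH", "4", "4", "4"),
            new Rule("CH", "5|4", "5|4", "5|4"),
            new Rule("C", "5|4", "5|4", "5|4"),
            new Rule("A", "0", "", ""), new Rule("E", "0", "", ""),
            new Rule("I", "0", "", ""), new Rule("O", "0", "", ""),
            new Rule("U", "0", "", ""),
            new Rule("B", "7", "7", "7"), new Rule("D", "3", "3", "3"),
            new Rule("G", "5", "5", "5"), new Rule("K", "5", "5", "5"),
            new Rule("L", "8", "8", "8"), new Rule("M", "6", "6", "6"),
            new Rule("N", "6", "6", "6"), new Rule("R", "9", "9", "9"),
            new Rule("S", "4", "4", "4"), new Rule("T", "3", "3", "3"),
            new Rule("Z", "4", "4", "4"));

    static final String VOWELS = "AEIOU";

    // One partial encoding. 'last' is the previous code seen on this branch,
    // used to collapse adjacent identical codes into one digit.
    record Branch(String code, String last) {
        Branch append(String c) {
            String next = c.equals(last) ? code : code + c;
            return new Branch(next, c);
        }
    }

    // Returns the first rule whose pattern matches at position i, or null.
    static Rule match(String s, int i) {
        for (Rule r : RULES) {
            if (s.startsWith(r.pattern(), i)) return r;
        }
        return null;
    }

    // Returns every 6-digit code the name can produce (one per distinct branch).
    static Set<String> encode(String name) {
        String s = name.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        List<Branch> branches = List.of(new Branch("", null));
        int i = 0;
        while (i < s.length()) {
            Rule rule = match(s, i);
            if (rule == null) { i++; continue; } // letter not in this toy table: skipped
            int next = i + rule.pattern().length();
            String spec = i == 0 ? rule.atStart()
                    : next < s.length() && VOWELS.indexOf(s.charAt(next)) >= 0 ? rule.beforeVowel()
                    : rule.other();
            // Every existing branch is extended by every alternative code.
            List<Branch> grown = new ArrayList<>();
            for (Branch b : branches) {
                for (String alt : spec.split("\\|", -1)) grown.add(b.append(alt));
            }
            branches = grown;
            i = next;
        }
        // Pad with zeros, or truncate, to exactly six digits.
        Set<String> result = new LinkedHashSet<>();
        for (Branch b : branches) result.add((b.code() + "000000").substring(0, 6));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encode("Schmidt")); // [463000]
        System.out.println(encode("Chaim"));   // [560000, 460000]
    }
}
```

Each ambiguous pattern such as CH multiplies the number of branches, so "Chaim" yields two codes. Collecting the results in a set removes duplicates when different branches end up with the same code.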