Folks, I may soon be in a position to contribute a very slick implementation of the Brown, Della Pietra, et al. bigram mutual-information word clustering scheme. It is written in C++, and if there's any map-reduce, it's via OpenMP, not Hadoop :-).
As an ASF member, if I'm helping to get something useful released as open source, I'd rather push it out at Apache. Any interest in stretching the Mahout tent to accommodate it? I'm asking now because I'm starting a negotiation with its academic owner, and it would be useful to know in advance whether I have a tentative home for it at Apache, as opposed to having to just dump it onto SourceForge. You could take the attitude that it's part of Mahout as a challenge: can anyone out there come up with a practical variation in Java/Hadoop?
--benson
