Hi Arunav,
Contributions are certainly welcome. If you can post a patch on JIRA (
https://issues.apache.org/jira/browse/MAHOUT ), we can have a look at
it. I don't know if you've been monitoring our mailing lists or have
otherwise heard, but Mahout is no longer accepting new MapReduce code.
We're still in discussions regarding the next-generation Mahout
backends, but we're moving instead towards engine-agnostic (e.g. Mahout
DSL, see http://mahout.apache.org/users/sparkbindings/home.html )
implementations.
As for Minkowski distance, I'm not sure if someone else is working on
it, but as I mentioned, you're welcome to post a patch and we can discuss
it from there. Thanks!
Shannon
On 5/18/14, 1:29 PM, Arunav Sanyal wrote:
Hi
I am new to Apache Mahout and would like to contribute in whatever humble
way I can.
I see that the Vector class in Apache Mahout does not provide a
Minkowski distance.
Minkowski distance ( http://en.wikipedia.org/wiki/Minkowski_distance )
is a distance metric which generalizes several distance measures between
two vectors. Depending on its order parameter p, it can represent Manhattan
distance (p = 1, which equals Hamming distance on binary vectors) or
Euclidean distance (p = 2). I already have a simple solution ready for
review if this is approved. Similarly, I am working on the Mahalanobis
distance measure.
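For reference, the Minkowski distance of order p between vectors x and y is
(sum over i of |x_i - y_i|^p)^(1/p). Below is a minimal sketch in plain Java,
assuming dense double[] arrays rather than Mahout's Vector class. The class
name MinkowskiSketch and the method distance are hypothetical and not part of
Mahout; this only illustrates the formula and is not the proposed patch.

```java
public class MinkowskiSketch {

    // Hypothetical helper (not Mahout API): Minkowski distance of finite order p >= 1.
    public static double distance(double[] x, double[] y, double p) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        // Rejects NaN, values below 1, and infinity (the Chebyshev limit is not handled here).
        if (!(p >= 1.0) || Double.isInfinite(p)) {
            throw new IllegalArgumentException("p must be finite and >= 1");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            // Accumulate |x_i - y_i|^p for each coordinate.
            sum += Math.pow(Math.abs(x[i] - y[i]), p);
        }
        // Take the p-th root of the accumulated sum.
        return Math.pow(sum, 1.0 / p);
    }

    public static void main(String[] args) {
        double[] a = {0.0, 0.0};
        double[] b = {3.0, 4.0};
        System.out.println("p = 1 (Manhattan): " + distance(a, b, 1.0));
        System.out.println("p = 2 (Euclidean): " + distance(a, b, 2.0));
    }
}
```

With p = 1 this reduces to the Manhattan distance and with p = 2 to the
Euclidean distance, so a single parameterized measure covers both cases.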
My primary motive for introducing these distance measures is to come up
with a generic implementation of the K-nearest neighbor classifier (not to
be confused with K-means clustering). I will be working on that as well shortly.
If somebody else is working towards these features, I would like to
collaborate and contribute whatever code patches they deem necessary. If
not, I humbly request that the community approve these for inclusion in
Apache Mahout.
Yours sincerely
Arunav Sanyal