Re: new distance metric

Sebastian Schelter Tue, 29 Mar 2011 22:36:36 -0700

Hi Daniel,

We would also need a "distributed" implementation of this new metric. Could you do that too?

It shouldn't be too hard; just have a look at the other implementations in org.apache.mahout.math.hadoop.similarity.vector.


--sebastian


On 30.03.2011 00:40, Sean Owen wrote:

Great, the best place for this would be a JIRA issue:
https://issues.apache.org/jira/browse/MAHOUT
I think it needs a bit of style work. For example, it ought to be very
much like TanimotoCoefficientSimilarity. If you copied that and edited
a few key methods, you'd be a lot closer I think.
I guess I find the core computation a little quirky:

    double distance = preferring1 + preferring2 - 2 * intersection;
    if (distance < 1.0) {
        distance = 1.0 - distance;
    } else {
        distance = -1.0 + 1.0 / distance;
    }

distance is an int, so I think it's

    int distance = preferring1 + preferring2 - 2 * intersection;
    double similarity;  // separate double: the else branch is not an int
    if (distance == 0) {
        similarity = 1.0;
    } else {
        similarity = -1.0 + 1.0 / distance;
    }

The resulting values are a little odd then -- it can return values in
(-1,0], or 1.

By default I'd go with something more like "1.0 / (1.0 + distance)", I
suppose. That isn't the one right way to map a distance to a
similarity, but it would be consistent with
EuclideanDistanceSimilarity.
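
A minimal sketch of that mapping for the boolean-preference case, assuming
the same three counts used above (preferring1, preferring2, intersection).
The class and method names are hypothetical and are not part of Mahout:

```java
// Hypothetical sketch; not Mahout's actual API.
final class BooleanCityBlockSketch {

    /**
     * City-block distance between two boolean preference sets: the number
     * of items preferred by exactly one of the two users.
     */
    static int distance(int preferring1, int preferring2, int intersection) {
        return preferring1 + preferring2 - 2 * intersection;
    }

    /** Maps a distance in [0, infinity) to a similarity in (0, 1]. */
    static double similarity(int preferring1, int preferring2, int intersection) {
        return 1.0 / (1.0 + distance(preferring1, preferring2, intersection));
    }

    public static void main(String[] args) {
        // Identical preference sets: distance 0.
        System.out.println(similarity(3, 3, 3));
        // Sets of size 3 and 2 sharing one item: distance 3.
        System.out.println(similarity(3, 2, 1));
    }
}
```

With this mapping, identical sets score 1 and the score decays toward 0 as
the sets diverge, so there is no special case for distance == 0.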


I'd actually welcome you to expand this idea and not just make a
"boolean pref" version of this but one that computes an actual
city-block distance for prefs with ratings too, for completeness.
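
One way the rated version might look, as a rough sketch only. It assumes
each user's prefs are a plain map from item ID to rating and sums the
absolute rating differences over items both users have rated; how to treat
items rated by only one user is left open. All names here are hypothetical
and are not Mahout classes:

```java
import java.util.Map;

// Hypothetical sketch; not Mahout's actual API.
final class RatedCityBlockSketch {

    /**
     * Sum of |rating1 - rating2| over items rated by both users.
     * Items rated by only one user are ignored (an assumption).
     */
    static double distance(Map<Long, Double> prefs1, Map<Long, Double> prefs2) {
        double sum = 0.0;
        for (Map.Entry<Long, Double> entry : prefs1.entrySet()) {
            Double other = prefs2.get(entry.getKey());
            if (other != null) {
                sum += Math.abs(entry.getValue() - other);
            }
        }
        return sum;
    }

    /** Same distance-to-similarity mapping as 1.0 / (1.0 + distance). */
    static double similarity(Map<Long, Double> prefs1, Map<Long, Double> prefs2) {
        return 1.0 / (1.0 + distance(prefs1, prefs2));
    }
}
```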


I know this as "Manhattan distance". Is that an Americanism or is that
actually the more common name to anyone?



On Tue, Mar 29, 2011 at 10:16 PM, Daniel McEnnis <[email protected]> wrote:

Dear,

Here is a patch of a new distance metric for the collaborative
filtering modules - CityBlockDistance.  With the 0 - 1 binary split on
preference, KLDistance, AHDistance, and Symmetric KLDistance don't
make sense.

Daniel McEnnis.
