"a "distributed" implementation of this new metric" What would this do?

On Wed, Mar 30, 2011 at 7:55 AM, Daniel McEnnis <[email protected]> wrote:
> Sebastian,
>
> It will be in the next patch. Thanks for the heads up.
>
> Daniel.
>
> On Wed, Mar 30, 2011 at 1:35 AM, Sebastian Schelter <[email protected]> wrote:
>> Hi Daniel,
>>
>> We would also need a "distributed" implementation of this new metric. Could
>> you do that too?
>>
>> Shouldn't be too hard, just have a look at the other implementations in
>> org.apache.mahout.math.hadoop.similarity.vector.
>>
>> --sebastian
>>
>>
>> On 30.03.2011 00:40, Sean Owen wrote:
>>>
>>> Great, the best place for this would be a JIRA issue:
>>> https://issues.apache.org/jira/browse/MAHOUT
>>> I think it needs a bit of style work. For example, it ought to be very
>>> much like TanimotoCoefficientSimilarity. If you copied that and edited
>>> a few key methods, you'd be a lot closer I think.
>>> I guess I find the core computation a little quirky:
>>>
>>>     double distance = preferring1+preferring2 - 2*intersection;
>>>     if(distance< 1.0){
>>>       distance=1.0-distance;
>>>     }else{
>>>       distance = -1.0 + 1.0 / distance;
>>>     }
>>>
>>> distance is an int, so I think it's
>>>
>>>     int distance = preferring1+preferring2 - 2*intersection;
>>>     if(distance == 0){
>>>       distance=1;
>>>     }else{
>>>       distance = -1.0 + 1.0 / distance;
>>>     }
>>>
>>> The resulting values are a little odd then -- it can return values in
>>> [-1,0], or 1.
>>>
>>> By default I'd go with something more like "1.0 / (1.0 + distance)" I
>>> suppose, though that's not somehow the one right way to map a distance
>>> to a similarity -- though it would be consistent with
>>> EuclideanDistanceSimilarity.
>>>
>>>
>>> I'd actually welcome you to expand this idea and not just make a
>>> "boolean pref" version of this but one that computes an actual
>>> city-block distance for prefs with ratings too, for completeness.
>>>
>>>
>>> I know this as "Manhattan distance". Is that an Americanism or is that
>>> actually the more common name to anyone?
>>>
>>>
>>>
>>> On Tue, Mar 29, 2011 at 10:16 PM, Daniel McEnnis <[email protected]>
>>> wrote:
>>>>
>>>> Dear,
>>>>
>>>> Here is a patch of a new distance metric for the collaborative
>>>> filtering modules - CityBlockDistance. With the 0 - 1 binary split on
>>>> preference, KLDistance, AHDistance, and Symmetric KLDistance don't
>>>> make sense.
>>>>
>>>> Daniel McEnnis.
>>
>>
>

--
Lance Norskog
[email protected]
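
For reference, the "1.0 / (1.0 + distance)" mapping Sean suggests can be sketched as below. This is only an illustration and not code from the patch or from Mahout: the class name `CityBlockSketch` and its methods are hypothetical, the boolean case reuses the `preferring1`, `preferring2`, and `intersection` counts from the quoted snippet, and the rated case assumes both users' ratings are already aligned into arrays over the same items.

```java
// Hypothetical sketch, not Mahout code: city-block (Manhattan) distance
// mapped to a similarity in (0, 1] via 1 / (1 + distance).
final class CityBlockSketch {

  private CityBlockSketch() {}

  // Boolean preferences: the distance is the number of items preferred by
  // exactly one of the two users (the size of the symmetric difference).
  public static int booleanDistance(int preferring1, int preferring2, int intersection) {
    return preferring1 + preferring2 - 2 * intersection;
  }

  // Rated preferences: sum of absolute rating differences.
  // Assumes ratings1[i] and ratings2[i] refer to the same item.
  public static double ratedDistance(double[] ratings1, double[] ratings2) {
    if (ratings1.length != ratings2.length) {
      throw new IllegalArgumentException("rating arrays must have the same length");
    }
    double sum = 0.0;
    for (int i = 0; i < ratings1.length; i++) {
      sum += Math.abs(ratings1[i] - ratings2[i]);
    }
    return sum;
  }

  // Distance 0 maps to similarity 1; larger distances approach 0.
  public static double similarity(double distance) {
    return 1.0 / (1.0 + distance);
  }

  public static void main(String[] args) {
    int distance = booleanDistance(3, 4, 2); // 3 + 4 - 2*2 = 3
    System.out.println(similarity(distance)); // prints 0.25
  }
}
```

Unlike the computation in the patch, this mapping never returns a negative value, and identical preference sets still get a similarity of 1.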
