--norm 2 together with CosineDistanceMeasure is a good, fairly standard choice. The L1 norm is useful for some things too, but you can use any positive integer as the --norm value, or "INF" for L_infinity normalization.
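To make those --norm choices concrete, here is a minimal Python sketch of what L_p normalization does to a single vector. It is not Mahout's code. The function name lp_normalize is hypothetical, and accepting the string "INF" only mirrors the option described here. Each vector is divided by its own L_p length, so afterwards it has length 1 under that norm (for L_infinity, the largest absolute component becomes 1).

```python
from typing import List, Sequence, Union


def lp_normalize(v: Sequence[float], p: Union[float, str] = 2) -> List[float]:
    """Hypothetical helper: scale v to unit length under the L_p norm.

    p may be any positive number, or the string "INF" for L_infinity.
    """
    if p == "INF":
        # L_infinity norm: the largest absolute component.
        norm = max(abs(x) for x in v)
    else:
        if p <= 0:
            raise ValueError("p must be positive or 'INF'")
        # L_p norm: (sum |x|^p)^(1/p)
        norm = sum(abs(x) ** p for x in v) ** (1.0 / p)
    if norm == 0:
        # A zero vector cannot be scaled to unit length; return it unchanged.
        return list(v)
    return [x / norm for x in v]


print(lp_normalize([3.0, 4.0], 2))      # L2: length 5 -> [0.6, 0.8]
print(lp_normalize([3.0, 4.0], 1))      # L1: components now sum to 1
print(lp_normalize([3.0, 4.0], "INF"))  # L_inf: max component becomes 1
```

With p = 2 the result satisfies v.dot(v) == 1.0 (up to rounding), which is the property that makes cosine comparisons cheap.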
-jake

On Wed, Jan 13, 2010 at 4:32 PM, Bogdan Vatkov wrote:

> Is it related to the distance calculation done
> by org.apache.mahout.common.distance.CosineDistanceMeasure, for example?
> I am currently using --norm 2 in combination
> with org.apache.mahout.common.distance.CosineDistanceMeasure. Is that OK,
> and what other options do I have for the --norm value?
>
> On Thu, Jan 14, 2010 at 2:28 AM, Jake Mannix wrote:
>
> > It makes sure your vectors are all unit length according to the norm you
> > choose (the L2 norm, for example, means: make sure each vector satisfies
> > v.dot(v) == 1.0).
> >
> > This makes sure that when you want to compare vectors to each other, a nice
> > "distance" function is just distance(u, v) = 1 - u.dot(v)
> >
> > -jake
> >
> > On Wed, Jan 13, 2010 at 4:22 PM, Bogdan Vatkov wrote:
> >
> > > What is the practical meaning of the --norm parameter in the text-to-vector
> > > (http://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html) process?
> > >
> > > Best regards,
> > > Bogdan
>
> --
> Best regards,
> Bogdan
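The shortcut distance(u, v) = 1 - u.dot(v) only holds once both vectors have unit L2 length. A small Python sketch makes that visible by comparing it with the general cosine distance formula 1 - u.v / (|u| |v|). The helper names are hypothetical, not Mahout's API, and this is not a claim about how CosineDistanceMeasure is implemented internally.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v: Sequence[float]) -> List[float]:
    # Divide by the Euclidean length so that dot(v, v) == 1.0.
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    # General form: works for vectors of any length.
    return 1.0 - dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def unit_distance(u: Sequence[float], v: Sequence[float]) -> float:
    # Shortcut: only equals cosine distance if u and v are L2-normalized.
    return 1.0 - dot(u, v)


u, v = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(cosine_distance(u, v))                            # full formula
print(unit_distance(l2_normalize(u), l2_normalize(v)))  # same value, cheaper
print(unit_distance(u, v))                              # wrong: inputs not normalized
```

Normalizing once up front means every later comparison is a single dot product, which is why pairing --norm 2 with a cosine measure is a natural fit.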
