[ https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037655#comment-13037655 ]
Daniel McEnnis commented on MAHOUT-668:
---------------------------------------

Ted,

Cosine distance is quite different from Euclidean distance. In Euclidean distance, the size of the file drives the metric; in cosine distance, the angle between the two file vectors is the only quantity measured.

Also, here is a difference between my version of the Euclidean metric and the one already present:

present: {NaN,1,NaN} x {1,1,1} = 0.0 distance
new:     {0,1,0} x {1,1,1} = 1.47 distance

Daniel.

> Adding knn support to Mahout classifiers
> ----------------------------------------
>
>                 Key: MAHOUT-668
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-668
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.6
>            Reporter: Daniel McEnnis
>              Labels: classification, knn
>         Attachments: MAHOUT-668.pat, Mahout-668-2.patch, Mahout-668-3.patch,
> Mahout-668-3.patch, Mahout-668.pat
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Initial implementation of the knn. This is a minimum base set with many more
> possible add-ons including support for text and weka input as well as a
> classify only (no confusion matrix) back end. The system was tested on the
> 20 newsgroup data set.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
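
For readers following the comment on MAHOUT-668, the two points it makes (cosine distance ignores vector magnitude, and NaN handling changes the Euclidean result) can be illustrated with a minimal standalone sketch. This is not the code from the patch or from Mahout's existing distance measures: the class name DistanceSketch, its methods, and the skipNaN flag are hypothetical and exist only for illustration. It assumes the existing measure behaves as if NaN components were skipped and the new one as if missing values were stored as 0. Plain Euclidean distance gives sqrt(2), about 1.41, for the zero-filled case, so the 1.47 reported in the comment may reflect details of the patch that are not shown here.

```java
// Hypothetical illustration only; not taken from the MAHOUT-668 patch.
class DistanceSketch {

    // Euclidean distance. When skipNaN is true, any component where either
    // vector holds NaN is left out of the sum (assumed "present" behaviour).
    static double euclidean(double[] a, double[] b, boolean skipNaN) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            if (skipNaN && (Double.isNaN(a[i]) || Double.isNaN(b[i]))) {
                continue;
            }
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Cosine distance: 1 - cos(angle between a and b). Depends only on the
    // direction of the vectors, not on their length.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Same direction, different magnitude (e.g. a short and a long file
        // with the same term proportions).
        double[] small = {1, 2, 3};
        double[] large = {2, 4, 6};
        System.out.println("cosine:    " + cosineDistance(small, large));   // close to 0
        System.out.println("euclidean: " + euclidean(small, large, false)); // sqrt(14)

        // The example from the comment: missing values as NaN versus as 0.
        double[] withNaN = {Double.NaN, 1, Double.NaN};
        double[] zeroFilled = {0, 1, 0};
        double[] ones = {1, 1, 1};
        System.out.println("NaN skipped: " + euclidean(withNaN, ones, true));     // 0.0
        System.out.println("zero filled: " + euclidean(zeroFilled, ones, false)); // sqrt(2)
    }
}
```

Skipping NaN components makes a sparse vector look identical to any vector that agrees with it on the few components present, which is why the two conventions give such different neighbours in a knn classifier.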