[ 
https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037655#comment-13037655
 ] 

Daniel McEnnis commented on MAHOUT-668:
---------------------------------------

Ted,

Cosine distance is quite different from Euclidean distance.  In Euclidean 
distance, the size of the file drives the metric; in cosine distance, the 
angle between the two file vectors is the only thing measured.  Also, here is 
a difference between my version of the Euclidean metric and the one already 
present:

present: {NaN,1,NaN}x{1,1,1} = 0.0 distance
new: {0,1,0}x{1,1,1} = 1.47 distance
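
To make the contrast concrete, here is a minimal Java sketch of the two metrics. It is not the code from the patch or from Mahout: the class DistanceSketch and its methods are hypothetical names, and replacing NaN with 0 is only an assumption about how missing entries might be handled, suggested by the {0,1,0} vector above. Scaling a vector changes its Euclidean distance to the original but leaves the cosine distance at roughly zero. For {0,1,0} and {1,1,1} the plain Euclidean formula gives sqrt(2), about 1.41, so the 1.47 quoted above may reflect details of the patch that are not shown here.

```java
// Hypothetical helper for illustration only; not part of Mahout or the patch.
final class DistanceSketch {

    // Assumption: missing values (NaN) are treated as 0.0 before measuring.
    static double[] nanToZero(double[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = Double.isNaN(v[i]) ? 0.0 : v[i];
        }
        return out;
    }

    // Straight-line distance: grows with the magnitude of the vectors.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // 1 - cos(angle between a and b): depends on direction only.
    // Undefined (NaN) if either vector is all zeros.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] small = {1, 2, 3};
        double[] large = {10, 20, 30}; // same direction, ten times the size

        System.out.println("euclidean: " + euclidean(small, large));
        System.out.println("cosine:    " + cosineDistance(small, large));

        double[] cleaned = nanToZero(new double[] {Double.NaN, 1, Double.NaN});
        System.out.println("NaN as 0:  " + euclidean(cleaned, new double[] {1, 1, 1}));
    }
}
```

Running main prints a large Euclidean distance for the scaled pair and a cosine distance that is zero up to rounding, which is the size-versus-angle difference described above.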

Daniel.

> Adding knn support to Mahout classifiers
> ----------------------------------------
>
>                 Key: MAHOUT-668
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-668
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.6
>            Reporter: Daniel McEnnis
>              Labels: classification, knn
>         Attachments: MAHOUT-668.pat, Mahout-668-2.patch, Mahout-668-3.patch, 
> Mahout-668-3.patch, Mahout-668.pat
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Initial implementation of knn.  This is a minimal base set, with many more 
> possible add-ons, including support for text and Weka input as well as a 
> classify-only (no confusion matrix) back end.  The system was tested on the 
> 20 Newsgroups data set.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
