[ https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600546#comment-13600546 ]

Dan Filimon commented on MAHOUT-668:
------------------------------------

Well, first off, yes, it makes the nearest-neighbors part superfluous. By the 
same token, there is a NearestNUserNeighborhood class in 
o.a.m.cf.taste.impl.neighborhood that could probably be replaced.
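For reference, here is a minimal sketch of brute-force, majority-vote kNN
classification in plain Java. It is not the code from the MAHOUT-668 patch and
not the NearestNUserNeighborhood API. The class and method names (KnnSketch,
classify, squaredDistance) are hypothetical, and it assumes dense double[]
points compared by Euclidean distance.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of brute-force kNN; not Mahout's API.
public class KnnSketch {
    // Squared Euclidean distance; the square root is not needed for ranking.
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Labels the query by majority vote among its k closest training points.
    // Ties go to the label that reaches the top count first when scanning
    // neighbors from nearest to farthest.
    public static int classify(double[][] train, int[] labels, double[] query, int k) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < train.length; i++) idx.add(i);
        // Sort all training indices by distance to the query (brute force).
        idx.sort(Comparator.comparingDouble((Integer i) -> squaredDistance(train[i], query)));
        Map<Integer, Integer> votes = new HashMap<>();
        int best = labels[idx.get(0)];
        int bestVotes = 0;
        for (int j = 0; j < Math.min(k, idx.size()); j++) {
            int label = labels[idx.get(j)];
            int v = votes.merge(label, 1, Integer::sum);
            if (v > bestVotes) {
                bestVotes = v;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] train = {{0, 0}, {0, 1}, {1, 0}, {5, 5}, {5, 6}, {6, 5}};
        int[] labels = {0, 0, 0, 1, 1, 1};
        System.out.println(classify(train, labels, new double[]{0.5, 0.5}, 3)); // prints 0
        System.out.println(classify(train, labels, new double[]{5.5, 5.2}, 3)); // prints 1
    }
}
```

The full sort makes each query O(n log n) over the training set, which is the
cost that a cluster-based or indexed search would avoid.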

But what I mean is that the bigger picture is using nearest neighbors for
classification in some principled way, isn't it?
Ted, you actually asked me to test exactly that: building a vector of distances
from a point to each cluster, applying the e^(-d^2) transform, and then running
logistic regression on the result amounts to using radial basis functions
followed by logistic regression.
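As a rough illustration of the RBF-then-logistic-regression idea, here is a
minimal sketch in plain Java. It assumes the cluster centroids already exist
(e.g., from a k-means run), applies the e^(-d^2) transform with no width
parameter, and fits a binary logistic regression by plain batch gradient
descent rather than any Mahout classifier. All names (RbfLogisticSketch,
rbfFeatures, train, predict) are hypothetical.

```java
// Hypothetical sketch: RBF features from cluster centroids, then logistic regression.
public class RbfLogisticSketch {
    // One feature per centroid: exp(-d^2), where d is the Euclidean distance.
    public static double[] rbfFeatures(double[] x, double[][] centroids) {
        double[] phi = new double[centroids.length];
        for (int j = 0; j < centroids.length; j++) {
            double d2 = 0.0;
            for (int i = 0; i < x.length; i++) {
                double diff = x[i] - centroids[j][i];
                d2 += diff * diff;
            }
            phi[j] = Math.exp(-d2);
        }
        return phi;
    }

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Probability of class 1; w[0] is the bias, w[j + 1] weights feature j.
    public static double predict(double[] w, double[] phi) {
        double z = w[0];
        for (int j = 0; j < phi.length; j++) z += w[j + 1] * phi[j];
        return sigmoid(z);
    }

    // Fits binary logistic regression (labels 0/1) by batch gradient
    // descent on the average log-loss.
    public static double[] train(double[][] features, int[] labels, double rate, int epochs) {
        int m = features[0].length;
        double[] w = new double[m + 1];
        for (int e = 0; e < epochs; e++) {
            double[] grad = new double[m + 1];
            for (int n = 0; n < features.length; n++) {
                double err = predict(w, features[n]) - labels[n];
                grad[0] += err;
                for (int j = 0; j < m; j++) grad[j + 1] += err * features[n][j];
            }
            for (int j = 0; j <= m; j++) w[j] -= rate * grad[j] / features.length;
        }
        return w;
    }

    public static void main(String[] args) {
        // Centroids are assumed to come from an earlier clustering step.
        double[][] centroids = {{0, 0}, {4, 4}};
        double[][] points = {{0, 0.2}, {0.3, 0}, {4, 3.8}, {3.7, 4}};
        int[] labels = {0, 0, 1, 1};
        double[][] phi = new double[points.length][];
        for (int n = 0; n < points.length; n++) phi[n] = rbfFeatures(points[n], centroids);
        double[] w = train(phi, labels, 1.0, 2000);
        double p = predict(w, rbfFeatures(new double[]{3.9, 4.1}, centroids));
        System.out.printf("P(class 1) = %.3f%n", p);
    }
}
```

Because each feature decays with squared distance, a point mostly activates the
features of nearby centroids, so the regression effectively learns one weight
per cluster. RBF models usually also scale the distance by a width, as in
e^(-d^2/sigma^2); this sketch leaves that out.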

Wouldn't it be useful to have code in Mahout that does this directly rather 
than going through the entire process manually?

Now, I don't know whether this particular patch can easily be adapted to use
whatever code Mahout has now (the code may sadly have rotted). But
feature-wise, it seems useful.
                
> Adding knn support to Mahout classifiers
> ----------------------------------------
>
>                 Key: MAHOUT-668
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-668
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.6
>            Reporter: Daniel McEnnis
>              Labels: classification, knn
>             Fix For: Backlog
>
>         Attachments: Mahout-668-2.patch, Mahout-668-3.patch, 
> Mahout-668-3.patch, Mahout-668-3.patch, Mahout-668-3.patch, 
> Mahout-668-3.patch, Mahout-668-3.patch, Mahout-668-3.patch, Mahout-668.pat, 
> MAHOUT-668.pat
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Initial implementation of kNN. This is a minimal base set, with many more
> possible add-ons, including support for text and Weka input as well as a
> classify-only (no confusion matrix) back end. The system was tested on the
> 20 Newsgroups data set.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
