[
https://issues.apache.org/jira/browse/MADLIB-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291509#comment-16291509
]
Frank McQuillan edited comment on MADLIB-1181 at 12/14/17 9:12 PM:
-------------------------------------------------------------------
Classification uses majority voting, and weighted voting means that the
number of votes allocated to a training point is a function of its distance to
the test point.
Toy example:
if training point t1 is twice as close to the test point as training point t2,
then the class label of t1 gets 2 votes and the class label of t2 gets 1 vote.
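The toy example above can be sketched in a few lines of Python. This is only an illustration, not MADlib code: the function name weighted_vote, the eps guard, and the choice of inverse distance (1/d) as the weighting function are assumptions. The comment only says that votes are a function of distance; 1/d happens to reproduce the 2-to-1 split in the example.

```python
from collections import defaultdict

def weighted_vote(neighbors, eps=1e-12):
    """Pick a class label by distance-weighted voting.

    neighbors: list of (distance, label) pairs for the k nearest training points.
    Each neighbor contributes 1 / distance votes (an assumed weighting);
    eps avoids division by zero when a neighbor coincides with the test point.
    """
    votes = defaultdict(float)
    for distance, label in neighbors:
        votes[label] += 1.0 / (distance + eps)
    # The label with the largest total vote weight wins.
    winner = max(votes, key=votes.get)
    return winner, dict(votes)

# t1 (label "A") is twice as close to the test point as t2 (label "B"),
# so "A" receives twice the vote weight of "B".
label, votes = weighted_vote([(1.0, "A"), (2.0, "B")])

# With plain majority voting "B" would win 2 to 1 here; with inverse-distance
# weights the single close neighbor outvotes the two distant ones.
close_wins, _ = weighted_vote([(1.0, "A"), (3.0, "B"), (3.0, "B")])
```

Only the relative weights matter for the vote, so any rescaling of 1/d gives the same winner.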
was (Author: fmcquillan):
Classification is based on majority voting, so weighted average does not apply
to classification.
I will update the JIRA description to say this is for regression only.
> Add an option for weighted average in k-NN
> ------------------------------------------
>
> Key: MADLIB-1181
> URL: https://issues.apache.org/jira/browse/MADLIB-1181
> Project: Apache MADlib
> Issue Type: Improvement
> Components: k-NN
> Reporter: Frank McQuillan
> Assignee: Himanshu Pandey
> Priority: Minor
> Fix For: v1.14
>
>
> Follow on from
> https://issues.apache.org/jira/browse/MADLIB-1059
> (please see this JIRA for additional comments)
> MADlib does a simple average of the k-nearest neighbors to come up with the
> final value for classification and regression. Doing a weighted average
> instead might be desirable functionality. The weighting for the average can
> be based on the distance of the k-nearest neighbors.
> We can probably provide an optional parameter to let users choose how the
> final score has to be computed (avg or weighted avg).
> This JIRA applies to classification and regression.
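For the regression case described in the issue, the difference between "avg" and "weighted avg" can be sketched as follows. This is a hedged illustration rather than the MADlib implementation: the function names, the eps guard, and the inverse-distance weights are assumptions, since the issue only says the weighting can be based on distance.

```python
def simple_average(neighbors):
    """Unweighted mean of the neighbors' target values."""
    return sum(value for _, value in neighbors) / len(neighbors)

def weighted_average(neighbors, eps=1e-12):
    """Distance-weighted mean of the neighbors' target values.

    neighbors: list of (distance, value) pairs for the k nearest training points.
    Weights are 1 / distance (an assumed choice); eps avoids division by zero.
    """
    weights = [1.0 / (distance + eps) for distance, _ in neighbors]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, neighbors))
    return weighted_sum / sum(weights)

# Two neighbors: the closer one (distance 1) has value 10, the farther one
# (distance 2) has value 40. The weighted result is pulled toward the closer one.
neighbors = [(1.0, 10.0), (2.0, 40.0)]
plain = simple_average(neighbors)
weighted = weighted_average(neighbors)
```

Here the plain average is 25, while the weighted average is (1 * 10 + 0.5 * 40) / 1.5 = 20.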
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)