[ 
https://issues.apache.org/jira/browse/MAHOUT-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172914#comment-13172914
 ] 

Paritosh Ranjan commented on MAHOUT-930:
----------------------------------------

Created MAHOUT-933 to implement a mapreduce version of ClusterIterator.
                
> Refactor Vector Classification out of Clustering - Make Classification abstract
> -------------------------------------------------------------------------------
>
>                 Key: MAHOUT-930
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-930
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>             Fix For: 0.7
>
>
> Right now, each clustering algorithm has its own runClustering (-cp)
> implementation, which produces clusteredPoints. The current design lacks:
> 1) Extensibility - there is no place to plug in new features such as outlier
> removal during classification.
> 2) Uniformity in design - new algorithms have no pattern to follow.
> 3) Abstraction - clusterData should only be concerned with classifying
> vectors, i.e. assigning vectors to clusters. It should not care about how
> the classification is done. That should be the work of a separate entity,
> which can offer features such as outlier removal.
> The new implementation would factor out an independent entity that performs
> the classification step independently of the various clustering
> implementations. The new design would start with ClusterClassifier,
> ClusteringPolicy and ClusterIterator, experimental versions of which are
> already committed. The currently committed version seems to work for
> all the iterative clustering algorithms.
> The ClusterClassifier provides the probability of a vector belonging to
> each of the available clusters. These probabilities are converted into
> weights by ClusteringPolicy implementations, each specific to its
> clustering algorithm. This is the place where the outlier removal
> implementation can be plugged in. In the future, different implementations
> of ClusteringPolicy can be provided (configured) for different types of
> classification.
> The ClusterClassifier also makes it possible to train the existing
> classifiers (clusters) on the input. This is the place where
> clustering and classification will converge.
> The execution is done by a ClusterIterator for now, which runs a clustering
> policy on the input and tries to assign the vectors to clusters.
> It can train the classifiers at the same time: it runs for a given number
> of iterations, and each iteration would improve the quality of the
> classifiers.
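
The split described above (a classifier that scores vectors, a policy that
turns scores into weights, and an iterator that retrains the classifier) can
be illustrated with a minimal Java sketch. Everything in it is an assumption
made for illustration: the type names (SketchPolicy, HardAssignmentPolicy,
SketchClassifier), the inverse-distance scoring, and the weighted-mean update
are hypothetical and are not the signatures or behavior of the committed
Mahout classes.

```java
// Illustrative sketch only: these types are hypothetical and are NOT Mahout's API.
final class ClusterSketch {

    /** Converts per-cluster probabilities into assignment weights. */
    public interface SketchPolicy {
        double[] select(double[] probabilities);
    }

    /**
     * Hard assignment: weight 1 for the most probable cluster. A vector whose best
     * probability is below minProbability gets all-zero weights (treated as an outlier).
     */
    public static final class HardAssignmentPolicy implements SketchPolicy {
        private final double minProbability;

        public HardAssignmentPolicy(double minProbability) {
            this.minProbability = minProbability;
        }

        @Override
        public double[] select(double[] probabilities) {
            double[] weights = new double[probabilities.length];
            int best = 0;
            for (int i = 1; i < probabilities.length; i++) {
                if (probabilities[i] > probabilities[best]) {
                    best = i;
                }
            }
            if (probabilities[best] >= minProbability) {
                weights[best] = 1.0;
            }
            return weights;
        }
    }

    /** Holds cluster centers, scores vectors against them, and can be retrained. */
    public static final class SketchClassifier {
        private final double[][] centers;

        public SketchClassifier(double[][] initialCenters) {
            centers = new double[initialCenters.length][];
            for (int i = 0; i < initialCenters.length; i++) {
                centers[i] = initialCenters[i].clone(); // defensive copy
            }
        }

        public double[][] centers() {
            return centers;
        }

        /** Assumed scoring: inverse distance to each center, normalized to sum to 1. */
        public double[] classify(double[] vector) {
            double[] p = new double[centers.length];
            double sum = 0.0;
            for (int i = 0; i < centers.length; i++) {
                double squared = 0.0;
                for (int d = 0; d < vector.length; d++) {
                    double diff = vector[d] - centers[i][d];
                    squared += diff * diff;
                }
                p[i] = 1.0 / (1.0 + Math.sqrt(squared));
                sum += p[i];
            }
            for (int i = 0; i < p.length; i++) {
                p[i] /= sum;
            }
            return p;
        }

        /** One training pass: each center moves to the weighted mean of its vectors. */
        public void train(double[][] data, SketchPolicy policy) {
            int dim = centers[0].length;
            double[][] sums = new double[centers.length][dim];
            double[] totals = new double[centers.length];
            for (double[] vector : data) {
                double[] weights = policy.select(classify(vector));
                for (int i = 0; i < weights.length; i++) {
                    totals[i] += weights[i];
                    for (int d = 0; d < dim; d++) {
                        sums[i][d] += weights[i] * vector[d];
                    }
                }
            }
            for (int i = 0; i < centers.length; i++) {
                if (totals[i] > 0.0) { // leave a center alone if nothing was assigned to it
                    for (int d = 0; d < dim; d++) {
                        centers[i][d] = sums[i][d] / totals[i];
                    }
                }
            }
        }
    }

    /** Runs the given number of training passes over the data. */
    public static SketchClassifier iterate(double[][] data, SketchClassifier classifier,
                                           SketchPolicy policy, int iterations) {
        for (int n = 0; n < iterations; n++) {
            classifier.train(data, policy);
        }
        return classifier;
    }

    public static void main(String[] args) {
        double[][] data = {{0.0}, {1.0}, {9.0}, {10.0}};
        SketchClassifier start = new SketchClassifier(new double[][] {{2.0}, {8.0}});
        SketchClassifier trained = iterate(data, start, new HardAssignmentPolicy(0.0), 3);
        System.out.println(java.util.Arrays.deepToString(trained.centers()));
    }
}
```

In this sketch the policy is the only place that decides what happens to a
vector after it has been scored, so an outlier rule (here, a minimum
probability threshold) can be swapped in without touching the classifier or
the iteration loop.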

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
