Álvaro Pérez Alarcón created MAHOUT-1344:
--------------------------------------------

             Summary: Self-Organizing Map algorithm (batch version)
                 Key: MAHOUT-1344
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1344
             Project: Mahout
          Issue Type: New Feature
          Components: Clustering
    Affects Versions: 0.8
            Reporter: Álvaro Pérez Alarcón
            Priority: Minor
             Fix For: 0.8


Good morning.
As part of my final-year project, I have implemented a new module for Apache 
Mahout: Kohonen's self-organizing map (SOM) algorithm, in its batch 
version.
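
For readers unfamiliar with the batch variant, here is a minimal plain-Java sketch of one batch SOM pass. It is not taken from the patch: the class and method names, the 1-D map layout, the Gaussian neighborhood, and the exponential radius schedule are all assumptions made for illustration. Each unit's new weight is the neighborhood-weighted mean of all input vectors, with the neighborhood centered on each input's best-matching unit.

```java
// Illustrative sketch only; names and design choices are hypothetical,
// not the classes or API of the MAHOUT-1344 patch.
final class BatchSomSketch {

  /** Index of the unit whose weight vector is closest (squared Euclidean) to x. */
  static int bestMatchingUnit(double[][] weights, double[] x) {
    int best = 0;
    double bestDist = Double.POSITIVE_INFINITY;
    for (int j = 0; j < weights.length; j++) {
      double d = 0.0;
      for (int k = 0; k < x.length; k++) {
        double diff = x[k] - weights[j][k];
        d += diff * diff;
      }
      if (d < bestDist) {
        bestDist = d;
        best = j;
      }
    }
    return best;
  }

  /** Gaussian neighborhood on a 1-D map whose units sit at grid positions 0..n-1. */
  static double neighborhood(int unit, int bmu, double sigma) {
    double gridDist = unit - bmu;
    return Math.exp(-(gridDist * gridDist) / (2.0 * sigma * sigma));
  }

  /** Assumed schedule: the radius shrinks exponentially with the iteration number. */
  static double sigmaAt(int iteration, double sigma0, double decay) {
    return sigma0 * Math.exp(-iteration / decay);
  }

  /** One batch pass: every unit becomes the neighborhood-weighted mean of the data. */
  static double[][] batchStep(double[][] weights, double[][] data, double sigma) {
    int units = weights.length;
    int dim = weights[0].length;
    double[][] num = new double[units][dim];
    double[] den = new double[units];
    for (double[] x : data) {
      int bmu = bestMatchingUnit(weights, x);
      for (int j = 0; j < units; j++) {
        double h = neighborhood(j, bmu, sigma);
        den[j] += h;
        for (int k = 0; k < dim; k++) {
          num[j][k] += h * x[k];
        }
      }
    }
    double[][] updated = new double[units][dim];
    for (int j = 0; j < units; j++) {
      for (int k = 0; k < dim; k++) {
        // Keep the old weight if no input contributed to this unit.
        updated[j][k] = den[j] > 0.0 ? num[j][k] / den[j] : weights[j][k];
      }
    }
    return updated;
  }
}
```

Because the numerator and denominator are plain sums over the inputs, partial sums can be computed independently per data split and then combined, which is what makes the batch version a natural fit for MapReduce-style execution.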

The work is already done, and I will submit a patch as soon as possible. It was 
developed against Mahout 0.8.
The patch includes unit tests, and the algorithm was used successfully on a 
Hadoop cluster to cluster two large datasets. Results can be seen in [this image 
gallery|http://imgur.com/a/DlgRT].

The implementation uses the generic clustering algorithms implemented in the 
ClusterIterator class. Minor changes were made to this and other related 
classes to support some of the features, without affecting the execution of 
other algorithms.

The algorithm supports a convergence check and can resume a job at a given 
iteration (mainly so that KohonenBatchClusteringPolicy can be initialized with a 
given iteration number, although this also affects the names of the output 
directories).
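
To illustrate why the iteration number matters when resuming, here is a small hedged sketch. It assumes, hypothetically, that the policy derives its neighborhood radius from the global iteration number and that each iteration writes to a directory named after that number. The class name, the decay schedule, and the "clusters-N" naming are illustrative assumptions, not the patch's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; all names and the schedule are hypothetical.
final class ResumeSketch {

  /** Assumed schedule: the radius depends only on the global iteration number. */
  static double radiusAt(int iteration, double initialRadius, double decay) {
    return initialRadius * Math.exp(-iteration / decay);
  }

  /** Describes each iteration a run would execute, from startIteration onward. */
  static List<String> run(int startIteration, int maxIterations,
                          double initialRadius, double decay) {
    List<String> log = new ArrayList<>();
    for (int it = startIteration; it <= maxIterations; it++) {
      double radius = radiusAt(it, initialRadius, decay);
      // A real job would run one batch SOM pass with this radius here.
      log.add("clusters-" + it + " radius=" + radius);
    }
    return log;
  }
}
```

Under these assumptions, a run resumed at iteration 4 uses the same radii and output names as iterations 4 and later of an uninterrupted run. Restarting the counter at 1 would instead reset the radius to its initial, wide value.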



--
This message was sent by Atlassian JIRA
(v6.1#6144)
