Re: [jira] [Commented] (MAHOUT-843) Top Down Clustering

Paritosh Ranjan Thu, 08 Dec 2011 21:36:46 -0800

Lance- Thanks for the nice words :). The experience of going through all this was really good. I think all those suggestions/"hoops" made the implementation/code a lot better than it was in the beginning.


Jeff deserves a big thanks for all the guidance :). Thanks, Jeff.


On 09-12-2011 03:54, Lance Norskog wrote:

Paritosh- thanks for jumping through all of these hoops. (If only the
committers' code went through this much scrutiny :)

On Wed, Dec 7, 2011 at 9:57 PM, Hudson (Commented) (JIRA)
<[email protected]> wrote:

    [https://issues.apache.org/jira/browse/MAHOUT-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165014#comment-13165014]

Hudson commented on MAHOUT-843:
-------------------------------

Integrated in Mahout-Quality #1236 (See [https://builds.apache.org/job/Mahout-Quality/1236/])
    MAHOUT-843: Final patch plus some integration fixes. All tests run

jeastman :
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1211715
Files :
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown/PathDirectory.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown/TopDownClusteringPathConstants.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown/postprocessor
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterCountReader.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterOutputPostProcessor.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterOutputPostProcessorDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterOutputPostProcessorMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterOutputPostProcessorReducer.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/canopy/TestCanopyCreation.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/topdown
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/topdown/PathDirectoryTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/topdown/postprocessor
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterCountReaderTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterOutputPostProcessorTest.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterEvaluator.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/cdbw/TestCDbwEvaluator.java
* /mahout/trunk/src/conf/clusterpp.props
* /mahout/trunk/src/conf/driver.classes.props

Top Down Clustering
-------------------

                 Key: MAHOUT-843
                 URL: https://issues.apache.org/jira/browse/MAHOUT-843
             Project: Mahout
          Issue Type: New Feature
          Components: Clustering
    Affects Versions: 0.6
            Reporter: Paritosh Ranjan
            Assignee: Jeff Eastman
              Labels: clustering, patch
             Fix For: 0.6

         Attachments: MAHOUT-843-patch,
                      MAHOUT-843-patch-only-postprocessor,
                      MAHOUT-843-patch-only-postprocessor-final,
                      MAHOUT-843-patch-only-postprocessor-v1,
                      MAHOUT-843-patch-only-postprocessor-v2,
                      MAHOUT-843-patch-only-postprocessor-v3,
                      MAHOUT-843-patch-only-postprocessor-v4,
                      MAHOUT-843-patch-only-postprocessor-v5,
                      MAHOUT-843-patch-v1,
                      Top-Down-Clustering-patch


Top Down Clustering works in multiple steps. The first step is to find
comparatively bigger clusters. The second step is to cluster each of these
bigger chunks into meaningful clusters. This can improve performance when
clustering large amounts of data, and it also removes the need to provide
input clusters (or the number of clusters) to the clustering algorithm.

Both "bigger" and "smaller/meaningful" are relative terms, so the size of
the clusters at each level will be controlled by the user.

The user can also select which clustering algorithm to use at the top level
and which to use at the bottom level. Initially, this may be supported for
only one or a few clustering algorithms; later, an option can be provided
to use any of the algorithms (whichever suits the case).
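To make the two steps concrete, here is a minimal, in-memory Java sketch. It is not the code committed for MAHOUT-843 (which works through Mahout drivers and MapReduce classes such as ClusterOutputPostProcessorMapper/Reducer); it uses no Mahout or Hadoop APIs, and the names Clusterer, kMeans, groupByCluster and topDown are hypothetical. A plain k-means stands in for whichever algorithm the user picks at each level, and the user's notion of "bigger" vs "smaller/meaningful" clusters is modelled simply as the k passed to each level.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of two-level ("top down") clustering.
// Not Mahout's API: every name below is an illustrative assumption.
public class TopDownClusteringSketch {

  // A pluggable clustering step that returns a cluster id per input point,
  // so a different algorithm can be chosen for each level.
  interface Clusterer {
    int[] assign(List<double[]> points);
  }

  // Plain Lloyd's k-means (assumes k >= 1), seeded with the first k points
  // so that results are deterministic.
  static Clusterer kMeans(int k, int iterations) {
    return points -> {
      int n = points.size();
      if (n == 0) return new int[0];
      int kk = Math.min(k, n);
      int dim = points.get(0).length;
      double[][] centers = new double[kk][];
      for (int c = 0; c < kk; c++) centers[c] = points.get(c).clone();
      int[] assignment = new int[n];
      for (int iter = 0; iter < iterations; iter++) {
        // Assignment step: nearest center by squared Euclidean distance.
        for (int i = 0; i < n; i++) {
          double best = Double.MAX_VALUE;
          for (int c = 0; c < kk; c++) {
            double d = 0;
            for (int j = 0; j < dim; j++) {
              double diff = points.get(i)[j] - centers[c][j];
              d += diff * diff;
            }
            if (d < best) {
              best = d;
              assignment[i] = c;
            }
          }
        }
        // Update step: move each non-empty cluster's center to its mean.
        double[][] sums = new double[kk][dim];
        int[] counts = new int[kk];
        for (int i = 0; i < n; i++) {
          counts[assignment[i]]++;
          for (int j = 0; j < dim; j++) sums[assignment[i]][j] += points.get(i)[j];
        }
        for (int c = 0; c < kk; c++) {
          if (counts[c] == 0) continue;
          for (int j = 0; j < dim; j++) centers[c][j] = sums[c][j] / counts[c];
        }
      }
      return assignment;
    };
  }

  // Groups points by their cluster id (one list per cluster).
  static Map<Integer, List<double[]>> groupByCluster(List<double[]> points, int[] ids) {
    Map<Integer, List<double[]>> groups = new TreeMap<>();
    for (int i = 0; i < points.size(); i++) {
      groups.computeIfAbsent(ids[i], key -> new ArrayList<>()).add(points.get(i));
    }
    return groups;
  }

  // Step 1: find bigger, coarse clusters. Step 2: re-cluster each coarse
  // chunk on its own into smaller, "meaningful" clusters.
  static List<List<double[]>> topDown(List<double[]> points, Clusterer top, Clusterer bottom) {
    List<List<double[]>> result = new ArrayList<>();
    Map<Integer, List<double[]>> coarse = groupByCluster(points, top.assign(points));
    for (List<double[]> chunk : coarse.values()) {
      result.addAll(groupByCluster(chunk, bottom.assign(chunk)).values());
    }
    return result;
  }

  public static void main(String[] args) {
    // Toy data: two far-apart regions (x = 0 and x = 100), each holding
    // two tight pairs of points (y near 0 and y near 5).
    double[][] raw = {
      {0, 0}, {100, 0}, {0, 0.1}, {100, 0.1},
      {0, 5}, {100, 5}, {0, 5.1}, {100, 5.1}
    };
    List<double[]> points = new ArrayList<>();
    for (double[] p : raw) points.add(p);
    List<List<double[]>> clusters = topDown(points, kMeans(2, 10), kMeans(2, 10));
    // For this toy data: 4 final clusters of 2 points each.
    for (int i = 0; i < clusters.size(); i++) {
      System.out.println("cluster " + i + ": " + clusters.get(i).size() + " points");
    }
  }
}
```

Since the second level runs on each coarse chunk independently, and each chunk is smaller than the full data set, this is one plausible reading of where the performance benefit mentioned above could come from; the sketch itself runs everything sequentially.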

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA
administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
