Lance- Thanks for the nice words :). Going through all this was a really good experience. I think all those suggestions/"hoops" made the implementation/code a lot better than it was in the beginning.

Jeff deserves a big thanks for all the guidance :). Thanks Jeff.

On 09-12-2011 03:54, Lance Norskog wrote:
Paritosh- thanks for jumping through all of these hoops. (If only the
committers' code went through this much scrutiny :)

On Wed, Dec 7, 2011 at 9:57 PM, Hudson (Commented) (JIRA)
<j...@apache.org> wrote:

    [https://issues.apache.org/jira/browse/MAHOUT-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165014#comment-13165014]

Hudson commented on MAHOUT-843:
-------------------------------

Integrated in Mahout-Quality #1236 (See [https://builds.apache.org/job/Mahout-Quality/1236/])
    MAHOUT-843: Final patch plus some integration fixes. All tests run

jeastman :
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1211715
Files :
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown/PathDirectory.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown/TopDownClusteringPathConstants.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown/postprocessor
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterCountReader.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterOutputPostProcessor.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterOutputPostProcessorDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterOutputPostProcessorMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterOutputPostProcessorReducer.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/canopy/TestCanopyCreation.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/topdown
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/topdown/PathDirectoryTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/topdown/postprocessor
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterCountReaderTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterOutputPostProcessorTest.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterEvaluator.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/cdbw/TestCDbwEvaluator.java
* /mahout/trunk/src/conf/clusterpp.props
* /mahout/trunk/src/conf/driver.classes.props


Top Down Clustering
-------------------

                 Key: MAHOUT-843
                 URL: https://issues.apache.org/jira/browse/MAHOUT-843
             Project: Mahout
          Issue Type: New Feature
          Components: Clustering
    Affects Versions: 0.6
            Reporter: Paritosh Ranjan
            Assignee: Jeff Eastman
              Labels: clustering, patch
             Fix For: 0.6

         Attachments: MAHOUT-843-patch,
MAHOUT-843-patch-only-postprocessor,
MAHOUT-843-patch-only-postprocessor-final,
MAHOUT-843-patch-only-postprocessor-v1,
MAHOUT-843-patch-only-postprocessor-v2,
MAHOUT-843-patch-only-postprocessor-v3,
MAHOUT-843-patch-only-postprocessor-v4,
MAHOUT-843-patch-only-postprocessor-v5, MAHOUT-843-patch-v1,
Top-Down-Clustering-patch

Top Down Clustering works in multiple steps. The first step is to find
comparatively bigger clusters. The second step is to cluster these bigger
chunks into meaningful clusters. This can improve performance when clustering
large amounts of data. It also removes the need to supply input
clusters/numbers to the clustering algorithm.
"Big" is a relative term, as is the smaller, "meaningful" one, so the
size of the "bigger" and "smaller/meaningful" clusters will be controlled
by the user.
The user can also select which clustering algorithm to use at the top level
and which to use at the bottom level. Initially, this can be supported for
only one or a few clustering algorithms; later, an option can be provided
to use any of the algorithms (whichever suits the case).
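The two-step idea in the description can be sketched in a few lines of Java. This is a minimal in-memory sketch under stated assumptions, not Mahout's implementation: the `Clusterer` interface and the `leader`, `groupByCluster` and `topDown` names are hypothetical, and a simple leader (single-pass threshold) clustering is swapped in at both levels purely for illustration. The user-chosen thresholds stand in for the "bigger" and "smaller/meaningful" controls, and the interface stands in for choosing a different algorithm per level.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical two-level (top-down) clustering sketch; not Mahout's API. */
public class TopDownSketch {

    /** Any clustering algorithm: returns a cluster id for each input point. */
    interface Clusterer {
        int[] assign(List<double[]> points);
    }

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Leader clustering (single pass, used here only as an illustration):
     * a point joins the first leader within `threshold`, otherwise it becomes
     * a new leader. A larger threshold yields fewer, "bigger" clusters, and
     * no cluster count has to be supplied.
     */
    static Clusterer leader(double threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        return points -> {
            List<double[]> leaders = new ArrayList<>();
            int[] labels = new int[points.size()];
            for (int i = 0; i < points.size(); i++) {
                double[] p = points.get(i);
                int assigned = -1;
                for (int c = 0; c < leaders.size() && assigned < 0; c++) {
                    if (distance(p, leaders.get(c)) <= threshold) {
                        assigned = c;
                    }
                }
                if (assigned < 0) { // no leader close enough: start a new cluster
                    leaders.add(p);
                    assigned = leaders.size() - 1;
                }
                labels[i] = assigned;
            }
            return labels;
        };
    }

    /** Groups points by cluster id (in first-seen order) so each group can be re-clustered. */
    static Map<Integer, List<double[]>> groupByCluster(List<double[]> points, int[] labels) {
        Map<Integer, List<double[]>> groups = new LinkedHashMap<>();
        for (int i = 0; i < points.size(); i++) {
            groups.computeIfAbsent(labels[i], id -> new ArrayList<>()).add(points.get(i));
        }
        return groups;
    }

    /** Step 1: coarse clusters with `top`. Step 2: re-cluster each coarse cluster with `bottom`. */
    static List<List<double[]>> topDown(List<double[]> points, Clusterer top, Clusterer bottom) {
        List<List<double[]>> result = new ArrayList<>();
        for (List<double[]> chunk : groupByCluster(points, top.assign(points)).values()) {
            result.addAll(groupByCluster(chunk, bottom.assign(chunk)).values());
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[] {0, 0}, new double[] {100, 100}, new double[] {0, 1},
                new double[] {100, 101}, new double[] {10, 10}, new double[] {110, 110},
                new double[] {10, 11}, new double[] {110, 111});
        // Top level: loose threshold (big chunks); bottom level: tight threshold.
        List<List<double[]>> clusters = topDown(points, leader(50), leader(5));
        System.out.println(clusters.size()); // prints 4
    }
}
```

Because each top-level chunk is re-clustered independently, the bottom-level runs could in principle be executed in parallel; the grouping step is roughly what a post-processor between the two levels would need to provide.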

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA
administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira