Sure, Grant. The page on kMeans actually already has such links, but I will create a duplicate on a new page along with some descriptive text. There is also an important paragraph from the original paper as a comment in testOne. In essence it is a very simple algorithm; scaling it and M/R-ing it obfuscates it quite a bit.
Jeff

-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 07, 2008 5:45 PM
To: [email protected]
Subject: Re: [jira] Updated: (MAHOUT-3) Build initial canopy clustering prototype

Thanks for the start, Jeff! Could you put up a link on the Wiki under algorithms with some basic info on canopy clustering, maybe a link to the main paper on it? Would be helpful, when reviewing, to have background information.

-Grant

On Feb 7, 2008, at 1:13 AM, Jeff Eastman (JIRA) wrote:

>
> [ https://issues.apache.org/jira/browse/MAHOUT-3?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Jeff Eastman updated MAHOUT-3:
> ------------------------------
>
> Attachment: MAHOUT-3.diff
>
> Here's an initial patch which introduces a couple of unit tests that
> implement a very basic canopy cluster, two distance measures and a
> Canopy class. It is not M/R ready and is just the beginning.
>
> - src/main/java/org/apache/mahout/clustering/canopy/Canopy.java: new class
>   (constructor): create a new Canopy with the given point
>   (add): add another point to the Canopy
>   (toString): return a printable representation
>   (ptOut): return with the point's information represented
>
> - src/main/java/org/apache/mahout/clustering/canopy/DistanceMeasure.java: new interface
>   (distance): single method returns the distance metric
>
> - src/main/java/org/apache/mahout/clustering/canopy/ManhattanDistancemeasure.java: new class
>   (distance): single method returns the Manhattan distance metric
>
> - src/main/java/org/apache/mahout/clustering/canopy/EuclidianDistanceMeasure.java: new class
>   (distance): single method returns the Euclidian distance metric
>
> - src/test/java/org/apache/mahout/clustering/canopy/TestCanopy.java
>   (testOne, testTwo): new unit tests
>   (getPoints, makeCanopy, prtCanopies): utilities
>
>> Build initial canopy clustering prototype
>> -----------------------------------------
>>
>> Key: MAHOUT-3
>> URL: https://issues.apache.org/jira/browse/MAHOUT-3
>> Project: Mahout
>> Issue Type: New Feature
>> Reporter: Jeff Eastman
>> Attachments: MAHOUT-3.diff
>>
>>
>> I'd like to reserve some namespace, specifically
>> org.apache.mahout.clustering.canopy to use for an initial prototype
>> of canopy clustering. I'm going to start with a little unit test to
>> get the basic algorithm sorted out, then M/R it.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
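
For background while reviewing, here is a minimal, non-M/R sketch in Java of canopy clustering as it is commonly described, using the two distance measures named in the patch. It is not the code in MAHOUT-3.diff. The class name CanopySketch, the double[] point representation, the canopies method, and the two thresholds t1 and t2 (with t1 > t2) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Illustrative only: names and signatures here are hypothetical,
// not taken from the MAHOUT-3 patch.
public class CanopySketch {

    /** Stand-in for a distance measure with a single distance method. */
    public interface DistanceMeasure {
        double distance(double[] a, double[] b);
    }

    /** Manhattan distance: sum of absolute coordinate differences. */
    public static final DistanceMeasure MANHATTAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    };

    /** Euclidean distance: square root of the sum of squared differences. */
    public static final DistanceMeasure EUCLIDEAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    };

    /**
     * Builds canopies in a single pass over the remaining points.
     * Points closer than t1 to a center join that canopy; points closer
     * than t2 are also removed from further consideration as centers.
     */
    public static List<List<double[]>> canopies(
            List<double[]> points, DistanceMeasure measure, double t1, double t2) {
        if (t1 <= t2) {
            throw new IllegalArgumentException("t1 must be greater than t2");
        }
        List<double[]> remaining = new LinkedList<>(points);
        List<List<double[]>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // Take the next unclaimed point as a new canopy center.
            double[] center = remaining.remove(0);
            List<double[]> canopy = new ArrayList<>();
            canopy.add(center);
            Iterator<double[]> it = remaining.iterator();
            while (it.hasNext()) {
                double[] p = it.next();
                double d = measure.distance(center, p);
                if (d < t1) {
                    canopy.add(p);   // loosely bound to this canopy
                }
                if (d < t2) {
                    it.remove();     // tightly bound, cannot start its own canopy
                }
            }
            result.add(canopy);
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[] {1, 1}, new double[] {2, 1}, new double[] {1, 2},
                new double[] {5, 5}, new double[] {6, 5});
        List<List<double[]>> found = canopies(points, EUCLIDEAN, 3.0, 1.5);
        System.out.println("canopies: " + found.size());
        for (List<double[]> canopy : found) {
            System.out.println("  size " + canopy.size()
                    + ", center " + Arrays.toString(canopy.get(0)));
        }
    }
}
```

Because a point within t1 but not within t2 of a center stays in the remaining list, it can end up in more than one canopy; that overlap is the expected behaviour of the technique rather than a bug in the sketch.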
