Hi all,

We have been waiting for comments for a couple of days, and we are not sure how to move on to the next step. Can anyone advise?
If this idea is not good enough, what else can we do to contribute to this community?

Regards,
Yexi

2013/5/3 yu lee <[email protected]>

> Co-ask.
>
> Shannon: we'd be happy if you are going to help us!
>
> Ted: what do you think about our (Yexi's and my) ideas? Shall we move on to
> the proposal?
>
>
> On Fri, May 3, 2013 at 8:10 AM, 姜页希 <[email protected]> wrote:
>
> > Are there any other comments about this issue?
> >
> > 2013/5/2 Shannon Quinn <[email protected]>
> >
> > > This sounds excellent. I'd be happy to assist in unifying the interfaces
> > > of the spectral methods in particular.
> > >
> > > On 5/2/13 3:54 PM, Yu Lee (JIRA) wrote:
> > >
> > >> [ https://issues.apache.org/jira/browse/MAHOUT-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647841#comment-13647841 ]
> > >>
> > >> Yu Lee commented on MAHOUT-1177:
> > >> --------------------------------
> > >>
> > >> Hello Robin Anil, Jeff Eastman, Dan Filimon, and Ted Dunning,
> > >>
> > >> Yexi and I (Yu Lee) are new to the Mahout community. We want to
> > >> contribute to the improvement of Mahout by reforming and simplifying the
> > >> clustering APIs per the following link:
> > >> https://issues.apache.org/jira/browse/MAHOUT-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644120#comment-13644120
> > >>
> > >> We have gone through the code of Mahout clustering.
> > >> Now we have some ideas about improving it:
> > >>
> > >> =========================================================================================
> > >> Addressing the problems in the current interface:
> > >>
> > >> Test cases are missing. For example, in spectral k-means clustering,
> > >> the run methods of SpectralKmeansDriver and EigencutsDriver are not tested.
> > >>
> > >> Documentation is missing for some methods. For example, in the run
> > >> method of DirichletDriver, the description of the parameter 'numModels' is
> > >> missing; in the run method of SpectralKmeansDriver, the descriptions of some
> > >> arguments are missing.
> > >>
> > >> Some testing methods do not contain a specific description of some
> > >> arguments. For example, in the run method of FuzzyKmeansDriver, the
> > >> description of the argument "m" (fuzzification factor) is missing.
> > >> Although a wiki link regarding "Clustering Analysis" is given, it is not
> > >> clear enough.
> > >>
> > >> -----------------------------------------------------------------------------------------
> > >>
> > >> Implementing some new clustering algorithms
> > >>
> > >> Agglomerative hierarchical clustering, which clusters the data points
> > >> into a dendrogram, so that users can choose whatever number of clusters
> > >> they want. (http://en.wikipedia.org/wiki/Hierarchical_clustering)
> > >>
> > >> DBSCAN, a density-based clustering method that is able to identify
> > >> clusters with arbitrary shapes and is useful in spatial clustering.
> > >> (http://en.wikipedia.org/wiki/DBSCAN)
> > >>
> > >> -----------------------------------------------------------------------------------------
> > >>
> > >> Providing a new unified interface
> > >>
> > >> Currently, each clustering algorithm has its own implementation class with
> > >> a different interface (i.e., the run methods in different Drivers have different
> > >> argument lists). However, it would be better to have a unified interface to
> > >> execute all available clustering methods. An example interface is as
> > >> follows:
> > >>
> > >> Clustering-run(input, output, methodClass, clusteringConfig)
> > >>
> > >> Here, "methodClass" indicates a specific clustering method, while
> > >> "clusteringConfig" indicates the configuration for this specific clustering
> > >> method.
> > >>
> > >> =========================================================================================
> > >>
> > >> Could you please let us know what you think about our ideas?
> > >>
> > >>
> > >>> GSOC 2013: Reform and simplify the clustering APIs
> > >>> --------------------------------------------------
> > >>>
> > >>> Key: MAHOUT-1177
> > >>> URL: https://issues.apache.org/jira/browse/MAHOUT-1177
> > >>> Project: Mahout
> > >>> Issue Type: Improvement
> > >>> Reporter: Dan Filimon
> > >>> Labels: gsoc2013, mentor
> > >>>
> > >>> Clustering is one of the most used features in Mahout and has many
> > >>> applications [http://en.wikipedia.org/wiki/Cluster_analysis#Applications].
> > >>> We have lots of clustering algorithms.
> > >>> There's:
> > >>> - basic k-means
> > >>> - canopy clustering
> > >>> - Dirichlet clustering
> > >>> - Fuzzy k-means
> > >>> - Spectral k-means
> > >>> - Streaming k-means [coming soon]
> > >>> We want to make them easier to use by updating the APIs and making sure
> > >>> they all work in the same way, with consistent inputs, outputs, diagnostics
> > >>> and documentation.
> > >>> This is a great way to gain an in-depth understanding of clustering
> > >>> algorithms and to familiarize yourself with Hadoop, Mahout clustering and good
> > >>> software engineering principles.
> > >>>
> > >> --
> > >> This message is automatically generated by JIRA.
> > >> If you think it was sent incorrectly, please contact your JIRA
> > >> administrators.
> > >> For more information on JIRA, see: http://www.atlassian.com/software/jira
> > >>
> >
> > --
> > ------
> > Yexi Jiang,
> > ECS 251, [email protected]
> > School of Computer and Information Science,
> > Florida International University
> > Homepage: http://users.cis.fiu.edu/~yjian004/
>

--
------
Yexi Jiang,
ECS 251, [email protected]
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/
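
For concreteness, the unified interface proposed in the quoted comment, Clustering-run(input, output, methodClass, clusteringConfig), could look roughly like the Java sketch below. This is only an illustration under assumptions: ClusteringMethod, ClusteringConfig, Clustering.run and DummyKMeans are hypothetical names and are not part of Mahout's existing API, paths are plain strings rather than Hadoop paths, and the placeholder algorithm does no real clustering.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-algorithm configuration: a simple bag of named parameters. */
class ClusteringConfig {
    private final Map<String, String> params = new HashMap<>();

    ClusteringConfig set(String key, String value) {
        params.put(key, value);
        return this;
    }

    String get(String key, String defaultValue) {
        return params.getOrDefault(key, defaultValue);
    }
}

/** Hypothetical contract that every clustering algorithm would implement. */
interface ClusteringMethod {
    String run(String input, String output, ClusteringConfig config);
}

/** Placeholder algorithm: performs no clustering, only reports what it was asked to do. */
class DummyKMeans implements ClusteringMethod {
    @Override
    public String run(String input, String output, ClusteringConfig config) {
        String k = config.get("k", "2");
        return "kmeans(k=" + k + "): " + input + " -> " + output;
    }
}

/** Single entry point mirroring Clustering-run(input, output, methodClass, clusteringConfig). */
final class Clustering {
    private Clustering() {}

    static String run(String input, String output,
                      Class<? extends ClusteringMethod> methodClass,
                      ClusteringConfig config) {
        try {
            // Instantiate the requested algorithm through its no-argument constructor.
            ClusteringMethod method = methodClass.getDeclaredConstructor().newInstance();
            return method.run(input, output, config);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + methodClass.getName(), e);
        }
    }
}
```

With such a design, callers would switch algorithms by changing only the class and the configuration, for example Clustering.run("in", "out", DummyKMeans.class, new ClusteringConfig().set("k", "3")), while the argument list stays the same for every method.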
