[
https://issues.apache.org/jira/browse/MAHOUT-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643718#comment-13643718
]
Zhivko Lazarov edited comment on MAHOUT-1177 at 4/27/13 6:06 PM:
-----------------------------------------------------------------
Hello,
I am an undergraduate student who is very interested in this project. I am
already familiar with clustering algorithms, since I've taken classes such as
Data Mining, Pattern Recognition, Machine Learning, and Artificial
Intelligence. The closest thing to this project that I have done is a news
aggregation project in which I was required to implement different clustering
algorithms (K-Means, Hierarchical Clustering), to which I made some
improvements of my own for better accuracy. The news items were aggregated
into a NoSQL MongoDB database. I have competed in ACM ICPC 2011 and 2012 and
have been a laboratory assistant for Algorithms and Data Structures at my
faculty. I would like to be involved in this project, as part of GSOC if that
is possible. Regards, Zivko.
was (Author: lazzrov):
Hello,
I am an undergraduate student who is very interested in this project. I am
already familiar with clustering algorithms, since I've taken classes such as
Data Mining, Pattern Recognition, Machine Learning, and Artificial
Intelligence. The closest thing to this project that I have done is a news
aggregation project in which I was required to implement different clustering
algorithms (K-Means, Hierarchical Clustering), to which I made some
improvements of my own given the aggregated news database (NoSQL MongoDB). I
have competed in ACM ICPC 2011 and 2012 and have been a laboratory assistant
for Algorithms and Data Structures at my faculty. I would like to be involved
in this project, as part of GSOC if that is possible. Regards, Zivko.
> GSOC 2013: Reform and simplify the clustering APIs
> --------------------------------------------------
>
> Key: MAHOUT-1177
> URL: https://issues.apache.org/jira/browse/MAHOUT-1177
> Project: Mahout
> Issue Type: Improvement
> Reporter: Dan Filimon
> Labels: gsoc2013, mentor
>
> Clustering is one of the most used features in Mahout and has many
> applications [http://en.wikipedia.org/wiki/Cluster_analysis#Applications].
> We have lots of clustering algorithms. There are:
> - basic k-means
> - canopy clustering
> - Dirichlet clustering
> - Fuzzy k-means
> - Spectral k-means
> - Streaming k-means [coming soon]
> We want to make them easier to use by updating the APIs and making sure they
> all work in the same way, with consistent inputs, outputs, diagnostics and
> documentation.
> This is a great way to gain an in-depth understanding of clustering
> algorithms, familiarize yourself with Hadoop, Mahout clustering and good
> software engineering principles.
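
To make the goal of "consistent inputs, outputs, diagnostics" concrete, here
is a minimal sketch of what a shared clustering contract might look like. All
names in it (Clusterer, ClusteringResult, SimpleKMeans) are hypothetical and
are not part of Mahout's API. It is a plain in-memory k-means seeded with the
first k points, with no Hadoop involved, meant only to illustrate the idea of
one entry point and one result shape for every algorithm.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration only; none of these exist in Mahout.

/** Result shape every algorithm would return: assignments plus diagnostics. */
final class ClusteringResult {
    final int[] assignments;    // cluster index for each input point
    final double[][] centroids; // final cluster centers
    final int iterations;       // diagnostic: passes over the data
    final double cost;          // diagnostic: sum of squared distances

    ClusteringResult(int[] assignments, double[][] centroids, int iterations, double cost) {
        this.assignments = assignments;
        this.centroids = centroids;
        this.iterations = iterations;
        this.cost = cost;
    }
}

/** One entry point for every algorithm: same input type, same output type. */
interface Clusterer {
    ClusteringResult cluster(List<double[]> points);
}

/** Lloyd's k-means, seeded with the first k points so runs are deterministic. */
final class SimpleKMeans implements Clusterer {
    private final int k;
    private final int maxIterations;

    SimpleKMeans(int k, int maxIterations) {
        this.k = k;
        this.maxIterations = maxIterations;
    }

    @Override
    public ClusteringResult cluster(List<double[]> points) {
        if (k < 1 || points.size() < k) {
            throw new IllegalArgumentException("need at least k points and k >= 1");
        }
        int dim = points.get(0).length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points.get(c).clone();
        }
        int[] assignments = new int[points.size()];
        Arrays.fill(assignments, -1);
        int iterations = 0;
        boolean changed = true;
        while (changed && iterations < maxIterations) {
            changed = false;
            iterations++;
            // Assignment step: attach each point to its nearest centroid.
            for (int i = 0; i < points.size(); i++) {
                int best = nearest(points.get(i), centroids);
                if (best != assignments[i]) {
                    assignments[i] = best;
                    changed = true;
                }
            }
            // Update step: move each non-empty centroid to the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.size(); i++) {
                counts[assignments[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignments[i]][d] += points.get(i)[d];
                }
            }
            for (int c = 0; c < k; c++) {
                for (int d = 0; d < dim && counts[c] > 0; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        double cost = 0.0;
        for (int i = 0; i < points.size(); i++) {
            cost += squaredDistance(points.get(i), centroids[assignments[i]]);
        }
        return new ClusteringResult(assignments, centroids, iterations, cost);
    }

    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(p, centroids[c]) < squaredDistance(p, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return sum;
    }
}
```

With a contract like this, swapping k-means for another algorithm would mean
changing only the constructor call, since callers read the same
ClusteringResult fields either way. How Mahout's existing drivers would map
onto such an interface is left open here.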