Hey Jeff- I'm certainly willing to put some energy into developing implementations of these algorithms, and it's good to hear that you may be interested in guiding us in the right direction.
Here are the references I learned the algorithms from- some are more detailed than others: Mean-Shift clustering was introduced here and this paper is a thorough reference: Mean-Shift: A Robust Approach to Feature Space Analysis http://courses.csail.mit.edu/6.869/handouts/PAMIMeanshift.pdf And here's a PDF with just guts of the algorithm outlined: homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/TUZEL1/MeanShift.pdf It looks like there isn't a definitive reference for the k-means approximation with randomized k-d trees, but there are promising results introduced here: Object retrieval with large vocabularies and fast spatial matching: http://www.robots.ox.ac.uk/~vgg/publications/papers/philbin07.pdf* * And a deeper explanation of the technique here: Randomized KD-Trees for Real-Time Keypoint Detection: ieeexplore.ieee.org/iel5/9901/31473/01467521.pdf?arnumber=1467521 Let me know what you think. Matt On Thu, Mar 6, 2008 at 11:45 AM, Jeff Eastman <[EMAIL PROTECTED]> wrote: > Hi Matthew, > > As with most open source projects, "interest" is mainly a function of > the willingness of somebody to contribute their energy. Clustering is > certainly within the scope of the project. I'd be interested in > exploring additional clustering algorithms with you and your colleague. > I'm a complete noob in this area and it is always enlightening to work > with students who have more current theoretical exposures. > > Do you have some links on these approaches that you find particularly > helpful? > > Jeff > > -----Original Message----- > From: Matthew Riley [mailto:[EMAIL PROTECTED] > Sent: Wednesday, March 05, 2008 11:11 PM > To: mahout-dev@lucene.apache.org; [EMAIL PROTECTED] > Subject: Re: Google Summer of Code > > Hey everyone- > > I've been watching the mailing list for a little while now, hoping to > contribute once I became more familiar, but I wanted to jump in here now > and > express my interest in the Summer of Code project. I'm currently a > graduate > student in electrical engineering at UT-Austin working in computer > vision, > which is closely tied to many of the problems Mahout is addressing > (especially in my area of content-based retrieval). > > What can I do to help out? > > I've discussed some potential Mahout projects with another student > recently- > mostly focused around approximate k-means algorithms (since that's a > problem > I've been working on lately). It sounds like you guys are already > implementing canopy clustering for k-means- Is there any interest in > developing another approximation algorithm based on randomized kd-trees > for > high dimensional data? What about mean-shift clustering? > > Again, I would be glad to help in any way I can. > > Matt > > On Thu, Mar 6, 2008 at 12:56 AM, Isabel Drost <[EMAIL PROTECTED]> > wrote: > > > On Saturday 01 March 2008, Grant Ingersoll wrote: > > > Also, any thoughts on what we might want someone to do? I think it > > > would be great to have someone implement one of the algorithms on > our > > > wiki. > > > > Just as a general note, the deadline for applications: > > > > March 12: Mentoring organization application deadline (12 noon > PDT/19:00 > > UTC). > > > > I suppose we should identify interesing tasks until that deadline. As > a > > general guideline for mentors and for project proposals: > > > > http://code.google.com/p/google-summer-of-code/wiki/AdviceforMentors > > > > Isabel > > > > -- > > Better late than never. -- Titus Livius (Livy) > > |\ _,,,---,,_ Web: <http://www.isabel-drost.de> > > /,`.-'`' -. ;-;;,_ > > |,4- ) )-,_..;\ ( `'-' > > '---''(_/--' `-'\_) (fL) IM: <xmpp://[EMAIL PROTECTED]> > > >