I haven't read the papers, but the big question is: do you think they can scale using M/R (MapReduce) or some other distributed technique?

If so, feel free to write up a short proposal using the info at:
http://wiki.apache.org/general/SummerOfCode2008

If you are unsure, that is fine too. We could start with a simpler implementation and then look at distributing it.


On Mar 6, 2008, at 2:45 PM, Matthew Riley wrote:

Hey Jeff-

I'm certainly willing to put some energy into developing implementations of
these algorithms, and it's good to hear that you may be interested in
guiding us in the right direction.

Here are the references I learned the algorithms from- some are more
detailed than others:

Mean-Shift clustering was introduced here and this paper is a thorough
reference:
Mean-Shift: A Robust Approach to Feature Space Analysis
http://courses.csail.mit.edu/6.869/handouts/PAMIMeanshift.pdf

And here's a PDF with just the guts of the algorithm outlined:
homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/TUZEL1/MeanShift.pdf

It looks like there isn't a definitive reference for the k-means
approximation with randomized k-d trees, but there are promising results
introduced here:

Object retrieval with large vocabularies and fast spatial matching:
http://www.robots.ox.ac.uk/~vgg/publications/papers/philbin07.pdf
And a deeper explanation of the technique here:

Randomized KD-Trees for Real-Time Keypoint Detection:
ieeexplore.ieee.org/iel5/9901/31473/01467521.pdf?arnumber=1467521

Let me know what you think.

Matt

On Thu, Mar 6, 2008 at 11:45 AM, Jeff Eastman <[EMAIL PROTECTED]> wrote:

Hi Matthew,

As with most open source projects, "interest" is mainly a function of
the willingness of somebody to contribute their energy. Clustering is
certainly within the scope of the project. I'd be interested in
exploring additional clustering algorithms with you and your colleague.
I'm a complete noob in this area, and it is always enlightening to work
with students who have more current theoretical exposure.

Do you have some links on these approaches that you find particularly
helpful?

Jeff

-----Original Message-----
From: Matthew Riley [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 05, 2008 11:11 PM
To: mahout-dev@lucene.apache.org; [EMAIL PROTECTED]
Subject: Re: Google Summer of Code

Hey everyone-

I've been watching the mailing list for a little while now, hoping to
contribute once I became more familiar, but I wanted to jump in here now
and express my interest in the Summer of Code project. I'm currently a
graduate student in electrical engineering at UT-Austin working in
computer vision, which is closely tied to many of the problems Mahout is
addressing (especially in my area of content-based retrieval).

What can I do to help out?

I've discussed some potential Mahout projects with another student
recently, mostly focused around approximate k-means algorithms (since
that's a problem I've been working on lately). It sounds like you guys
are already implementing canopy clustering for k-means. Is there any
interest in developing another approximation algorithm based on
randomized k-d trees for high-dimensional data? What about mean-shift
clustering?

Again, I would be glad to help in any way I can.

Matt

On Thu, Mar 6, 2008 at 12:56 AM, Isabel Drost <[EMAIL PROTECTED] drost.de>
wrote:

On Saturday 01 March 2008, Grant Ingersoll wrote:
Also, any thoughts on what we might want someone to do?  I think it
would be great to have someone implement one of the algorithms on our
wiki.

Just as a general note, the deadline for applications:

March 12: Mentoring organization application deadline (12 noon
PDT/19:00 UTC).

I suppose we should identify interesting tasks by that deadline. As a
general guideline for mentors and for project proposals:

http://code.google.com/p/google-summer-of-code/wiki/AdviceforMentors

Isabel

--
Better late than never.         -- Titus Livius (Livy)
 |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
/,`.-'`'    -.  ;-;;,_
|,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xmpp://[EMAIL PROTECTED]>



--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ




