Re: Mahout Bachelor's Project

Ted Dunning Fri, 12 Oct 2012 12:16:57 -0700

See http://github.com/tdunning/knn

The algorithms definitely need more work, but we will only know what
work they need after more testing.

To get that testing mileage, we need to make those algorithms available in
a standard framework.

One thought that I have is that we should be able to build synthetic data
sets that emulate the clustering and search performance of realistic data.
If we can avoid looking at anything but a few generalization scores, then
we have a very solid anonymization story because we won't even be
generating the same *types* of data in the random generator.  This alone
would be an interesting thesis topic.
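
A minimal sketch of what such a generator and score might look like. This
is not taken from the knn repository. The names make_clusters and
tightness_score are hypothetical, and the score (mean distance to the
nearest center divided by mean distance between centers) is only an assumed
stand-in for the "generalization scores" above. The point is that the
generator takes nothing from the real data except a few such numbers.

```python
import math
import random


def make_clusters(k, points_per_cluster, dim, spread, rng):
    """Draw k Gaussian blobs whose centers are uniform in the unit cube.

    Hypothetical generator: `spread` is the per-coordinate standard
    deviation and is the only knob tuned to match a target score.
    """
    centers = [[rng.random() for _ in range(dim)] for _ in range(k)]
    points = []
    for center in centers:
        for _ in range(points_per_cluster):
            points.append([x + rng.gauss(0.0, spread) for x in center])
    return centers, points


def tightness_score(centers, points):
    """Illustrative score (an assumption, not a standard metric).

    Mean distance from each point to its nearest center, divided by the
    mean distance between distinct centers. Lower means tighter clusters.
    """
    within = sum(min(math.dist(p, c) for c in centers) for p in points)
    within /= len(points)
    pairs = [(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]]
    between = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return within / between


rng = random.Random(42)  # fixed seed so the run is repeatable
tight = tightness_score(*make_clusters(5, 200, 10, 0.01, rng))
loose = tightness_score(*make_clusters(5, 200, 10, 0.5, rng))
```

With a score like this measured on real data, one could search over spread
(and k, dim) until the synthetic data reproduces it, without ever copying
the real records themselves.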

Again, however, we need current clustering users to run the algorithms
on their own data to get the scores.

On Fri, Oct 12, 2012 at 4:41 AM, Dan Filimon <[email protected]> wrote:

> > On my side:
> >
> > - I will provide mentor support for this project
> >
> > - I will help you write up the results by reviewing your write-ups and
> > suggesting structure and content.
> >
> > The benefits to you will be deep knowledge of advanced clustering
> > algorithms as well as practical experience in how integration like this
> > can happen.
>
> Could you explain a bit what working on the integration would entail?
>
> I don't want to sound ungrateful here, I definitely want to work with
> you, but ideally, I'd like to work *on* these advanced clustering
> algorithms (helping improve them maybe? overambitious?), not just
> integrate them.
>
