Re: Contributions to mahout

Ted Dunning Thu, 12 Aug 2010 23:02:34 -0700

On Thu, Aug 12, 2010 at 10:00 PM, Hotmail Email Address <[email protected]> wrote:


>
> I joined this list a week or so ago and am looking to contribute to Mahout.
> I have studied ML in grad school


That is excellent.


>
> 1) assimilating a framework to introduce multiple layer or single layer
> neural nets to solve problems in image processing or computer vision
>

The Neuroph project is looking at ways to introduce their neural network
software into Mahout.  A significant amount of effort will be required
there.

Also, the GSoC project that Zhao Zhendong worked on with SVMs will need
some documentation, testing and integration work.

For that matter, there is the question of the grand unification of all of
our clustering and classification code.  Thoughts on that score as well as
adaptation work would be of real interest.

On a related note, however, there is very little in the way of methods for
deploying a classifier (either from supervised or unsupervised learning) as
a server.  We can do that with recommendations, but it would be really cool
if a classifier could be deployed as a recommendation engine.


> 2) genetic algorithms related to solving computationally demanding problems
>

We have some code in this area, but I am not particularly convinced that the
approaches are very scalable or efficient.  Very large scale projects tend
to focus on lean and mean algorithms and are typically of very high
dimension, which makes many genetic approaches very inefficient and
simpler approaches surprisingly effective.


>
> 3) experimenting with Mahout on other data stores such as MongoDB, Riak,
> or Cassandra
>

Not sure what you have in mind here, although having a storybook available
with tales of "here's how you can read data from xyz" might be nice.
Hopefully there is little difference no matter where the data comes from.

> 4) more thorough unit tests for some of the code using things like jbehave
>

More tests are ALWAYS welcome and we have a boatload of untested code in the
math module.  What happened there is that we did a mass import and
deprecation of the COLT package.  As we find uses for the code, we
are translating it to use our matrix package and adding tests.  If you
look at https://issues.apache.org/jira/browse/MAHOUT-469 you can see an
example of that.


> I am looking for recommendations from the community on the process to go
> about this. Should I just start with the Jira tasks and assign myself some
> tasks pertaining to the above areas, or start with number 4?
>

JIRAs tend to be filed when somebody has an itch that they are about to
scratch.  That means that there isn't so much of a backlog of work to be
done there ... if a JIRA sits around for a bit, it is, by definition, not
something that somebody is pushing for very hard.


> Also, is there a project suggestions page for Mahout similar to the one in
> Hadoop? That would be a great way for new folks to help.
>

There is such a beast, but it may not really reflect what is needed right
now.

This page might be some of what you are looking for:
https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms



>
> Best Regards
>
> Sent from my iPad


Do you have a name?  Perhaps something better than "Hotmail Email Address"?
