On Thu, Aug 12, 2010 at 10:00 PM, Hotmail Email Address <[email protected]> wrote:

> I joined this list a week or so ago and am looking to contribute to Mahout,
> I have studied ML in grad school

That is excellent.

> 1) assimilating a framework to introduce multiple layer or single layer
> neural nets to solve problems in image processing or computer vision

The Neuroph project is looking at ways to introduce their neural network software into Mahout. Significant effort will be required there.

Also, the GSOC project that Zhao Zhendong worked on with SVMs will need some documentation, testing and integration work.

For that matter, there is the question of the grand unification of all of our clustering and classification code. Thoughts on that score, as well as adaptation work, would be of real interest.

On a related note, however, there is very little in the way of methods for deploying a classifier (either from supervised or unsupervised learning) as a server. We can do that with recommendations, but it would be really cool if a classifier could be deployed as a recommendation engine.

> 2) genetic algorithms related to solving computationally demanding problems

We have some code in this area, but I am not particularly convinced that the approaches are very scalable or efficient. Very large-scale projects tend to focus on lean and mean algorithms and are typically of very high dimension, which makes many genetic approaches very inefficient and simpler approaches surprisingly effective.

> 3) experimenting with mahout on other data stores such as mongodb or rika
> or Cassandra

Not sure what you have in mind here, although having a storybook available with tales of "here's how you can read data from xyz" might be nice. Hopefully there is little difference no matter where the data comes from.

> 4) more thorough unit tests for some of the code using things like jbehave

More tests are ALWAYS welcome, and we have a boatload of untested code in the math module. What happened there is that we did a mass import and deprecation of the COLT package. As we find uses for the code, we are translating it to use our matrix package and adding tests. If you look at https://issues.apache.org/jira/browse/MAHOUT-469 you can see an example of that.

> I am looking for recommendations from the community on the process to go
> about this, should I just start with the Jira tasks and assign myself some
> tasks pertaining to the above areas or start with number 4.

JIRAs tend to be filed when somebody has an itch that they are about to scratch. That means that there isn't so much of a backlog of work to be done there ... if a JIRA sits around for a bit, it is, by definition, not something that somebody is pushing for very hard.

> Also is there a project suggestions page for mahout similar to the one in
> hadoop, that would be a great idea for new folks to help.

There is such a beast, but it may not really reflect what is needed right now. This page might be some of what you are looking for:

https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms

> Best Regards
>
> Sent from my iPad

Do you have a name? Perhaps something better than "Hotmail Email Address"?
