On Mar 12, 2010, at 1:22 AM, Robin Anil wrote:

> Shall I go and put some of the ideas up? I will do it as a whole for the
> project. Later we can re-assign things, maybe? How does that sound? Unlike
> other projects, we can't really go and put up a proposal like "Implement
> back-propagation" and expect a student to take it up and reduce things to
> map/reduce.
>
> Some of the ideas (I am going to be really ambitious/vague here, but write
> clear expectations or guidelines on what is an ideal proposal):
>
> 1) Implement a cool classifier over map/reduce
> 2) Implement a cool clustering algorithm on map/reduce
> 3) Implement a meta-learner to plug in to various classifiers in Mahout and
> have bagging, boosting support.
> 4) Continuous performance benchmarking/dashboard, maybe wrappers over EC2
> 5) Create matrix implementations of MySQL and NoSQL (HBase, Cassandra)
> access for all the algorithms to use.
> 6) Implement some of the ideas from Netflix top 5 to boost the
> recommendations package
> 7) Visualization tool for clustering, classification or recommendation;
> ability to explain (optional)
> 8) Improve mahout-math package

9. Implement M/R Tika integration to take "rich" documents on HDFS and output Vectors. Likely not a full Summer of Work there, but it could be part of some larger "Utils" capabilities focused on making it easier to consume Mahout. Also included: finish ARFF compatibility.

10. Benchmark. Break the record?

I think we should still solicit ideas on the list here that we can put up on JIRA.

> Who is free to mentor this year? i.e., giving 5-6 hours weekly to a student
> and hear them crib (sorry Ian and Isabel :P) and give words of encouragement.
> And yes, code reviews.

I'm in.