Review request: I've submitted a very general proposal to the ASF. Is there some way I can confirm that it has been routed to Mahout?
The proposal is as follows. Any advice is appreciated. (Perhaps I should have provided a link instead?)

*Short description:* The main goal of this project is to refactor Mahout for performance and ease of use, based on the API design decided by the community. Additionally, I will provide infographic-based documentation and add or redesign tests, examples, and benchmarks.

*Problem description:* The Mahout API needs restructuring to provide streamlined input and output formats and an intuitive class and project hierarchy. Several related projects may need a common interface with standardized method signatures. Several tests, benchmarks, and examples need to be redesigned, and added where they are missing.

*Deliverables:*
- A clean and optimized API
- Documentation with infographics and dependency charts
- New tests and benchmarks

*Design document:* The design of the new API is expected to be settled well before the coding phase starts, based on the ongoing discussions on the mailing lists. I will set up a wiki page that gives easy access to the design and allows an exchange of opinions on it. I am a strong believer in infographics and will add the design graphic and dependency graph to the Mahout documentation.

*Approach:* Mostly IDE-based development with the help of integrated tools. I intend to use the CLI for writing and editing scripts.

*Timeline:* The summer break has started, so I am essentially free until mid-July and can commit more than 40 hours every week to the project. Regular classes resume after that, but they will not be a hindrance.

*Pre-coding phase <3 weeks: 5 May - 26 May>:* Address a few of the PMD, FindBugs, Checkstyle, and Open Tasks reports on Jenkins to gain familiarity with the code base and the associated tools. Meanwhile, create the refactoring roadmap based on open discussion in the Mahout community.

*Phase 1 <3 weeks: 27 May - 16 June>:* Restructure the code base to the new API design. Provide regression testing and redesign tests where required.
*Review 1 <1 week: 17 June - 23 June>:* Update the Mahout wiki and documentation. Provide and run diagnostics. Document the tests and examples for beginners. Analyze profiler reports and look for bottlenecks.

*Phase 2 <2 weeks: 24 June - 7 July>:* Write tests, examples, and benchmarks. Address community feedback on the work in Phase 1.

*Review 2 <1 week: 8 July - 14 July>:* End-to-end testing. Fix outstanding bugs. Report on performance improvements.

*Phase 3 <4 weeks: 15 July - 12 August>:* I hope to have built a solid foundation by then. Resume integrated development with concurrent testing and documentation. Resolve related JIRA issues.

*Beyond GSoC:* Remain involved with Mahout and work towards becoming a committer.

Due to the nature of the project, the timeline may change to reflect revisions to the roadmap. I am also open to any tasks my mentor sees fit to assign me.

*References:*
- https://issues.apache.org/jira/browse/MAHOUT-1177
- https://issues.apache.org/jira/browse/MAHOUT-1179

*About me:* I am an undergraduate student about to start the final year of the four-year Computer Science and Engineering programme at Birla Institute of Technology, Mesra, India. I have a solid background in statistics, object-oriented programming, and system architecture, and I aim to build a career in scalable data science. I have a preliminary understanding of the Hadoop and Mahout APIs and hope to build on that knowledge as GSoC progresses. This is my first experience with open source, and I will certainly give it my best.

On Fri, May 3, 2013 at 5:59 AM, satyam sinha wrote:
> I lost a lot of time due to semester evaluations at my institute. (Perhaps I
> should have notified you.)
> The summer break has begun, and now I have uninterrupted time to devote to
> GSoC.
>
> I have already set up hadoop-1.0.4 on opensuse-12.3.
> Mahout 0.8-SNAPSHOT is checked out via svn in netbeans-7.3.
> I've been running the various examples and tests that are included.
> It took me almost a week (okay, I'm not a wizard! :) ) to set up and go
> through various talks and slides.
> I would like some insight on whether it is advisable to look into Avro now.
>
> TL;DR
> I was away for college, but I am now back full-time.
> (I hope this does not reflect badly on me.)
> I will set up the wiki with my initial ideas within 24 hours, so that we
> can all discuss the needs of the API.
>
>
> On Mon, Apr 8, 2013 at 12:23 PM, Isabel Drost-Fromm wrote:
>
>>
>> Hi Satyam,
>>
>> On Friday, April 05, 2013 09:54:56 PM satyam sinha wrote:
>> > Please give directions and suggestions to help me on my very first FOSS
>> > experience.
>>
>> I guess the best way to get started is to check out the source code, build
>> the project and get familiar with the code. Both issues you mention in the
>> subject are good for people with less experience in machine learning and/or
>> Hadoop.
>>
>> However, both are pretty involved: you will need to understand the existing
>> code, come up with a good design for new APIs and discuss that design with
>> the community. So it is best to concentrate on just one of them.
>>
>> Feel free to also create a separate wiki page that contains a living design
>> document for the APIs that others can contribute to as well.
>>
>>
>> Isabel
>
>
