Looking at the performance of one non-distributed implementation is useful, but it doesn't quite sound like a summer of work. In terms of scope, my feeling is that a project would have to be broader, looking for performance improvements across most of the project.
Otherwise, in spirit, I think it sounds great. Taking a profiler and Vaidya to the project will definitely turn up something interesting.

On Mon, Apr 11, 2011 at 4:04 AM, Federico Brubacher <[email protected]> wrote:
> Hi Oliver and Sean,
>
> I'm in the process of rewriting my GSOC proposal, and stumbled into
> this thread, and I was wondering if it would be ok to work with you on
> the measurement and improvement of a specific part of Mahout's recommender
> system. As I said in previous emails, I'm interested in improving
> Mahout's KNN system. Oliver, what do you think? Also, I will be
> travelling to Berlin in late May because I'm speaking at Euruko (a
> Ruby conference); we can meet then and touch base on the progress?
>
> Best,
>
> Federico
>
> On Sun, Apr 10, 2011 at 1:50 PM, Sean Owen <[email protected]> wrote:
>> I think it sounds like a great project.
>> I believe that one of the biggest barriers to improving performance is
>> simply understanding where the time is being spent. Is it I/O or CPU? Is it
>> the combiner steps, shuffle? Mapper, reducer?
>>
>> What you are suggesting, and what I am sort of thinking of, sounds a lot
>> like what Apache Vaidya is doing
>> (http://hadoop.apache.org/common/docs/r0.20.2/vaidya.html). This is a great
>> project and perhaps something to build on.
>>
>> It would be great to see the output of such a tool. I'm sure that it would
>> discover some clear, easy bottlenecks.
>>
>> On Sun, Apr 10, 2011 at 4:19 PM, Oliver Fischer
>> <[email protected]> wrote:
>>
>>> Dear all,
>>>
>>> I would like to ask for your help and ideas.
>>>
>>> As I mentioned some days ago, I will work within the next months on a
>>> performance test framework for Mahout. It will be called Thotti.
>>>
>>> Thotti shall be able to run arbitrary tests in a distributed environment
>>> and support non-distributed and distributed algorithms. At the moment it is
>>> planned to utilize Amazon EC2 for distributed test execution. Thotti will
>>> also be able to generate reports on the test execution.
>>>
>>> Since Thotti should be a community framework, I need your help. Please let me
>>> know your expectations of a framework like Thotti.
>>>
>>> Best Regards,
>>>
>>> Oliver
>>>
>>> --
>>> Oliver B. Fischer, Schönhauser Allee 64, 10437 Berlin
>>> Certified ScrumMaster, OMG Certified Expert in BPM - Fundamental
>>> Tel. +49 30 44793251, Mobil: +49 178 7903538
>>> Mail: [email protected]
>>> Blog: http://logbuch.freiheitsgrade-se.de
>>>
>>
>
> --
> Federico Brubacher
> @fbru02
