I think it sounds like a great project. I believe that one of the biggest barriers to improving performance is simply understanding where the time is being spent. Is it I/O or CPU? Is it the combiner step, the shuffle, the mapper, or the reducer?
What you are suggesting, and what I am sort of thinking of, sounds a lot like what Apache Vaidya is doing (http://hadoop.apache.org/common/docs/r0.20.2/vaidya.html). This is a great project and perhaps something to build on. It would be great to see the output of such a tool. I'm sure that it would discover some clear, easy bottlenecks.

On Sun, Apr 10, 2011 at 4:19 PM, Oliver Fischer <[email protected]> wrote:

> Dear all,
>
> I would like to ask for your help and ideas.
>
> As I mentioned some days before, I will work within the next months on a
> performance test framework for Mahout. It will be called Thotti.
>
> Thotti shall be able to run arbitrary tests in a distributed environment
> and support non-distributed and distributed algorithms. At the moment it is
> planned to utilize Amazon EC2 for distributed test execution. Thotti will
> also be able to generate reports on the test execution.
>
> Since Thotti should be community framework I need your help. Please let me
> know your expectation on a framework as Thotti.
>
> Best Regards,
>
> Oliver
>
> --
> Oliver B. Fischer, Schönhauser Allee 64, 10437 Berlin
> Certified ScrumMaster, OMG Certified Expert in BPM - Fundamental
> Tel. +49 30 44793251, Mobil: +49 178 7903538
> Mail: [email protected]
> Blog: http://logbuch.freiheitsgrade-se.de
>
