Looking at the performance of one non-distributed implementation is
useful, but doesn't quite sound like a summer of work. In terms of
scope, my feeling is that a project would have to be broader, looking
for performance improvements across most of the project.

Otherwise, in spirit, I think it sounds great. Taking a profiler and
Vaidya to the project will definitely turn up something interesting.

On Mon, Apr 11, 2011 at 4:04 AM, Federico Brubacher
<[email protected]> wrote:
> Hi Oliver and Sean,
>
> I'm in the process of rewriting my GSoC proposal and stumbled onto
> this thread. I was wondering if it would be OK to work with you on
> measuring and improving a specific part of the Mahout recommender
> system. As I said in previous emails, I'm interested in improving
> Mahout's KNN system. Oliver, what do you think? Also, I will be
> travelling to Berlin in late May because I'm speaking at Euruko (a
> Ruby conference). Could we meet then and touch base on the progress?
>
> Best,
>
> Federico
>
> On Sun, Apr 10, 2011 at 1:50 PM, Sean Owen <[email protected]> wrote:
>> I think it sounds like a great project.
>> I believe that one of the biggest barriers to improving performance is
>> simply understanding where the time is being spent. Is it I/O or CPU? Is it
>> the combiner steps, the shuffle, the mapper, or the reducer?
>>
>> What you are suggesting, and what I am sort of thinking of, sounds a lot
>> like what Apache Vaidya is doing (
>> http://hadoop.apache.org/common/docs/r0.20.2/vaidya.html). This is a great
>> project and perhaps something to build on.
>>
>> It would be great to see the output of such a tool. I'm sure that it would
>> discover some clear, easy bottlenecks.
>>
>> On Sun, Apr 10, 2011 at 4:19 PM, Oliver Fischer
>> <[email protected]> wrote:
>>
>>> Dear all,
>>>
>>> I would like to ask for your help and ideas.
>>>
>>> As I mentioned a few days ago, over the next few months I will work on a
>>> performance test framework for Mahout. It will be called Thotti.
>>>
>>> Thotti shall be able to run arbitrary tests in a distributed environment
>>> and support both non-distributed and distributed algorithms. At the moment, the
>>> plan is to use Amazon EC2 for distributed test execution. Thotti will
>>> also be able to generate reports on the test execution.
>>>
>>> Since Thotti should be a community framework, I need your help. Please let me
>>> know what you expect from a framework like Thotti.
>>>
>>> Best Regards,
>>>
>>> Oliver
>>>
>>> --
>>> Oliver B. Fischer, Schönhauser Allee 64, 10437 Berlin
>>> Certified ScrumMaster, OMG Certified Expert in BPM - Fundamental
>>> Tel. +49 30 44793251, Mobil: +49 178 7903538
>>> Mail: [email protected]
>>> Blog: http://logbuch.freiheitsgrade-se.de
>>>
>>
>
>
>
> --
> Federico Brubacher
> @fbru02
>
