One idea is a BSP-based decision tree classification project.

> it seems that if the outgoing queue is large on slaves then they will take
> more time.

An asynchronous message sending mechanism could reduce that time. I
think this could also be a GSoC project. :-)
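To illustrate what I mean by asynchronous sending, here is a toy Python sketch (my own illustration, not Hama's actual implementation): a background sender thread drains the outgoing queue while the superstep keeps computing, so most messages are already on the wire by the time the barrier is reached, and only the tail of the queue remains to flush.

```python
# Toy sketch of asynchronous message sending (not Hama's implementation):
# a sender thread drains the outgoing queue concurrently with computation.
import queue
import threading

def compute_superstep(outgoing, num_msgs):
    # Messages are produced during computation instead of all at the barrier.
    for i in range(num_msgs):
        outgoing.put(("master", i))
    outgoing.put(None)  # sentinel: superstep finished producing

def async_sender(outgoing, delivered):
    # Runs concurrently with compute_superstep, draining the queue.
    while True:
        msg = outgoing.get()
        if msg is None:
            break
        delivered.append(msg)  # stand-in for a network send

outgoing = queue.Queue()
delivered = []
sender = threading.Thread(target=async_sender, args=(outgoing, delivered))
sender.start()
compute_superstep(outgoing, 1000)
sender.join()  # the "sync" barrier only waits for the remaining tail
print(len(delivered))  # 1000
```

With synchronous sending, all 1000 sends would happen inside the barrier wait; here they overlap with computation.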


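The queue-size interpretation further down the thread can also be expressed as a toy model (an assumption on my part, not a measurement): if every task must flush its whole outgoing queue before the barrier, and the messages are spread evenly across tasks, then the master's wait scales with the per-task queue size, total / num_tasks.

```python
# Toy cost model (assumed, not measured): the sync wait is proportional
# to the per-task outgoing queue size when messages are evenly spread.
def sync_wait(total_msgs, num_tasks, cost_per_msg=1.0):
    per_task_queue = total_msgs // num_tasks
    return per_task_queue * cost_per_msg

# 50K messages: per-task queues of 1666, 1250, and 1000 messages,
# so the modeled wait shrinks as the number of tasks grows.
waits = {n: sync_wait(50_000, n) for n in (30, 40, 50)}
print(waits)
```

This matches the trend in the plot (more tasks, shorter sync wait), though it ignores the master-side receive bottleneck, which grows with the number of tasks.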

On Tue, Jan 19, 2016 at 5:24 PM, Behroz Sikander <behro...@gmail.com> wrote:
> Hi,
>
> > Q1: Is Hama going to participate in GSOC 2016?
> > Sure, why not?
>
> --> Great. I am willing to participate in this GSoC. Do we already have
> some potential projects? Jira does not seem to have any.
>
> >> Q2: In the image below, I see an interesting behavior of Hama but I am
> not sure why the behavior is like this. Can you tell us what version you
> used? I roughly guess the master task can receive incoming message bundles
> concurrently if the number of tasks is large.
> --> I am using 0.7.0.
> OK, but can a slave send messages to the master concurrently if the queue
> is large? Because it seems that if the outgoing queue is large on the
> slaves, then they will take more time.
>
> Regards,
> Behroz
>
> On Tue, Jan 19, 2016 at 1:59 AM, Edward J. Yoon <edward.y...@samsung.com>
> wrote:
>
>> > Q1: Is Hama going to participate in GSOC 2016?
>>
>> Sure, why not?
>>
>> > Q2: In the image below, I see an interesting behavior of Hama but I am
>> not sure why the behavior is like this.
>>
>> Can you tell us what version you used?
>>
>> I roughly guess the master task can receive incoming message bundles
>> concurrently if the number of tasks is large.
>>
>> --
>> Best Regards, Edward J. Yoon
>>
>> -----Original Message-----
>> From: Behroz Sikander [mailto:bsikan...@apache.org]
>> Sent: Tuesday, January 19, 2016 12:28 AM
>> To: dev@hama.apache.org
>> Subject: Question regarding Hama synchronization behavior and GSOC
>>
>> Hi,
>> I have 2 questions regarding Hama.
>>
>> Q1: Is Hama going to participate in GSOC 2016?
>>
>> Q2: In the image below, I see an interesting behavior of Hama but I am not
>> sure why the behavior is like this.
>>
>> http://imgur.com/cVsfL1x
>>
>> On the x-axis, I have the total number of data items that I need to
>> process. On the y-axis, I have the time in minutes, aggregated over 200
>> iterations. Each line in the plot represents a different number of Hama
>> tasks (peers) used to process the data. Overall, this plot shows the total
>> time that the master task waits for the slave tasks to synchronize (for
>> 200 iterations, in minutes).
>>
>> Note:
>> 1) total time the master waits for slaves in 1 iteration = (time of slave
>> processing) + (time of synchronization)
>> The plot only shows the time in synchronization, aggregated over 200
>> iterations. I am using this plot to study the time taken by Hama in
>> synchronization.
>>
>> 2) The total data is divided equally among all the tasks. For example, if
>> I use 10 tasks to process 10K items, then each task will get 1,000. If I
>> use 20 tasks to process 10K, then each will have 500.
>>
>> For example, in the plot the blue line represents 10 tasks. If I process
>> 10,000 files over 200 iterations, the master waits almost 3 minutes for
>> the slaves to synchronize.
>>
>> Now, if you look closely, as I increase the number of tasks used to
>> process the data, the time the master spends waiting for the slaves to
>> synchronize starts to decrease. For example, look at the points at 50K
>> data: for 30 tasks the master waits ~10 minutes, for 40 tasks it waits
>> only ~6 minutes, and for 50 tasks it takes ~4 minutes.
>>
>> Q: How should I interpret this information?
>> The answer I came up with is that the outgoing message queue of each task
>> is smaller when I use more tasks and bigger when I use fewer tasks. For
>> example, if a task has to send 1,000 messages to the master, then its
>> outgoing queue will be bigger and will take more time to send, compared to
>> a task with 500 outgoing messages. Is my interpretation correct, or is
>> something else going on here? Any insight would be helpful.
>>
>> Regards,
>> Behroz
>>
>>
>>



-- 
Best Regards, Edward J. Yoon
