Re: Question regarding Hama synchronization behavior and GSOC

2016-01-19 Thread Edward J. Yoon
One idea is a BSP-based decision tree classification project.

> it seems that if the outgoing queue is large on slaves then they will take
> more time.

An asynchronous message sending mechanism could reduce that time. I
think this could also be a GSoC project. :-)
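
The asynchronous sending idea can be illustrated with a small, generic sketch. This is not Hama's messaging code: `run_superstep`, `process`, and `send` are hypothetical names, and the sketch only assumes that a task produces one outgoing message per input item. A background thread drains the outgoing queue while the task keeps computing, so less sending work is left over when the barrier is reached.

```python
import queue
import threading


def run_superstep(items, process, send):
    """Process items while a background thread drains the outgoing queue.

    `process` and `send` are hypothetical callables supplied by the caller.
    """
    outgoing = queue.Queue()
    done = object()  # sentinel marking the end of the superstep

    def sender():
        # Send messages as soon as they are queued, in FIFO order.
        while True:
            msg = outgoing.get()
            if msg is done:
                break
            send(msg)

    worker = threading.Thread(target=sender)
    worker.start()
    for item in items:
        outgoing.put(process(item))  # enqueue and keep computing
    outgoing.put(done)
    worker.join()  # every message has been sent before we return


sent = []
run_superstep(range(5), lambda x: x * x, sent.append)
```

With a synchronous design, all `send` calls would happen after the loop, so a task with a large outgoing queue would reach the barrier later.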



On Tue, Jan 19, 2016 at 5:24 PM, Behroz Sikander wrote:
> Hi,
>
> > > Q1: Is Hama going to participate in GSOC 2016 ?
> >
> > Sure, why not?
>
> Great. I am willing to participate in this GSoC. Do we already have some
> potential projects? Jira does not seem to have any.
>
> > > Q2: In the image below, I see an interesting behavior of Hama but I am
> > > not sure why the behavior is like this.
> >
> > Can you tell us what version you used? I roughly guess master task can
> > receive incoming message bundles concurrently if number of tasks is large.
>
> I am using 0.7.0.
> OK, but can a slave send messages to the master concurrently if its queue is
> large? I ask because it seems that if the outgoing queue is large on slaves
> then they will take more time.
>
> Regards,
> Behroz
>
> On Tue, Jan 19, 2016 at 1:59 AM, Edward J. Yoon wrote:
>
>> > Q1: Is Hama going to participate in GSOC 2016 ?
>>
>> Sure, why not?
>>
>> > Q2: In the image below, I see an interesting behavior of Hama but I am
>> > not sure why the behavior is like this.
>>
>> Can you tell us what version you used?
>>
>> My rough guess is that the master task can receive incoming message bundles
>> concurrently if the number of tasks is large.
>>
>> --
>> Best Regards, Edward J. Yoon
>>
>> -----Original Message-----
>> From: Behroz Sikander [mailto:bsikan...@apache.org]
>> Sent: Tuesday, January 19, 2016 12:28 AM
>> To: dev@hama.apache.org
>> Subject: Question regarding Hama synchronization behavior and GSOC
>>
>> Hi,
>> I have 2 questions regarding Hama.
>>
>> Q1: Is Hama going to participate in GSOC 2016 ?
>>
>> Q2: In the image below, I see an interesting behavior of Hama but I am not
>> sure why the behavior is like this.
>>
>> http://imgur.com/cVsfL1x
>>
>> On the x-axis, I have the total amount of data that I need to process. On
>> the y-axis, I have the time in minutes, aggregated over 200 iterations.
>> Each line in the plot represents a different number of Hama tasks (peers)
>> used to process the data. Overall, this plot shows the *total time that the
>> master task waits for slave tasks to synchronize* (over 200 iterations, in
>> minutes).
>>
>> Note:
>> 1) Total time the master waits for slaves in *1 iteration* = (time of slave
>> processing) + *(time of synchronization)*.
>> The plot shows only the *time in synchronization*, aggregated over *200
>> iterations*. I am using this plot to study the time taken by Hama in
>> synchronization.
>>
>> 2) The total data is divided equally among all the tasks. For example, if I
>> am using 10 tasks to process 10K data, then each task will get 1000. If I
>> use 20 tasks to process 10K, then each will have 500.
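
The split in note 1 between processing time and synchronization time can be measured by timing only the barrier wait. The sketch below is a generic illustration, not Hama code: `threading.Barrier` stands in for the BSP barrier synchronization, and `run_peers` and `work` are hypothetical names.

```python
import threading
import time


def run_peers(num_peers, iterations, work):
    """Return the total time each peer spent waiting at the barrier."""
    barrier = threading.Barrier(num_peers)
    sync_time = [0.0] * num_peers

    def peer(rank):
        for iteration in range(iterations):
            work(rank, iteration)          # processing phase (not timed)
            start = time.perf_counter()
            barrier.wait()                 # synchronization phase (timed)
            sync_time[rank] += time.perf_counter() - start

    threads = [threading.Thread(target=peer, args=(r,)) for r in range(num_peers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sync_time


# Peer 0 does no work and peer 2 does the most, so peer 0 waits longest.
times = run_peers(3, 2, lambda rank, iteration: time.sleep(0.02 * rank))
```

In this toy setup the waiting time of the fastest peer is dominated by the slowest peer's processing, which is why the two terms need to be measured separately.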
>>
>> For example, the blue line in the plot represents 10 tasks. If I process
>> 10,000 files in 200 iterations, the master waits almost 3 minutes for the
>> slaves to synchronize.
>>
>> If you look closely, as I *increase* the *number of tasks* used to
>> process the data, the *time* the master spends waiting for the *slaves to
>> synchronize* starts to *decrease*. For example, look at the points at
>> 50K data: with 30 tasks the master waits ~10 minutes, with 40 tasks only
>> ~6 minutes, and with 50 tasks ~4 minutes.
>>
>> Q: How should I interpret this information?
>> The answer I came up with is that the *outgoing message queue* of each task
>> is smaller when I use more tasks and bigger when I use fewer tasks. For
>> example, if a task has to send 1000 messages to the master, its outgoing
>> queue will be bigger and will take more time to send than that of a task
>> with 500 outgoing messages. So, is my interpretation correct, or is
>> something else going on here? Any insight would be helpful.
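
The queue-size interpretation can be checked against the numbers quoted above with a little arithmetic. The sketch assumes that each task sends one message per data item (an assumption, the thread does not state the message count) and uses the approximate wait times read off the plot for 50K items.

```python
def messages_per_task(total_items, num_tasks):
    """Items per task under an equal split.

    Assumption: one outgoing message per item, so this is also the
    size of each task's outgoing queue.
    """
    return total_items / num_tasks


# Approximate master wait times (minutes) for 50K items, as quoted above.
reported_wait = {30: 10, 40: 6, 50: 4}

baseline = 30
for tasks, wait in reported_wait.items():
    queue_ratio = messages_per_task(50_000, tasks) / messages_per_task(50_000, baseline)
    wait_ratio = wait / reported_wait[baseline]
    print(f"{tasks} tasks: queue x{queue_ratio:.2f}, wait x{wait_ratio:.2f}")
```

Going from 30 to 50 tasks shrinks the per-task queue to 0.60 of its size, while the reported wait drops to 0.40. Under these assumptions the smaller queues are consistent with the direction of the trend, but the wait falls faster than the queue size alone would suggest, so other factors may also contribute. The wait times are rough readings from a plot, so the ratios are only indicative.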
>>
>> Regards,
>> Behroz
>>
>>
>>



-- 
Best Regards, Edward J. Yoon


Re: Question regarding Hama synchronization behavior and GSOC

2016-01-19 Thread Behroz Sikander
> One idea is BSP-based decision tree classification project.

So we would build a package on top of Hama (just like Horn) that provides
decision-tree-based classification, right? Sounds interesting.



> The asynchronous message sending mechanism can reduce that time. I think
> this also can be a GSoC project. :-)

So currently Hama does not have asynchronous outgoing queues. Hmm, this
might be the reason why it takes more time when we have more messages to
send. This GSoC project seems very interesting because it will give me a
good understanding of Hama itself.


Since GSoC is starting next month, I will start looking into what you have
proposed.
Thanks,
Behroz



