One idea is a BSP-based decision tree classification project.

> it seems that if the outgoing queue is large on slaves then they will take
> more time.

The asynchronous message sending mechanism can reduce that time. I think this
also can be a GSoC project. :-)

On Tue, Jan 19, 2016 at 5:24 PM, Behroz Sikander <behro...@gmail.com> wrote:
> Hi,
>
> *> Q1: Is Hama going to participate in GSOC 2016 ? *
> *Sure, why not?*
>
> -->Great. I am willing to participate in this GSOC. Do we already have some
> potential projects ? Jira does not seem to have any.
>
> *>> Q2: In the image below, I see an interesting behavior of Hama but I am
> not sure why the behavior is like this. Can you tell us what version you
> used? I roughly guess master task can receive incoming message bundles
> concurrently if number of tasks is large.*
> --> I am using 0.7.0.
> Ok but can a slave send concurrent message to master if the queue is
> large ? because it seems that if the outgoing queue is large on slaves
> then they will take more time.
>
> Regards,
> Behroz
>
> On Tue, Jan 19, 2016 at 1:59 AM, Edward J. Yoon <edward.y...@samsung.com>
> wrote:
>
>> > Q1: Is Hama going to participate in GSOC 2016 ?
>>
>> Sure, why not?
>>
>> > Q2: In the image below, I see an interesting behavior of Hama but I am
>> > not sure why the behavior is like this.
>>
>> Can you tell us what version you used?
>>
>> I roughly guess master task can receive incoming message bundles
>> concurrently if number of tasks is large.
>>
>> --
>> Best Regards, Edward J. Yoon
>>
>> -----Original Message-----
>> From: Behroz Sikander [mailto:bsikan...@apache.org]
>> Sent: Tuesday, January 19, 2016 12:28 AM
>> To: dev@hama.apache.org
>> Subject: Question regarding Hama synchronization behavior and GSOC
>>
>> Hi,
>> I have 2 questions regarding Hama.
>>
>> Q1: Is Hama going to participate in GSOC 2016 ?
>>
>> Q2: In the image below, I see an interesting behavior of Hama but I am not
>> sure why the behavior is like this.
>>
>> http://imgur.com/cVsfL1x
>>
>> On x-axis, I have the total number of data that I need to process.
>> On the y-axis, I have the time in minutes, aggregated over 200 iterations.
>> Each line in the plot represents a different number of Hama tasks (peers)
>> used to process the data. Overall, this plot shows the *total time that the
>> master task waits for slave tasks to synchronize* (for 200 iterations, in
>> minutes).
>>
>> Note:
>> 1) Total time the master waits for slaves in *1 iteration* =
>>    (time of slave processing) + *(time of synchronization)*
>> The plot only shows the *time in synchronization*, aggregated over
>> *200 iterations*. I am using this plot to study the time Hama spends in
>> synchronization.
>>
>> 2) The total data is divided equally among all the tasks. For example, if I
>> use 10 tasks to process 10K data, then each task gets 1000. If I use 20
>> tasks to process 10K, then each gets 500.
>>
>> Now in the plot, for example, the blue line represents 10 tasks. If I
>> process 10,000 files in 200 iterations, the master waits almost 3 minutes
>> for the slaves to synchronize.
>>
>> Now if you look closely, as I *increase* the *number of tasks* used to
>> process the data, the *time* the master waits for the *slaves to
>> synchronize* starts to *decrease*. For example, look at the points at 50K
>> data: for 30 tasks the master waits ~10 minutes, for 40 tasks it waits
>> only ~6 minutes, and for 50 tasks it took ~4 minutes.
>>
>> Q: My question is how to interpret this information.
>> The answer I came up with is that the *outgoing message queue* of each task
>> is smaller when I use more tasks and bigger when I use fewer tasks. For
>> example, if a task has to send 1000 messages to the master, then its
>> outgoing queue is bigger and takes more time to send than that of a task
>> with 500 outgoing messages. So, is my interpretation correct, or is
>> something else going on here? Any insight would be helpful.
>>
>> Regards,
>> Behroz

--
Best Regards, Edward J. Yoon
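
A rough way to sanity-check the outgoing-queue interpretation from the original question is to compare it against the numbers quoted in the thread. The sketch below is only a back-of-the-envelope model, not Hama code: it assumes one outgoing message per data item and that synchronization time scales linearly with the per-task queue length. The function names are hypothetical, and the observed values are the approximate readings given in the question for 50K items.

```python
import math


def messages_per_task(total_items: int, num_tasks: int) -> int:
    """Items handled by the busiest task when data is split equally.

    Assumption: each item produces exactly one outgoing message.
    """
    return math.ceil(total_items / num_tasks)


def relative_sync_time(total_items: int, num_tasks: int, baseline_tasks: int) -> float:
    """Predicted sync time relative to a baseline run.

    Assumption: sync time grows linearly with the per-task queue length.
    """
    return messages_per_task(total_items, num_tasks) / messages_per_task(
        total_items, baseline_tasks
    )


# Approximate waits (minutes over 200 iterations) quoted for 50K items.
observed_minutes = {30: 10.0, 40: 6.0, 50: 4.0}

for tasks, minutes in observed_minutes.items():
    queue = messages_per_task(50_000, tasks)
    predicted = relative_sync_time(50_000, tasks, baseline_tasks=50)
    observed = minutes / observed_minutes[50]
    print(f"{tasks} tasks: queue={queue}, predicted x{predicted:.2f}, observed x{observed:.2f}")
```

Under these assumptions the queue length alone predicts a smaller gap between 30 and 50 tasks than the roughly 2.5x reported, so the queue-size explanation may be only part of the picture. Other effects, such as how the master receives message bundles, could also contribute.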