Are you looking for this link? http://wiki.apache.org/hama/GroomServerFaultTolerance

>> There are many tasks required to work on and to be integrated in order
>> to get (GroomServer) fault tolerance ready. Tasks include:
>> - GroomServer status/ resource monitor
>> - Failure Detection
>> - Checkpointed data integration
>> - Refactoring bsp() (if necessary)
>> - Master decision making

Hmm, yes. And I missed the message compressor. Could you please split
them into smaller tasks so that we can help you?

> I also would like to know why we rejected the idea of speculative task
> execution?

I wanted to talk about speculative task execution before, but the idea
has not been discussed or reported yet.
( http://markmail.org/thread/sq7neayhstqufrsz ) To support it, we should
add a 'Progress' feature first. Currently, the job/task progress checker
is not implemented.

> How serious is the feature of real-time processing for Hama? I am told that
> some are already using it for the purpose and read Thomas's blog on the
> same. Are we deferring it until we have a design for offline processing or
> should we keep it in mind for fault tolerance?

I think, yes if possible. But in some cases, maybe turning off recovery
mode is the best option. I don't understand this perfectly yet, so would
you please describe the issues that must be discussed/considered?

On Tue, Feb 14, 2012 at 3:15 AM, Suraj Menon <[email protected]> wrote:
> +1 on HAMA 511 should not be blocker.
>
> Also, I lost the wiki link that explains the fault tolerant design. It
> would be helpful to understand the recovery design. I believe that we will
> have the recovery BSP tasks scheduled to start running (in high probability)
> on node with data where the checkpointed messages are written on HDFS with
> a single input split?
> I also would like to know why we rejected the idea of speculative task
> execution?
> I am currently working on HAMA-445 and HAMA-498. Thanks to Chiahung, I have
> 2-3 good papers to read already :).
>
> How serious is the feature of real-time processing for Hama? I am told that
> some are already using it for the purpose and read Thomas's blog on the
> same. Are we deferring it until we have a design for offline processing or
> should we keep it in mind for fault tolerance?
>
>
> Thanks,
> Suraj
>
>
>
> On Mon, Feb 13, 2012 at 12:25 PM, Chia-Hung Lin <[email protected]> wrote:
>
>> There are many tasks required to work on and to be integrated in order
>> to get (GroomServer) fault tolerance ready. Tasks include:
>> - GroomServer status/ resource monitor
>> - Failure Detection
>> - Checkpointed data integration
>> - Refactoring bsp() (if necessary)
>> - Master decision making
>>
>> Currently I am working on the first one, and with a patch for 2nd on
>> jira already. In my viewpoint, it might be difficult to get those
>> tasks done within 2-3 months.
>>
>> On 13 February 2012 17:05, Edward J. Yoon <[email protected]> wrote:
>> > Hi,
>> >
>> > I think, it's time to discuss about our 0.5 roadmap more clearly.
>> >
>> > IMO, I'd like to release Hama 0.5 with only fault tolerant processing,
>> > clearly defined BSP and Pregel interfaces. Maybe 2~3 months later?
>> > And, HAMA-511 should not be a blocker for 0.5 release, it should be
>> > considered as a long term task I think.
>> >
>> > There's a lot of new M/R alternatives but no stable alternatives and
>> > no dominant player at the moment. We have to stabilize ourselves first
>> > rather than finding ways to differentiate ourselves from the
>> > competition or considering new paradigms.
>> >
>> > Please feel free to leave your opinion!
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>>

--
Best Regards, Edward J. Yoon
@eddieyoon
