If I remember correctly, the framework first changes the job status to "recovering" and then simply restarts all the tasks from the last checkpoint. It works well, but I have only tested simple jobs (no input/output) on my cluster (see also HAMA-973).
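
To illustrate the recovery behavior described above, here is a toy simulation in plain Java. It is only a sketch under my own assumptions, not Hama's actual code or API: the class `CheckpointRecoverySketch`, the `JobStatus` enum, the `run` method, and the idea of checkpointing after every superstep are all made up for illustration.

```java
import java.util.Arrays;

// Toy model of "mark the job as recovering, then restart all tasks from the
// last checkpoint". Hypothetical names throughout; this is not Hama code.
class CheckpointRecoverySketch {

    enum JobStatus { RUNNING, RECOVERING }

    // Runs numSupersteps supersteps over numTasks tasks. Task t adds (t + 1)
    // to its own state in every superstep. A single failure is injected in
    // superstep failAtSuperstep (pass -1 for no failure).
    static long[] run(int numTasks, int numSupersteps, int failAtSuperstep) {
        long[] state = new long[numTasks];
        long[] checkpoint = state.clone(); // assumed: state saved at each barrier
        int checkpointStep = 0;
        boolean failureInjected = false;
        JobStatus status = JobStatus.RUNNING;

        int step = 0;
        while (step < numSupersteps) {
            // Local computation of every task for this superstep.
            for (int t = 0; t < numTasks; t++) {
                state[t] += t + 1;
            }

            if (!failureInjected && step == failAtSuperstep) {
                // A task failed: flag the job, then roll ALL tasks back.
                failureInjected = true;
                status = JobStatus.RECOVERING;
                System.out.println("superstep " + step + ": " + status);
                state = checkpoint.clone();
                step = checkpointStep;
                status = JobStatus.RUNNING;
                continue;
            }

            // Barrier reached: advance and save a new checkpoint.
            step++;
            checkpoint = state.clone();
            checkpointStep = step;
        }
        return state;
    }

    public static void main(String[] args) {
        // Same result with or without the injected failure.
        System.out.println(Arrays.toString(run(3, 5, 2)));  // [5, 10, 15]
        System.out.println(Arrays.toString(run(3, 5, -1))); // [5, 10, 15]
    }
}
```

The point of the sketch is that the final result does not depend on the failure only because the whole task state lives in the checkpoint; any state kept outside it would be lost on restart.
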
To write a fully fault-tolerant application from the user side, every piece of state in the BSP program needs to be written to disk. So some people discussed and introduced a new Superstep API that provides a more abstract interface, like Pregel.

On Mon, Feb 29, 2016 at 8:09 PM, Behroz Sikander <[email protected]> wrote:
> Hi,
> Just a quick question, is Hama fault tolerant? What happens if a Hama
> task fails?
>
> Regards,
> Behroz

--
Best Regards, Edward J. Yoon
