Em, yes, you're right. It'll be meaningful when a task fails to launch, and also when checkpoint data fails to load from the file system.
I thought it would be needless for checkpoint recovery. My misunderstanding :)

On Thu, Feb 2, 2012 at 4:51 PM, Thomas Jungblut <[email protected]> wrote:
> Hi Edward,
>
> I would like to get into this fault-tolerance thing ASAP; we have to
> include it in our next release. It is the argument against using Hama
> in production environments.
> In my opinion, yes, we need these attempts, for various reasons:
> - an input split is bound to a specific index, related to the sorting
> of the task ids
> - there's a mapping in ZooKeeper from host:port -> taskid
>
> I want to point you to the examples that use the master-client
> architecture, which rely on the fact that the tasks are sorted in
> ascending order. If the master task fails, a re-attempt won't break
> the ordering. Only the host:port mapping must be updated in ZooKeeper,
> and the other tasks have to flush their caches and remap the znodes.
> If you add a new task instead, you'll get a lot more pain than you
> actually want ;)
>
> Attempts are fine, or is there a specific problem you want to avoid?
>
> 2012/2/2 Edward J. Yoon <[email protected]>
>
>> A few Task-related classes, e.g. TaskAttemptID etc., are copied from
>> Hadoop MapReduce.
>>
>> Do you think we need to implement a Task re-attempt mechanism?
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>
> --
> Thomas Jungblut
> Berlin <[email protected]>

--
Best Regards, Edward J. Yoon
@eddieyoon
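To make Thomas's ordering argument concrete, here is a minimal, hypothetical sketch (these are not the actual Hama/Hadoop classes; `TaskAttemptId`, `taskIndex`, and `nextAttempt` are illustrative names): a re-attempt keeps the task's index, which is the sort key the master-client examples and the input-split assignment rely on, and only bumps an attempt counter, whereas a freshly added task would receive a new index and disturb the ordering.

```java
// Hypothetical sketch of the re-attempt idea discussed above.
public class AttemptSketch {

    // A task keeps its index across attempts; only the attempt counter grows.
    static final class TaskAttemptId {
        final int taskIndex; // used for input-split binding and ascending sort
        final int attempt;   // incremented on every re-launch

        TaskAttemptId(int taskIndex, int attempt) {
            this.taskIndex = taskIndex;
            this.attempt = attempt;
        }

        // Re-attempt: same task index, next attempt number.
        TaskAttemptId nextAttempt() {
            return new TaskAttemptId(taskIndex, attempt + 1);
        }

        @Override
        public String toString() {
            return "task_" + taskIndex + "_attempt_" + attempt;
        }
    }

    public static void main(String[] args) {
        TaskAttemptId first = new TaskAttemptId(3, 0);
        TaskAttemptId retry = first.nextAttempt();
        // The sort key (taskIndex) is unchanged, so the ascending task
        // ordering is preserved; only the host:port -> taskid mapping in
        // ZooKeeper would need updating for the new process.
        System.out.println(first + " -> " + retry);
        System.out.println("same index: " + (first.taskIndex == retry.taskIndex));
    }
}
```

Under this scheme, the only cluster-wide work on failure is updating the host:port mapping for the unchanged task id and letting peers remap their znodes, exactly as described above.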
