[ https://issues.apache.org/jira/browse/HADOOP-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655608#action_12655608 ]
Francesco Salbaroli commented on HADOOP-4586:
---------------------------------------------

I will release a preliminary test version of Fault tolerant Hadoop before 17th Dec.

Features will include:
- JGroups 2.6.7 toolkit for reliable multicast communication, based on a highly configurable protocol stack that adapts to different environments (I will post documentation about it).
- A complete wrapper around the Hadoop source code, to minimize modifications in the source tree.
- Dynamic JobTracker address resolution using HDFS as support.

Enhancements in future versions:
- Higher level of abstraction
- Better exception handling

I'll post the source code at the beginning of next week (hopefully).

Can I be added to the developers?

Best regards,
Francesco

> Fault tolerant Hadoop Job Tracker
> ---------------------------------
>
> Key: HADOOP-4586
> URL: https://issues.apache.org/jira/browse/HADOOP-4586
> Project: Hadoop Core
> Issue Type: New Feature
> Components: mapred
> Affects Versions: 0.18.0
> Environment: High availability enterprise system
> Reporter: Francesco Salbaroli
> Attachments: FaultTolerantHadoop.pdf
>
> Original Estimate: 2016h
> Remaining Estimate: 2016h
>
> The Hadoop framework has been designed, in an effort to enhance
> performance, with a single JobTracker (master node). Its responsibilities
> range from managing the job submission process and computing the input
> splits to scheduling tasks on the slave nodes (TaskTrackers) and
> monitoring their health.
> In some environments, such as IBM and Google's Internet-scale computing
> initiative, there is a need for high availability, and performance
> becomes a secondary issue. In these environments, having a system with
> a Single Point of Failure (such as Hadoop's single JobTracker) is a major
> concern.
> My proposal is to provide a redundant version of Hadoop by adding
> support for multiple replicated JobTrackers. This design can be approached
> in many different ways.
> In the document at:
> http://sites.google.com/site/hadoopthesis/Home/FaultTolerantHadoop.pdf?attredirects=0
> I wrote an overview of the problem and some approaches to solve it.
> I post this to the community to gather feedback on the best way to proceed in
> my work.
> Thank you!

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.