[ https://issues.apache.org/jira/browse/HADOOP-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649067#action_12649067 ]
Francesco Salbaroli commented on HADOOP-4586:
---------------------------------------------

What does the community think about using the JGroups reliable multicast system for communication and status monitoring between master and slaves?

It has two major benefits:
1) It implements reliable multicast communication.
2) It abstracts from the protocol used (it can exploit multicast UDP where available, or fall back to TCP where multicast is forbidden, e.g. on Amazon EC2).

Regards,
Francesco

> Fault tolerant Hadoop Job Tracker
> ---------------------------------
>
> Key: HADOOP-4586
> URL: https://issues.apache.org/jira/browse/HADOOP-4586
> Project: Hadoop Core
> Issue Type: New Feature
> Components: mapred
> Affects Versions: 0.18.0
> Environment: High availability enterprise system
> Reporter: Francesco Salbaroli
> Attachments: FaultTolerantHadoop.pdf
>
> Original Estimate: 2016h
> Remaining Estimate: 2016h
>
> The Hadoop framework has been designed, in an effort to enhance performance,
> with a single JobTracker (master node). Its responsibilities range from
> managing the job submission process and computing the input splits to
> scheduling tasks on the slave nodes (TaskTrackers) and monitoring their health.
> In some environments, like the IBM and Google's Internet-scale computing
> initiative, there is a need for high availability, and performance becomes
> a secondary issue. In these environments, having a system with a Single
> Point of Failure (such as Hadoop's single JobTracker) is a major concern.
> My proposal is to provide a redundant version of Hadoop by adding support
> for multiple replicated JobTrackers. This design can be approached in many
> different ways.
> In the document at:
> http://sites.google.com/site/hadoopthesis/Home/FaultTolerantHadoop.pdf?attredirects=0
> I wrote an overview of the problem and some approaches to solve it.
> I post this to the community to gather feedback on the best way to proceed in
> my work.
> Thank you!
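To make the status-monitoring idea in the comment above more concrete, here is a minimal sketch of heartbeat-based failover between replicated JobTrackers. It is not taken from the attached document and uses neither the JGroups nor the Hadoop API: the names `ReplicaGroup`, `simulate_failover`, `jt-primary`, `jt-standby` and the 3-second timeout are hypothetical, and the rule that the longest-running live replica acts as master is an assumption made only for illustration. Time is passed in explicitly so that the logic can be checked without a network.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaGroup:
    """Hypothetical membership view over replicated JobTrackers."""
    timeout: float  # seconds without a heartbeat before a replica is suspected
    _last_seen: dict = field(default_factory=dict)  # name -> last heartbeat time
    _joined: list = field(default_factory=list)     # names in join order

    def heartbeat(self, name: str, now: float) -> None:
        """Record a heartbeat; the first one also joins the replica to the group."""
        if name not in self._last_seen:
            self._joined.append(name)
        self._last_seen[name] = now

    def live_members(self, now: float) -> list:
        """Replicas heard from within the timeout, in join order."""
        return [n for n in self._joined
                if now - self._last_seen[n] <= self.timeout]

    def active_master(self, now: float):
        """Assumed rule: the oldest live replica is master (None if none is live)."""
        live = self.live_members(now)
        return live[0] if live else None


def simulate_failover():
    """Return the active master before and after the primary goes silent."""
    group = ReplicaGroup(timeout=3.0)
    group.heartbeat("jt-primary", now=0.0)
    group.heartbeat("jt-standby", now=0.5)
    before = group.active_master(now=1.0)
    # The primary stops sending heartbeats; only the standby keeps reporting.
    group.heartbeat("jt-standby", now=4.0)
    after = group.active_master(now=5.0)
    return before, after
```

In a real deployment the heartbeats would travel over whatever group transport is chosen (multicast UDP or TCP, as discussed in the comment), and the replicas would also have to share job state, which this sketch leaves out entirely.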
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.