[ https://issues.apache.org/jira/browse/HADOOP-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662458#action_12662458 ]
Doug Cutting commented on HADOOP-4586:
--------------------------------------

> Zookeeper might not work well for maintaining JobTracker state (or for that
> matter, Namenode persistent state) because these processes have lots of
> metadata to store.

That's the key concern. Zookeeper's in-memory data structures would probably take much more space than those in the namenode and/or jobtracker do today. Other than that, Zookeeper seems ideally suited to these tasks. Perhaps if Zookeeper were to support namespace partitioning and rebalancing (hard problems) then it could be used to store such data. It would certainly vastly simplify many things.

> Fault tolerant Hadoop Job Tracker
> ---------------------------------
>
>                 Key: HADOOP-4586
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4586
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>         Environment: High availability enterprise system
>            Reporter: Francesco Salbaroli
>            Assignee: Francesco Salbaroli
>             Fix For: 0.21.0
>
>         Attachments: Enhancing the Hadoop MapReduce framework by adding
> fault.ppt, FaultTolerantHadoop.pdf, HADOOP-4586-0.1.patch, jgroups-all.jar
>
>
> The Hadoop framework has been designed, in an effort to enhance
> performance, with a single JobTracker (master node). Its responsibilities
> range from managing the job submission process and computing the input
> splits to scheduling tasks on the slave nodes (TaskTrackers) and
> monitoring their health.
> In some environments, like the IBM and Google Internet-scale computing
> initiative, there is a need for high availability, and performance
> becomes a secondary issue. In these environments, having a system with
> a Single Point of Failure (such as Hadoop's single JobTracker) is a major
> concern.
> My proposal is to provide a redundant version of Hadoop by adding
> support for multiple replicated JobTrackers. This design can be approached
> in many different ways.
> In the document at:
> http://sites.google.com/site/hadoopthesis/Home/FaultTolerantHadoop.pdf?attredirects=0
> I wrote an overview of the problem and some approaches to solve it.
> I post this to the community to gather feedback on the best way to proceed in
> my work.
> Thank you!

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.