We are planning to run the next generation of Hadoop ecosystem components in production in a few months. We plan to use HDFS 2.0 for the HA NameNode work. The platform will also include YARN, but its use will be experimental, so we'll be running something equivalent to the CDH MR1 package to support production workloads for, I'd guess, a year.

We have heard a rumor that there is a version of the MR1 JobTracker that persists state to ZooKeeper, so that failover to a new instance is fast and doesn't lose job state. I'd like to be aspirational and aim for an HA MR1 JobTracker to complement the HA NameNode.

Even if no such code is available, we might adapt existing classes in the MR1 JobTracker into models/proxies of state held in ZooKeeper. For clusters of our size (in the 100s of nodes) this could be workable. The MR client could also possibly use ZK for failover, like the HDFS client.

First, I'm trying to find out whether such code is available, if anyone knows. Otherwise, we may try building this, so I'd also like to get a sense of any interest in usage or dev collaboration.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)