Hey Andrew,

I don't know the answers to all your questions, but the
https://issues.apache.org/jira/browse/MAPREDUCE-2288 JIRA serves as a
good umbrella we can use to track this overall (there seem to have
been multiple approaches presented over time).

The closest I found to the rumor you mention was
https://issues.apache.org/jira/browse/MAPREDUCE-2648, but it lacks job
state maintenance (i.e., it does not resume jobs after a failover). I
did not dig too deep, however.

On Sun, Jun 17, 2012 at 3:53 AM, Andrew Purtell <apurt...@apache.org> wrote:
> We are planning to run a next generation of Hadoop ecosystem components in
> our production in a few months. We plan to use HDFS 2.0 for the HA NameNode
> work. The platform will also include YARN but its use will be experimental.
> So we'll be running something equivalent to the CDH MR1 package to support
> production workloads for I'd guess a year.
>
> We have heard a rumor regarding the existence of a version of the MR1
> JobTracker that persists state to ZooKeeper such that failover to a new
> instance is fast and doesn't lose job state. I'd like to be aspirational and
> aim for an HA MR1 JobTracker to complement the HA NameNode. Even if no such
> existing code is available, we might adapt existing classes in the MR1
> JobTracker to models/proxies of state in ZooKeeper. For clusters of our size
> (in the 100s of nodes range) this could be workable. Also, the MR client
> could possibly use ZK for failover like the HDFS client.
>
> I'm trying to find out first the availability of such code if anyone knows.
> Otherwise, we may try building this, and so also I'd like to get a sense of
> any interest in usage or dev collaboration.
>
> Best regards,
>
>     - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>



-- 
Harsh J
