Question about fault tolerance and fail over for name nodes

Jason Venner Tue, 29 Jul 2008 09:01:55 -0700

What are people doing?

For jobs that have a long enough SLA, simply shutting down the cluster and bringing up the secondary as the master works for us. We have some jobs where that doesn't work well, because the recovery time is not acceptable.

There has been internal discussion of using DRBD to hot-fail the namenode over to a backup so that a running job can continue.

