Thanks Paul. Sounds like that's the way to go then. We're just starting to experiment a bit with DRBD so we'll give that a shot and see how it works out.
On Tue, Jul 29, 2008 at 11:56 AM, paul <[EMAIL PROTECTED]> wrote:

> I'm currently running with your option B setup and it seems to be
> reliable for me (so far). I use a combination of drbd and various
> heartbeat/LinuxHA scripts that handle the failover process, including a
> virtual IP for the namenode. I haven't had any real-world unexpected
> failures to deal with, yet, but all manual testing has had consistent
> and reliable results.
>
> -paul
>
> On Tue, Jul 29, 2008 at 1:54 PM, Ryan Shih <[EMAIL PROTECTED]> wrote:
>
> > Dear Hadoop Community --
> >
> > I am wondering if it is already possible or in the plans to add
> > capability for multiple master nodes. I'm in a situation where I have
> > a master node that may potentially be in a less than ideal execution
> > and networking environment. For this reason, it's possible that the
> > master node could die at any time. On the other hand, the application
> > must always be available. I have accessible to me other machines but
> > I'm still unclear on the best method to add reliability.
> >
> > Here are a few options that I'm exploring:
> > a) To create a completely secondary Hadoop cluster that we can flip to
> > when we detect that the master node has died. This will double
> > hardware costs, so if we originally have a 5 node cluster, then we
> > would need to pull 5 more machines out of somewhere for this decision.
> > This is not the preferable choice.
> > b) Just mirror the master node via other always available software,
> > such as DRBD for real time synchronization. Upon detection we could
> > swap to the alternate node.
> > c) Or if Hadoop had some functionality already in place, it would be
> > fantastic to be able to take advantage of that. I don't know if
> > anything like this is available but I could not find anything as of
> > yet. It seems to me, however, that having multiple master nodes would
> > be the direction Hadoop needs to go if it is to be useful in high
> > availability applications. I was told there are some papers on
> > Amazon's Elastic Computing that I'm about to look for that follow this
> > approach.
> >
> > In any case, could someone with experience in solving this type of
> > problem share how they approached this issue?
> >
> > Thanks!
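For anyone following along, the "upon detection" step in option b) could look roughly like the sketch below. This is only an illustration and not what paul's heartbeat/LinuxHA scripts actually do: the class name, the probe function, and the three-miss threshold are all assumptions made up for the example. It only decides when the standby should take over. Promoting the DRBD mirror and moving the virtual IP would still be left to the HA tooling.

```python
import socket
from typing import Callable


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverMonitor:
    """Hypothetical helper: signal failover after N consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool], max_misses: int = 3):
        self.probe = probe            # callable returning True while the master is healthy
        self.max_misses = max_misses  # assumed threshold, tune for your network
        self.misses = 0

    def tick(self) -> bool:
        """Run one probe; return True when the standby should take over."""
        if self.probe():
            self.misses = 0           # any success resets the counter
            return False
        self.misses += 1
        return self.misses >= self.max_misses
```

In practice the probe might be something like `lambda: port_is_open(namenode_vip, namenode_port)`, where both values are placeholders for your own setup, called on a timer. Requiring several consecutive misses is meant to avoid flipping to the standby on a single dropped packet.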
