On Tue, May 8, 2012 at 12:38 PM, Jeff Whiting <je...@qualtrics.com> wrote:
> It seems the NN was originally written with the assumption that disks fail
> and stuff happens. Hence the ability to have multiple directories store
> your NN data, even though each directory is most likely redundant / HA.
>
> [start rant]
>
> My opinion is that it is a step backwards that the shared edits mechanism
> wasn't written with the same assumptions. If any one problem can take out
> your cluster then it isn't HA. So allowing a single NFS failure to take
> down your cluster and saying "make NFS HA" just seems to move the HA
> problem, not solve it. I would expect a true HA solution to be completely
> self-contained within the Hadoop ecosystem. All machines fail eventually,
> and that needs to be planned for. At a minimum, a failure of the shared
> edits should only disable failover and provide a recovery mechanism;
> ideally the NN should have been rewritten to be a cluster (similar to
> ZooKeeper or Ceph) to enable HA.
>
> [end rant]
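(For reference: the redundancy Jeff points to is configured by listing
several storage locations for the NameNode's metadata. A minimal
hdfs-site.xml sketch — the paths here are hypothetical examples:

  <!-- The NN writes its fsimage and edit logs to every directory
       listed, so a single failed local disk does not lose metadata. -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/1/dfs/nn,/data/2/dfs/nn,/mnt/san/dfs/nn</value>
  </property>
)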
Like I said earlier in the thread, work is already under way on this and
should be complete within a few months. In many practical deployments,
what we have already can provide complete HA. In others, like the AWS
example you mentioned, we need a bit more, and we're working on it. Hang
on a bit longer and it will be good to go.

-Todd

> Sorry for the rant. I just really want to see HDFS become a complete HA
> system without caveats.
>
> ~Jeff
>
>
> On 5/8/2012 11:44 AM, Todd Lipcon wrote:
>>
>> On Tue, May 8, 2012 at 10:33 AM, Nathaniel Cook
>> <nathani...@qualtrics.com> wrote:
>>>
>>> We ran the initializeSharedEdits command and it didn't have any
>>> effect, but that may be because of the weird state we got it in.
>>>
>>> So help me understand: I was under the assumption that if shared edits
>>> went away you would lose the ability to fail over, and that is it. The
>>> active namenode would still function but would not fail over, and all
>>> standby namenodes would not try to become active. Is this correct?
>>
>> Unfortunately that's not the case. If you lose shared edits, your
>> cluster should shut down. We currently require the NFS directory to be
>> highly available itself. This is achievable with even pretty
>> inexpensive NAS devices from your vendor of choice.
>>
>> The reason for this behavior is as follows: if the active node loses
>> access to the mount, it's unable to distinguish whether the mount
>> itself died or the node just had a local issue which broke the mount.
>> Imagine, for example, that the NFS client had a bug which caused the
>> mount to go away. Then you'd continue running for quite some time
>> without writing to shared edits. If your NN then crashed, a failover
>> would cause you to revert to an old version of the namespace, and
>> you'd have a case of permanent data loss due to divergence of the
>> image before and after failover.
>>
>> There's work under way to remove this restriction, which should be
>> available for general use some time this summer or early fall, if I
>> had to take a guess on timeline.
>>
>>> If it is the case that namenodes quit when they lose connection to
>>> the shared edits dir, then doesn't the shared edits dir become the
>>> new single point of failure?
>>
>> Yes, but it's an easy one to resolve. Most of our customers already
>> have a NAS device in their datacenter, which has dual heads, dual
>> PDUs, etc., and at least five nines of uptime. This HA setup is
>> basically the same as you see in most enterprise HA systems which
>> rely on shared storage.
>>
>>> Unfortunately we have cleared the logs from this test, but we could
>>> try to reproduce it.
>>
>> That would be great, thanks!
>>
>> -Todd
>>
>>> On Tue, May 8, 2012 at 10:28 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>>>
>>>> On Tue, May 8, 2012 at 7:46 AM, Nathaniel Cook
>>>> <nathani...@qualtrics.com> wrote:
>>>>>
>>>>> We have been working with an HA HDFS cluster, testing several
>>>>> failover scenarios. We have a small cluster of 4 machines spun up
>>>>> for testing. We run a namenode on two of the machines and host an
>>>>> NFS share on the third for the shared edits directory. The fourth
>>>>> machine is just a datanode. We configured the cluster for automatic
>>>>> failover using ZKFC. We can start and stop the namenodes with no
>>>>> problems; failover happens as expected. Then we tested breaking the
>>>>> shared edits directory. We stopped the NFS share and then
>>>>> re-enabled it. This caused the loss of a few edits.
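(A minimal sketch of the kind of setup Nathaniel describes — two
NameNodes in one nameservice, an NFS-backed shared edits directory, and
ZKFC automatic failover. The nameservice ID, hostnames, and paths are
hypothetical placeholders:

  <!-- hdfs-site.xml (excerpt) -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>host1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>host2.example.com:8020</value>
  </property>
  <!-- The NFS export from the third machine, mounted at the same
       path on both NameNode hosts. -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>file:///mnt/nfs/shared-edits</value>
  </property>
  <!-- Enables the ZKFC-based automatic failover under discussion;
       ZKFC also needs ha.zookeeper.quorum set in core-site.xml. -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
)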
>>>> Really? What mount options are you using on your NFS mount?
>>>>
>>>> The active NN should abort immediately if the shared edits dir
>>>> disappears. Do you have logs available from your NNs during this
>>>> time?
>>>>
>>>>> This had no effect, as expected, on the namenodes, and the
>>>>> cluster functioned normally.
>>>>
>>>> On the contrary, I'd expect the NN to bail out on the next edit
>>>> (since it has no place to reliably fsync it).
>>>>
>>>>> We stopped the standby namenode and tried to start it again; it
>>>>> would not start because of the missing edits. No matter what we
>>>>> tried, we could not rebuild the shared edits directory and thus
>>>>> get the second namenode back online. In this state the HDFS
>>>>> cluster continued to function, but it was no longer an HA
>>>>> cluster. To get the cluster back into HA mode we had to reformat
>>>>> the namenode data along with the shared edits. In this case, how
>>>>> do you rebuild the shared edits data so you can get the cluster
>>>>> back to HA mode?
>>>>
>>>> It sounds like something went wrong with the facility that's
>>>> supposed to make the active NN crash if shared edits go away. The
>>>> logs will help.
>>>>
>>>> To answer your question, though, you can run the
>>>> "initializeSharedEdits" process again to re-initialize that edits
>>>> dir.
>>>>
>>>> Thanks
>>>> -Todd
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>
>>> --
>>> -Nathaniel Cook
>
> --
> Jeff Whiting
> Qualtrics Senior Software Engineer
> je...@qualtrics.com

--
Todd Lipcon
Software Engineer, Cloudera
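(To make the recovery step Todd suggests concrete — a hedged sketch
using the HA commands shipping in Hadoop 2.0 at the time; the exact
run order and hosts here are illustrative, not a verified procedure:

  # 1. With both NameNodes stopped, rebuild the shared edits dir from
  #    one NN's local edit logs (run on that NN's host):
  hdfs namenode -initializeSharedEdits

  # 2. Start that NameNode, then on the other host re-sync the
  #    standby's namespace image from the running NN before starting it:
  hdfs namenode -bootstrapStandby
)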