Hi Jeff,

I don't know the HP offerings very well myself, but I know some of our customers are successfully using lower-end NetApp devices.
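Whichever filer you end up with, the NameNode side of it stays the same: both NNs mount the exported directory at the same path and point dfs.namenode.shared.edits.dir at it. As a rough sketch (the mount point below is just a made-up example, not a recommendation):

    <!-- hdfs-site.xml: shared edits on an NFS mount (example path only) -->
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>file:///mnt/filer/namenode-shared</value>
    </property>

The HA question is then entirely about how available that export is, which is why the docs punt to "high-quality dedicated NAS appliance."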
You should also be aware that work on the NAS-less shared storage is well under way: HDFS-3077. So if your timeline to production is more than a few months out, you may want to consider waiting for it before building out your HA setup.

-Todd

On Tue, Jul 24, 2012 at 12:05 PM, Jeff Whiting <je...@qualtrics.com> wrote:
> Todd or anyone who knows,
>
> I'm reviving an old thread because we are colocating into a data center rather than just using the cloud. You mentioned "We currently require the NFS directory to be highly available itself. This is achievable with even pretty inexpensive NAS devices from your vendor of choice." What hardware would you suggest that would give us an HA filer? Specifically, we are going all HP in the colo.
>
> I've looked around and was unable to find any suggestions. The docs just say "high-quality dedicated NAS appliance." Any suggestions would be great!
>
> https://ccp.cloudera.com/display/CDH4DOC/HDFS+High+Availability+Hardware+Configuration
> http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/
> http://www.slideshare.net/hortonworks/nn-ha-hadoop-worldfinal-10173419
>
> Thanks,
> ~Jeff
>
> On 5/8/2012 6:49 PM, Todd Lipcon wrote:
>> Hi Jeff,
>>
>> Check out HDFS-3077. We'll probably need the most help when it comes time to do testing. Any testing you can do on the current HA solution, non-ideal as it may be, is also immensely valuable. For example, if you can reproduce the case where it didn't exit upon loss of shared edits, that would also be a bug which would hit the quorum-based solution.
>>
>> Thanks
>> -Todd
>>
>> On Tue, May 8, 2012 at 4:20 PM, Jeff Whiting <je...@qualtrics.com> wrote:
>>> Thanks for being patient and listening to my rants. I'm excited to see HDFS continue to move forward. If the organization I'm working for was willing to spend some resources to help speed this process up, where should we start looking? I'm sure there are quite a few JIRAs on these issues.
>>>
>>> Most of what we've done with the Hadoop ecosystem has been ZooKeeper and HBase related.
>>>
>>> Thanks,
>>> ~Jeff
>>>
>>> On 5/8/2012 2:46 PM, Todd Lipcon wrote:
>>>> On Tue, May 8, 2012 at 12:38 PM, Jeff Whiting <je...@qualtrics.com> wrote:
>>>>> It seems the NN was originally written with the assumption that disks fail and stuff happens, hence the ability to have multiple directories store your NN data, even though each directory is most likely redundant / HA.
>>>>>
>>>>> [start rant]
>>>>>
>>>>> My opinion is that it is a step backwards that the shared edits wasn't written with the same assumptions. If any one problem can take out your cluster, then it isn't HA. So allowing a single NFS failure to take down your cluster, and saying "make NFS HA," just seems to move the HA problem rather than solve it. I would expect a true HA solution to be completely self-contained within the Hadoop ecosystem. All machines fail eventually, and that needs to be planned for. At a minimum, a failure of the shared edits should only disable failover and provide a recovery mechanism; ideally, the NN should have been rewritten to be a cluster (similar to ZooKeeper or Ceph) to enable HA.
>>>>>
>>>>> [end rant]
>>>>
>>>> Like I said earlier in the thread, work is already under way on this and should be complete within a number of months.
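>>>>
>>>> To give a rough idea of where that work (HDFS-3077) is headed: the shared edits dir would point at a small quorum of journal daemons instead of an NFS path, something like the following, though the hostnames here are placeholders and the exact syntax may well change before it ships:
>>>>
>>>>   <property>
>>>>     <name>dfs.namenode.shared.edits.dir</name>
>>>>     <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
>>>>   </property>
>>>>
>>>> Losing any single journal daemon then stops being fatal, since edits only need to reach a majority of them.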
>>>>
>>>> In many practical deployments, what we have already can provide complete HA. In others, like the AWS example you mentioned, we need a bit more, and we're working on it. Hang on a bit longer and it will be good to go.
>>>>
>>>> -Todd
>>>>
>>>>> Sorry for the rant. I just really want to see HDFS become a complete HA system without caveats.
>>>>>
>>>>> ~Jeff
>>>>>
>>>>> On 5/8/2012 11:44 AM, Todd Lipcon wrote:
>>>>>> On Tue, May 8, 2012 at 10:33 AM, Nathaniel Cook <nathani...@qualtrics.com> wrote:
>>>>>>> We ran the initializeSharedEdits command and it didn't have any effect, but that may be because of the weird state we got it in.
>>>>>>>
>>>>>>> So help me understand: I was under the assumption that if shared edits went away, you would lose the ability to fail over and that is it. The active namenode would still function but would not fail over, and all standby namenodes would not try to become active. Is this correct?
>>>>>>
>>>>>> Unfortunately that's not the case. If you lose shared edits, your cluster should shut down. We currently require the NFS directory to be highly available itself. This is achievable with even pretty inexpensive NAS devices from your vendor of choice.
>>>>>>
>>>>>> The reason for this behavior is as follows: if the active node loses access to the mount, it's unable to distinguish whether the mount itself died or if the node just had a local issue which broke the mount. Imagine, for example, that the NFS client had a bug which caused the mount to go away. Then you'd continue running for quite some time without writing to shared edits. If your NN then crashed, a failover would cause you to revert to an old version of the namespace, and you'd have a case of permanent data loss due to divergence of the image before and after failover.
>>>>>>
>>>>>> There's work under way to remove this restriction which should be available for general use some time this summer or early fall, if I had to take a guess on timeline.
>>>>>>
>>>>>>> If it is the case that namenodes quit when they lose connection to the shared edits dir, then doesn't the shared edits dir become the new single point of failure?
>>>>>>
>>>>>> Yes, but it's an easy one to resolve. Most of our customers already have a NAS device in their datacenter, which has dual heads, dual PDUs, etc., and at least five nines of uptime. This HA setup is basically the same as you see in most enterprise HA systems which rely on shared storage.
>>>>>>
>>>>>>> Unfortunately we have cleared the logs from this test, but we could try to reproduce it.
>>>>>>
>>>>>> That would be great, thanks!
>>>>>>
>>>>>> -Todd
>>>>>>
>>>>>>> On Tue, May 8, 2012 at 10:28 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>>>>>>> On Tue, May 8, 2012 at 7:46 AM, Nathaniel Cook <nathani...@qualtrics.com> wrote:
>>>>>>>>> We have been working with an HA HDFS cluster, testing several failover scenarios. We have a small cluster of 4 machines spun up for testing. We run a namenode on two of the machines and host an NFS share on the third for the shared edits directory. The fourth machine is just a datanode. We configured the cluster for automatic failover using ZKFC.
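>>>>>>>>>
>>>>>>>>> For reference, the automatic-failover part of our setup is roughly the stock configuration (the hostnames below are placeholders, not our real ones):
>>>>>>>>>
>>>>>>>>>   <!-- hdfs-site.xml -->
>>>>>>>>>   <property>
>>>>>>>>>     <name>dfs.ha.automatic-failover.enabled</name>
>>>>>>>>>     <value>true</value>
>>>>>>>>>   </property>
>>>>>>>>>
>>>>>>>>>   <!-- core-site.xml -->
>>>>>>>>>   <property>
>>>>>>>>>     <name>ha.zookeeper.quorum</name>
>>>>>>>>>     <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
>>>>>>>>>   </property>
>>>>>>>>>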
>>>>>>>>> We can start and stop the namenodes with no problems; failover happens as expected. Then we tested breaking the shared edits directory. We stopped the NFS share and then re-enabled it. This caused the loss of a few edits.
>>>>>>>>
>>>>>>>> Really? What mount options are you using on your NFS mount?
>>>>>>>>
>>>>>>>> The active NN should abort immediately if the shared edits dir disappears. Do you have logs available from your NNs during this time?
>>>>>>>>
>>>>>>>>> This had no effect, as expected, on the namenodes, and the cluster functioned normally.
>>>>>>>>
>>>>>>>> On the contrary, I'd expect the NN to bail out on the next edit (since it has no place to reliably fsync it).
>>>>>>>>
>>>>>>>>> We stopped the standby namenode and tried to start it again; it would not start because of the missing edits. No matter what we tried, we could not rebuild the shared edits directory and thus get the second namenode back online. In this state the HDFS cluster continued to function, but it was no longer an HA cluster. To get the cluster back into HA mode we had to reformat the namenode data along with the shared edits. In this case, how do you rebuild the shared edits data so you can get the cluster back to an HA mode?
>>>>>>>>
>>>>>>>> It sounds like something went wrong with the facility that's supposed to make the active NN crash if shared edits go away. The logs will help.
>>>>>>>>
>>>>>>>> To answer your question, though, you can run the "initializeSharedEdits" process again to re-initialize that edits dir.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> -Todd
>>>>>>>>
>>>>>>>> --
>>>>>>>> Todd Lipcon
>>>>>>>> Software Engineer, Cloudera
>>>>>>>
>>>>>>> --
>>>>>>> -Nathaniel Cook
>>>>>
>>>>> --
>>>>> Jeff Whiting
>>>>> Qualtrics Senior Software Engineer
>>>>> je...@qualtrics.com
>>>
>>> --
>>> Jeff Whiting
>>> Qualtrics Senior Software Engineer
>>> je...@qualtrics.com
>
> --
> Jeff Whiting
> Qualtrics Senior Software Engineer
> je...@qualtrics.com

--
Todd Lipcon
Software Engineer, Cloudera
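
P.S. Re the earlier question about rebuilding a lost shared edits dir: the recovery step is roughly a one-liner, run on a NameNode host (I'd stop both NNs first, and check the docs for your exact version before trusting this):

  # repopulate an empty shared edits directory from the local NN metadata dirs
  hdfs namenode -initializeSharedEdits

After that, bring the active NN back up and then restart the standby.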