On Tue, May 8, 2012 at 12:38 PM, Jeff Whiting <je...@qualtrics.com> wrote:
> It seems the NN was originally written with the assumption that disks fail
> and stuff happens.  Hence the ability to have multiple directories store
> your NN data, even though each directory is most likely redundant / HA.
>
> [start rant]
>
> My opinion is that it is a step backwards that the shared edits dir wasn't
> written with the same assumptions.  If any one failure can take out your
> cluster then it isn't HA.  Allowing a single NFS failure to take down
> your cluster, and then saying "make NFS HA", just seems to move the HA
> problem rather than solve it.  I would expect a true HA solution to be
> completely self-contained within the Hadoop ecosystem.  All machines fail
> eventually, and that needs to be planned for.  At a minimum, a failure of
> the shared edits should only disable failover and provide a recovery
> mechanism; ideally the NN would have been rewritten as a cluster (similar
> to ZooKeeper or Ceph) to enable HA.
>
> [end rant]

Like I said earlier in the thread, work is already under way on this
and should be complete within a number of months.

In many practical deployments, what we have already can provide
complete HA. In others, like the AWS example you mentioned, we need a
bit more, and we're working on it. Hang on a bit longer and it will be
good to go.

-Todd

>
> Sorry for the rant.  I just really want to see HDFS become a complete HA
> system without caveats.
>
> ~Jeff
>
>
> On 5/8/2012 11:44 AM, Todd Lipcon wrote:
>>
>> On Tue, May 8, 2012 at 10:33 AM, Nathaniel Cook
>> <nathani...@qualtrics.com>  wrote:
>>>
>>> We ran the initializeSharedEdits command and it didn't have any
>>> effect, but that may be because of the weird state we got it into.
>>>
>>> So help me understand: I was under the assumption that if shared edits
>>> went away, you would lose the ability to fail over and that is it. The
>>> active namenode would still function but could not fail over, and all
>>> standby namenodes would not try to become active. Is this correct?
>>
>> Unfortunately that's not the case. If you lose shared edits, your
>> cluster should shut down. We currently require the NFS directory to be
>> highly available itself. This is achievable even with pretty
>> inexpensive NAS devices from your vendor of choice.
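>>
>> For reference, the shared directory is just wired in through the usual
>> dfs.namenode.shared.edits.dir property pointed at the NFS mount,
>> something like this (the mount path here is only illustrative; yours
>> will differ):
>>
>>   <property>
>>     <name>dfs.namenode.shared.edits.dir</name>
>>     <value>file:///mnt/filer/hadoop/ha-shared-edits</value>
>>   </property>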
>>
>> The reason for this behavior is as follows: if the active node loses
>> access to the mount, it's unable to distinguish whether the mount
>> itself died or if the node just had a local issue which broke the
>> mount. Imagine for example that the NFS client had a bug which caused
>> the mount to go away. Then, you'd continue running for quite some time
>> without writing to shared edits. If your NN then crashed, a failover
>> would cause you to revert to an old version of the namespace, and
>> you'd have a case of permanent data loss due to divergence of the
>> image before and after failover.
>>
>> There's work under way to remove this restriction; it should be
>> available for general use some time this summer or early fall, if I
>> had to take a guess at the timeline.
>>
>>> If
>>> it is the case that namenodes quit when they lose connection to the
>>> shared edits dir, then doesn't the shared edits dir become the new
>>> single point of failure?
>>
>> Yes, but it's an easy one to resolve. Most of our customers already
>> have a NAS device in their datacenter, which has dual heads, dual
>> PDUs, etc, and at least 5 9s of uptime. This HA setup is basically the
>> same as you see in most enterprise HA systems which rely on shared
>> storage.
>>
>>> Unfortunately we have cleared the logs from this test but we could try
>>> to reproduce it.
>>
>> That would be great, thanks!
>>
>> -Todd
>>
>>> On Tue, May 8, 2012 at 10:28 AM, Todd Lipcon<t...@cloudera.com>  wrote:
>>>>
>>>> On Tue, May 8, 2012 at 7:46 AM, Nathaniel Cook<nathani...@qualtrics.com>
>>>>  wrote:
>>>>>
>>>>> We have been working with an HA HDFS cluster, testing several failover
>>>>> scenarios.  We have a small cluster of 4 machines spun up for testing.
>>>>> We run a namenode on two of the machines and host an NFS share on
>>>>> the third for the shared edits directory. The fourth machine is just a
>>>>> datanode. We configured the cluster for automatic failover using ZKFC.
>>>>> We can start and stop the namenodes with no problems; failover happens
>>>>> as expected. Then we tested breaking the shared edits directory: we
>>>>> stopped the NFS share and then re-enabled it. This caused the loss of a
>>>>> few edits.
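>>>>>
>>>>> (For context, our automatic failover setup is basically the stock
>>>>> config; hostnames below are placeholders, not our real ones.
>>>>> dfs.ha.automatic-failover.enabled lives in hdfs-site.xml and
>>>>> ha.zookeeper.quorum in core-site.xml.)
>>>>>
>>>>>   <property>
>>>>>     <name>dfs.ha.automatic-failover.enabled</name>
>>>>>     <value>true</value>
>>>>>   </property>
>>>>>   <property>
>>>>>     <name>ha.zookeeper.quorum</name>
>>>>>     <value>zk1:2181,zk2:2181,zk3:2181</value>
>>>>>   </property>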
>>>>
>>>> Really? What mount options are you using on your NFS mount?
>>>>
>>>> The active NN should abort immediately if the shared edits dir
>>>> disappears. Do you have logs available from your NNs during this time?
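>>>>
>>>> The mount options matter here because a hard mount will just hang the
>>>> writer instead of returning an error, whereas a soft mount with short
>>>> timeouts surfaces the failure to the NN quickly so it can abort. As a
>>>> rough illustration only (export path and values are made up, not a
>>>> recommendation), an fstab entry might look like:
>>>>
>>>>   filer:/export/hadoop  /mnt/filer  nfs  tcp,soft,intr,timeo=10,retrans=10  0 0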
>>>>
>>>>> This had no effect, as expected, on the namenodes, and the
>>>>> cluster functioned normally.
>>>>
>>>> On the contrary, I'd expect the NN to bail out on the next edit (since
>>>> it has no place to reliably fsync it).
>>>>
>>>>> We stopped the standby namenode and tried
>>>>> to start it again; it would not start because of the missing edits. No
>>>>> matter what we tried, we could not rebuild the shared edits directory
>>>>> and thus get the second namenode back online. In this state the HDFS
>>>>> cluster continued to function, but it was no longer an HA cluster. To
>>>>> get the cluster back into HA mode we had to reformat the namenode data
>>>>> along with the shared edits. In this case, how do you rebuild the
>>>>> shared edits data so you can get the cluster back to HA mode?
>>>>
>>>> It sounds like something went wrong with the facility that's supposed
>>>> to make the active NN crash if shared edits go away. The logs will
>>>> help.
>>>>
>>>> To answer your question, though, you can run the
>>>> "initializeSharedEdits" process again to re-initialize that edits dir.
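>>>>
>>>> Roughly, the recovery looks something like this (a sketch, not exact
>>>> steps; adjust for your install):
>>>>
>>>>   # with the NNs stopped, rebuild the shared dir from a good local copy:
>>>>   hdfs namenode -initializeSharedEdits
>>>>
>>>>   # then bring the active back up; if the standby's local name dirs are
>>>>   # also stale, re-sync them from the active before starting it:
>>>>   hdfs namenode -bootstrapStandby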
>>>>
>>>> Thanks
>>>> -Todd
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>
>>>
>>>
>>> --
>>> -Nathaniel Cook
>>
>>
>>
>
> --
> Jeff Whiting
> Qualtrics Senior Software Engineer
> je...@qualtrics.com
>



-- 
Todd Lipcon
Software Engineer, Cloudera
