It seems the NN was originally written with the assumption that disks fail and stuff happens. Hence the ability to have multiple directories store your NN data even though each directory is mostly likely redundant / HA.

[start rant]

My opinion is that it is a step backwards that the shared edits wasn't written with the same assumptions. If any one problem can take out your cluster then it isn't HA. So allowing a single nfs failure taking down your cluster and saying make nfs HA, just seems to move the HA problem not solve it. I would expect a true HA solution to be completely self contained within the hadoop ecosystem. All machines fail...eventually and it needs to be planned for. At a minimum a failure of the shared edits should only disable fail over and provide a recovery mechanism; Ideally the NN should have been rewritten to be a cluster (similar to zookeeper or ceph) to enable HA.

[end rant]

Sorry for the rant.  I just really want to see HDFS become complete HA system 
without caveats.

~Jeff

On 5/8/2012 11:44 AM, Todd Lipcon wrote:
On Tue, May 8, 2012 at 10:33 AM, Nathaniel Cook
<nathani...@qualtrics.com>  wrote:
We ran the initializeSharedEdits command and it didn't have any
effect, but that my be because of the weird state we got it in.

So help me understand: I was under the assumption that if shared edits
went away you would lose the ability to failover and that is it. The
active namenode would still function but would not failover and all
standy namenodes would not try to become active. Is this correct?
Unfortunately that's not the case. If you lose shared edits, your
cluster should shut down. We currently require the NFS direcory to be
highly available itself. This is achievable with even pretty
inexpensive NAS devices from your vendor of choice.

The reason for this behavior is as follows: if the active node loses
access to the mount, it's unable to distinguish whether the mount
itself died or if the node just had a local issue which broke the
mount. Imagine for example that the NFS client had a bug which caused
the mount to go away. Then, you'd continue running for quite some time
without writing to shared edits. If your NN then crashed, a failover
would cause you to revert to an old version of the namespace, and
you'd have a case of permanent data loss due to divergence of the
image before and after failover.

There's work under way to remove this restriction which should be
available for general use some time this summer or early fall, if I
had to take a guess on timeline.

If
it is the case that namenodes quit when they lose connection to the
shared edits dir than doesn't the shared edits becomes the new single
point of failure?
Yes, but it's an easy one to resolve. Most of our customers already
have a NAS device in their datacenter, which has dual heads, dual
PDUs, etc, and at least 5 9s of uptime. This HA setup is basically the
same as you see in most enterprise HA systems which rely on shared
storage.

Unfortunately we have cleared the logs from this test but we could try
to reproduce it.
That would be great, thanks!

-Todd

On Tue, May 8, 2012 at 10:28 AM, Todd Lipcon<t...@cloudera.com>  wrote:
On Tue, May 8, 2012 at 7:46 AM, Nathaniel Cook<nathani...@qualtrics.com>  wrote:
We have be working with an HA hdfs cluster, testing several failover
scenarios.  We have a small cluster of 4 machines spun up for testing.
We run a namenode on two of the machines and hosted an nfs share on
the third for the shared edits directory. The fourth machine is just a
datanode. We configured the cluster for automatic failover using ZKFC.
We can start and stop the namenodes with no problems, failover happens
as expected. Then we tested breaking the shared edits directory. We
stopped the nfs share and then reenabled it. This caused the loss of a
few edits.
Really? What mount options are you using on your NFS mount?

The active NN should abort immediately if the shared edits dir
disappears. Do you have logs available from your NNs during this time?

This had no effect, as expected, on the namenodes, and the
cluster functioned normally.
On the contrary, I'd expect the NN to bail out on the next edit (since
it has no place to reliably fsync it)

We stopped the standby namenode and tried
to start it again, it would not start because of the missing edits. No
matter what we tried we could not rebuild the shared edits directory
and thus get the second namenode back online. In this state the hdfs
cluster continued to function but it was no longer an HA cluster. To
get the cluster back in HA mode we had to reformat the namenode data
with the shared edits. In this case how do you rebuild the shared
edits data so you can get the cluster back to an HA mode?
It sounds like something went wrong with the facility that's supposed
to make the active NN crash if shared edits go away. The logs will
help.

To answer your question, though, you can run the
"initializeSharedEdits" process again to re-initialize that edits dir.

Thanks
-Todd
--
Todd Lipcon
Software Engineer, Cloudera


--
-Nathaniel Cook



--
Jeff Whiting
Qualtrics Senior Software Engineer
je...@qualtrics.com

Reply via email to