Re: NameNode as a single point of failure

Konstantin Shvachko Mon, 06 Jul 2015 18:32:34 -0700

Hey Dmitry,

You understood correctly that QJM with automatic failover is the current
state of the art for HDFS.
With it we still have a single active NameNode on the cluster at any given
time, which does not solve the performance bottleneck problem.
I think active-active HA would have been an improvement for HDFS, even
though the idea did not win the popularity vote in the community.


If you are looking for a commercial solution, I can talk to you about
WANdisco's proprietary system off this list.
If you are looking for a development opportunity I can suggest looking at
our Giraffa project, which is designed to have both data and metadata
distributed and replicated:
https://github.com/GiraffaFS/giraffa

Thanks,
--Konstantin


On Thu, Jul 2, 2015 at 8:25 AM, Dmitry Salychev <[email protected]>
wrote:

> Hi, Esteban.
>
> Thanks for your reply. So the QJM automatic failover option is the
> cutting-edge approach. Am I right?
>
> I think that it's a good idea to have truly equal NNs doing their work in
> parallel, as Konstantin Shvachko mentioned.
>
> On 07/02/2015 04:49 PM, Esteban Gutierrez wrote:
>
>> Hi Dmitry,
>>
>> Have you looked into the QJM automatic failover mode using the
>> ZKFailoverController?
>>
>> https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Automatic_Failover
>> This is the most commonly used HA mode in production environments. Also,
>> there is some recent work that will be in Hadoop 3 that will allow
>> more than one standby NN: https://issues.apache.org/jira/browse/HDFS-6440
>>
>> cheers,
>> esteban.
>>
>>
>> --
>> Cloudera, Inc.
>>
>>
>> On Thu, Jul 2, 2015 at 7:42 AM, Dmitry Salychev <[email protected]>
>> wrote:
>>
>>> Sure, I did. It's actually not what I'm looking for. I don't want to
>>> spend time bringing a dead NN back to life by hand. There should be a
>>> solution for the NN-SPOF problem.
>>>
>>>
>>> On 07/02/2015 04:36 PM, Vinayakumar B wrote:
>>>
>>>> Hi,
>>>> Did you look at HDFS NameNode high availability?
>>>>
>>>> -Vinay
>>>> On Jul 2, 2015 11:50 AM, "Dmitry Salychev" <[email protected]>
>>>> wrote:
>>>>
>>>>> Hello, HDFS Developers.
>>>>>
>>>>> I know that the NN is a single point of failure for an entire HDFS
>>>>> cluster. If it fails, the cluster becomes unavailable no matter how
>>>>> many DNs there are. I know that there is an initiative
>>>>> <http://www.wandisco.com/system/files/documentation/Meetup-ConsensusReplication.pdf>
>>>>> which introduces ConsensusNode (as far as I can see, it looks like a
>>>>> distributed NN) and related issues (HDFS-6469
>>>>> <https://issues.apache.org/jira/browse/HDFS-6469>, HADOOP-10641
>>>>> <https://issues.apache.org/jira/browse/HADOOP-10641> and HDFS-7007
>>>>> <https://issues.apache.org/jira/browse/HDFS-7007>). So, I'd like to
>>>>> ask:
>>>>>
>>>>> Has this NN-SPOF problem been solved? If it hasn't, can you show me an
>>>>> entry point where I can help to solve it?
>>>>>
>>>>> Thanks for your time.
>
