Hey Dmitry,

You understood correctly that QJM with automatic failover is the current state of the art for HDFS. Even with it, only one NameNode is active on the cluster at any given time, so it does not solve the performance bottleneck problem. I think active-active HA would have been an improvement for HDFS, even though the idea did not win the popularity vote in the community.
If you are looking for a commercial solution, I can talk to you off-list about WANdisco's proprietary system. If you are looking for a development opportunity, I suggest looking at our Giraffa project, which is designed to have both data and metadata distributed and replicated: https://github.com/GiraffaFS/giraffa

Thanks,
--Konstantin

On Thu, Jul 2, 2015 at 8:25 AM, Dmitry Salychev <darkness....@gmail.com> wrote:

> Hi, Esteban.
>
> Thanks for your reply. So the QJM automatic failover option is the
> cutting-edge solution. Am I right?
>
> I think it's a good idea to have truly equal NNs doing their work in
> parallel, as Konstantin Shvachko mentioned.
>
> On 07/02/2015 04:49 PM, Esteban Gutierrez wrote:
>
>> Hi Dmitry,
>>
>> Have you looked into the QJM automatic failover mode using the
>> ZKFailoverController?
>>
>> https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Automatic_Failover
>>
>> This is the most commonly used HA mode in production environments. Also,
>> there is some recent work that will be in Hadoop 3 that will allow more
>> than one standby NN: https://issues.apache.org/jira/browse/HDFS-6440
>>
>> cheers,
>> esteban.
>>
>> --
>> Cloudera, Inc.
>>
>> On Thu, Jul 2, 2015 at 7:42 AM, Dmitry Salychev <darkness....@gmail.com>
>> wrote:
>>
>>> Sure, I did. It's actually not what I'm looking for. I don't want to
>>> spend time bringing a dead NN back to life by hand. There should be a
>>> solution for the NN SPOF problem.
>>>
>>> On 07/02/2015 04:36 PM, Vinayakumar B wrote:
>>>
>>>> Hi,
>>>> Did you look at HDFS NameNode High Availability?
>>>>
>>>> -Vinay
>>>>
>>>> On Jul 2, 2015 11:50 AM, "Dmitry Salychev" <darkness....@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello, HDFS Developers.
>>>>>
>>>>> I know that the NN is a single point of failure for an entire HDFS
>>>>> cluster. If it fails, the cluster will be unavailable no matter how
>>>>> many DNs there are. I know that there is an initiative
>>>>> <http://www.wandisco.com/system/files/documentation/Meetup-ConsensusReplication.pdf>
>>>>> which introduces ConsensusNode (as far as I can see, it looks like a
>>>>> distributed NN) and related issues (HDFS-6469
>>>>> <https://issues.apache.org/jira/browse/HDFS-6469>, HADOOP-10641
>>>>> <https://issues.apache.org/jira/browse/HADOOP-10641> and HDFS-7007
>>>>> <https://issues.apache.org/jira/browse/HDFS-7007>). So, I'd like to ask:
>>>>>
>>>>> Has this NN SPOF problem been solved? If it hasn't, can you show me an
>>>>> entry point where I can help solve it?
>>>>>
>>>>> Thanks for your time.