[ 
https://issues.apache.org/jira/browse/HADOOP-15774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623860#comment-16623860
 ] 

Eric Yang commented on HADOOP-15774:
------------------------------------

RegistryDNS is a tool that provides DNS resolution.  If namenode hostnames 
follow a certain convention, it might be possible to publish a multi-A record 
that lists all namenodes.  This assumes that each namenode learns the location 
of the ZooKeeper servers from a configuration file in order to publish itself.  
If the ZooKeeper quorum can only be discovered through configuration, then 
there is still no real service discovery.
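
To illustrate the multi-A record idea from the client side, here is a minimal 
sketch.  The class name and the record name "namenodes.example.com" are 
hypothetical and not part of Hadoop or RegistryDNS; the only real API used is 
java.net.InetAddress.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop or RegistryDNS.
class NamenodeLookup {

    /** Returns every address the resolver reports for the given name. */
    static List<String> resolveAll(String hostname) throws UnknownHostException {
        List<String> addresses = new ArrayList<>();
        // getAllByName returns one InetAddress per address record found,
        // so a multi-A record yields several entries.
        for (InetAddress address : InetAddress.getAllByName(hostname)) {
            addresses.add(address.getHostAddress());
        }
        return addresses;
    }

    public static void main(String[] args) throws UnknownHostException {
        // In a real deployment this would be something like the assumed
        // name "namenodes.example.com"; localhost keeps the sketch runnable.
        String name = args.length > 0 ? args[0] : "localhost";
        for (String address : resolveAll(name)) {
            System.out.println(address);
        }
    }
}
```

A client could then try each returned address in turn, but it still would not 
know which namenode is active.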

It would be nice to move RegistryDNS from YARN to Hadoop Common, but 
dynamically spawned ZooKeeper servers are currently broken.  ZooKeeper needs 
to be fixed in the following areas to meet the service discovery requirements 
of the Hadoop core system:

# ZooKeeper client session affinity.  If a ZooKeeper server is spawned 
somewhere else, existing clients still try to connect to the old IP.  
ZOOKEEPER-2929
# ZooKeeper Kerberos security.  When Kerberos authentication fails, ZooKeeper 
continues to allow the connection.  ZOOKEEPER-1634
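
As a rough illustration of the first item, a client can keep the hostname 
rather than a resolved IP and look the name up again on each reconnect.  The 
class below is a hypothetical sketch, not the ZooKeeper client code and not 
the fix proposed in ZOOKEEPER-2929.

```java
import java.net.InetSocketAddress;

// Hypothetical helper: remembers host and port, never a resolved IP.
class ReResolvingEndpoint {
    private final String host;
    private final int port;

    ReResolvingEndpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /**
     * Builds a fresh socket address for each connection attempt.
     * This InetSocketAddress constructor attempts a new name lookup
     * every time it is called, so a moved server can be picked up.
     */
    InetSocketAddress current() {
        return new InetSocketAddress(host, port);
    }
}
```

The JVM caches successful lookups according to the networkaddress.cache.ttl 
security property, so that setting also needs to be short enough for a moved 
server to become visible.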

YARN's use of ZooKeeper and RegistryDNS is less prone to these ZooKeeper 
defects because it assumes a static ZooKeeper deployment, and Curator handles 
some odd conditions for ZooKeeper connections.  Adopting Curator might trigger 
a wave of realignment of HDFS's dependency on ZooKeeper, only to get subpar 
results.

An alternative approach is to use multicast DNS for daemon processes that 
support service discovery, similar to how Apple products find devices in the 
surrounding environment.  There is an implementation of service discovery for 
ZooKeeper in 
[this 
project|https://github.com/macroadster/HMS/blob/master/beacon/src/main/java/org/apache/hms/beacon/Beacon.java].  
It was canned 6 years ago when [~aw] and others gave feedback that multicast 
DNS was too expensive and had possible vulnerabilities.  Many of those issues 
have been addressed by multicast DNS implementations in the last 6 years.  I am 
not sure whether it is worthwhile to revisit multicast DNS for service 
discovery.
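
For readers unfamiliar with multicast DNS, the sketch below builds the kind of 
question a client would multicast to 224.0.0.251 port 5353 to ask which hosts 
offer a service.  It only encodes the packet in standard DNS wire format; 
sending it and parsing responses are omitted.  The class name and the service 
type "_zookeeper._tcp.local" are assumptions for illustration and are not taken 
from the Beacon code linked above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that encodes a one-question DNS query.
class MdnsQuery {
    static final int TYPE_PTR = 12; // PTR record type
    static final int CLASS_IN = 1;  // Internet class

    static byte[] buildPtrQuery(String serviceName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeShort(out, 0); // ID, zero for multicast queries
        writeShort(out, 0); // flags: standard query
        writeShort(out, 1); // QDCOUNT: one question
        writeShort(out, 0); // ANCOUNT
        writeShort(out, 0); // NSCOUNT
        writeShort(out, 0); // ARCOUNT
        // QNAME: each label is prefixed by its length, then a zero byte ends the name.
        for (String label : serviceName.split("\\.")) {
            byte[] bytes = label.getBytes(StandardCharsets.UTF_8);
            if (bytes.length == 0 || bytes.length > 63) {
                throw new IllegalArgumentException("bad label: " + label);
            }
            out.write(bytes.length);
            out.write(bytes, 0, bytes.length);
        }
        out.write(0);
        writeShort(out, TYPE_PTR);
        writeShort(out, CLASS_IN);
        return out.toByteArray();
    }

    // Writes a 16-bit value in network (big-endian) byte order.
    private static void writeShort(ByteArrayOutputStream out, int value) {
        out.write((value >> 8) & 0xFF);
        out.write(value & 0xFF);
    }
}
```

Every listener on the link receives such a query, which is part of why the 
cost and exposure of multicast DNS were raised as concerns.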

DNS or multicast DNS is most likely the best approach for discovering HA 
servers.  However, this information is provided so that we are aware of the 
potential pitfalls in the current implementations.



> Discovery of HA servers
> -----------------------
>
>                 Key: HADOOP-15774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15774
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Íñigo Goiri
>            Priority: Major
>         Attachments: Discovery Service.pdf
>
>
> Currently, Hadoop relies on configuration files to specify the servers.
> This requires maintaining these configuration files and propagating the 
> changes.
> Hadoop should have a framework to provide discovery.
> For example, in HDFS, we could define the Namenodes in a shared location and 
> the DNs would use the framework to find the Namenodes.



