[
https://issues.apache.org/jira/browse/HADOOP-15774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623860#comment-16623860
]
Eric Yang commented on HADOOP-15774:
------------------------------------
RegistryDNS is a tool that provides DNS resolution. If namenode hostnames
follow a certain convention, then it might be possible to publish a multi-A
record that lists all namenodes. This assumes that each namenode knows how to
locate the ZooKeeper servers through a configuration file so that it can
publish itself. If the ZooKeeper quorum can only be discovered through
configuration, then there is still no real service discovery.
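To make the multi-A record idea concrete, here is a minimal client-side sketch. It assumes a single name (the hypothetical namenodes.example.com) carries one A record per namenode. The class and method names are made up for illustration and are not part of RegistryDNS or Hadoop.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

final class MultiARecordLookup {

    /**
     * Returns every address the resolver reports for the given name.
     * An empty list means the name could not be resolved.
     */
    static List<String> resolveAll(String hostname) {
        List<String> addresses = new ArrayList<>();
        try {
            // getAllByName returns one InetAddress per A/AAAA record.
            for (InetAddress address : InetAddress.getAllByName(hostname)) {
                addresses.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Treat an unresolvable name as "no servers discovered".
        }
        return addresses;
    }

    public static void main(String[] args) {
        // Hypothetical record name; pass a real one as the first argument.
        String name = args.length > 0 ? args[0] : "namenodes.example.com";
        for (String address : resolveAll(name)) {
            System.out.println(name + " -> " + address);
        }
    }
}
```

The client would still need a convention for the record name itself, which is the bootstrap problem described above.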
It would be nice to move RegistryDNS from YARN to Hadoop Common, but
dynamically spawned ZooKeeper servers are currently broken. ZooKeeper needs to
be fixed in the following areas to meet the requirements of service discovery
for the Hadoop core system:
# ZooKeeper client session affinity. If a ZooKeeper server is spawned somewhere
else, existing clients still try to connect to the old IP. ZOOKEEPER-2929
# ZooKeeper Kerberos security fails, but ZooKeeper continues to allow
connections. ZOOKEEPER-1634
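As an illustration of the first item, the sketch below shows the general technique of re-resolving a hostname before every connection attempt instead of holding on to a previously resolved IP. This is not ZooKeeper client code. The class name is hypothetical, and the JVM's own DNS cache (networkaddress.cache.ttl) can still return a stale answer for a while.

```java
import java.net.InetSocketAddress;

final class ReResolvingEndpoint {
    private final String host;
    private final int port;

    ReResolvingEndpoint(String host, int port) {
        // Keep only the name; do not store a resolved InetSocketAddress.
        this.host = host;
        this.port = port;
    }

    /**
     * Builds a fresh socket address on every call. The constructor used here
     * attempts name resolution each time, so a server that moved to a new IP
     * is picked up on the next connection attempt.
     */
    InetSocketAddress current() {
        return new InetSocketAddress(host, port);
    }
}
```

A client that stores the result of current() once and reuses it forever would show the stale-IP behavior described in the first item.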
YARN's use of ZooKeeper and RegistryDNS is less prone to ZooKeeper defects
because the ZooKeeper deployment is assumed to be static, and Curator handles
some odd conditions for ZooKeeper connections. A Curator dependency might
generate a wave of realignment for the HDFS dependency on ZooKeeper, only to
get subpar results.
An alternative approach is to use multicast DNS for daemon processes that
support service discovery, similar to how Apple products find devices in the
surrounding environment. There is an implementation of service discovery in
[this
project|https://github.com/macroadster/HMS/blob/master/beacon/src/main/java/org/apache/hms/beacon/Beacon.java]
for ZooKeeper. This was canned 6 years ago when [~aw] and others provided
feedback that multicast DNS is too expensive and has possible vulnerabilities.
Many issues have been addressed by multicast DNS implementations in the last 6
years. Not sure if it is worthwhile to revisit multicast DNS for service
discovery purposes.
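For reference, a multicast DNS browse request is an ordinary DNS question sent to the multicast group 224.0.0.251 on UDP port 5353. The sketch below builds such a PTR query by hand and sends it once. The service name _hdfs-namenode._tcp.local and the class name are assumptions made up for this example. It is not the Beacon implementation linked above, and it does not listen for or parse responses.

```java
import java.io.ByteArrayOutputStream;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

final class MdnsQuery {
    static final String MDNS_GROUP = "224.0.0.251"; // IPv4 mDNS multicast group
    static final int MDNS_PORT = 5353;
    static final int TYPE_PTR = 12; // DNS record type PTR
    static final int CLASS_IN = 1;  // DNS class IN

    /** Encodes a one-question DNS query for PTR records of the given name. */
    static byte[] buildPtrQuery(String serviceName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeShort(out, 0); // ID, zero for multicast queries
        writeShort(out, 0); // flags: standard query
        writeShort(out, 1); // QDCOUNT: one question
        writeShort(out, 0); // ANCOUNT
        writeShort(out, 0); // NSCOUNT
        writeShort(out, 0); // ARCOUNT
        for (String label : serviceName.split("\\.")) {
            byte[] bytes = label.getBytes(StandardCharsets.UTF_8);
            if (bytes.length == 0 || bytes.length > 63) {
                throw new IllegalArgumentException("bad label: " + label);
            }
            out.write(bytes.length);           // length-prefixed label
            out.write(bytes, 0, bytes.length);
        }
        out.write(0);                          // root label ends the name
        writeShort(out, TYPE_PTR);
        writeShort(out, CLASS_IN);
        return out.toByteArray();
    }

    private static void writeShort(ByteArrayOutputStream out, int value) {
        out.write((value >> 8) & 0xFF); // big-endian, high byte first
        out.write(value & 0xFF);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical service type; no such registration exists in Hadoop.
        byte[] query = buildPtrQuery("_hdfs-namenode._tcp.local");
        try (DatagramSocket socket = new DatagramSocket()) {
            InetAddress group = InetAddress.getByName(MDNS_GROUP);
            socket.send(new DatagramPacket(query, query.length, group, MDNS_PORT));
        }
    }
}
```

Every host on the link that has joined the group receives this packet, which is where the cost and exposure concerns raised in the earlier feedback come from.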
DNS or multicast DNS is most likely the best approach for providing discovery
of HA servers. However, this information is provided so that we are aware of
potential pitfalls in the current implementations.
> Discovery of HA servers
> -----------------------
>
> Key: HADOOP-15774
> URL: https://issues.apache.org/jira/browse/HADOOP-15774
> Project: Hadoop Common
> Issue Type: New Feature
> Reporter: Íñigo Goiri
> Priority: Major
> Attachments: Discovery Service.pdf
>
>
> Currently, Hadoop relies on configuration files to specify the servers.
> This requires maintaining these configuration files and propagating the
> changes.
> Hadoop should have a framework to provide discovery.
> For example, in HDFS, we could define the Namenodes in a shared location and
> the DNs would use the framework to find the Namenodes.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)