[
https://issues.apache.org/jira/browse/HADOOP-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331990#comment-14331990
]
Arun Suresh commented on HADOOP-11620:
--------------------------------------
[~lmccay], you make very valid comments. Let me try to address them.
bq. .. why is it that we often put the burden of the loadbalancing on the
clients in Hadoop rather than put the servers behind a loadbalancing process or
VIP ?
I believe that having the clients manage the load balancing makes the system
easier to deploy, test and manage, since there is one less component (the VIP)
to configure and handle. Also, given that encrypted data cannot be accessed
without a KMS, I feel that basic high availability should be part of the core
library and deployable without the need for external load balancers / VIPs.
Administrators can then decide whether to use a custom LB or the base client,
based on specific deployment environment considerations.
bq. .. makes it difficult to provide elastic provisioning of server instances
since all the clients config would need to be made aware of the changes.
I agree that my current implementation, and yes, even the core RM and NN
clients, do suffer from this. But this is something I feel can be fixed,
considering that the ZooKeeper Curator library is already used in
hadoop-common, and Curator does come with a robust library for [Dynamic Service
Scaling|http://curator.apache.org/curator-x-discovery/].
I was in fact planning on filing a follow-up JIRA for KMS (but yes, it looks
like it might have more general applicability).
The reason I did not want to introduce dynamic scaling in this patch is that I
was more interested in high availability than in improved read scalability: I
need at least 1 KMS up, or else encrypted data becomes unreadable. For current
deployment scenarios, I therefore do not expect more than 2 or 3 KMS instances
to participate in the load-balancing group. Also, the simple round-robin load
balancing would allow the caches in all participating KMSs to be warmed over a
period of time. Like I mentioned, I do plan to work on dynamic scaling once I
hit the read-scalability wall.
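To make the round-robin behaviour described above concrete, here is a minimal sketch. It is not the code from the attached patches: the class name RoundRobinFailover and its call method are hypothetical, and the provider type is left generic so that plain strings can stand in for KMS clients. Each call starts at the next provider in the list and, on failure, falls through to the remaining ones, so a request succeeds as long as at least one KMS is up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch; not the class from the HADOOP-11620 patches. */
class RoundRobinFailover<P> {
    private final List<P> providers;
    // Index of the provider that the next call should try first.
    private final AtomicInteger next = new AtomicInteger(0);

    RoundRobinFailover(List<P> providers) {
        if (providers.isEmpty()) {
            throw new IllegalArgumentException("need at least one provider");
        }
        this.providers = new ArrayList<>(providers);
    }

    /**
     * Runs op against one provider. The starting provider rotates on every
     * call; if it throws, the remaining providers are tried in order.
     */
    <R> R call(Function<P, R> op) {
        // floorMod keeps the index valid even after the counter overflows.
        int start = Math.floorMod(next.getAndIncrement(), providers.size());
        RuntimeException last = null;
        for (int i = 0; i < providers.size(); i++) {
            P provider = providers.get((start + i) % providers.size());
            try {
                return op.apply(provider);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next provider
            }
        }
        // Every provider failed; surface the most recent error.
        throw last;
    }
}
```

Because the starting index advances on every call, successive requests land on different instances, which is what lets the caches in all participating KMSs warm up over time.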
> Add Support for Load Balancing across a group of KMS servers for HA
> -------------------------------------------------------------------
>
> Key: HADOOP-11620
> URL: https://issues.apache.org/jira/browse/HADOOP-11620
> Project: Hadoop Common
> Issue Type: Improvement
> Components: kms
> Affects Versions: 2.6.0
> Reporter: Arun Suresh
> Assignee: Arun Suresh
> Attachments: HADOOP-11620.1.patch, HADOOP-11620.2.patch
>
>
> This patch needs to add support for:
> * specification of multiple hostnames in the kms key provider uri
> * KMS client to load balance requests across the hosts specified in the kms
> keyprovider uri.