[ https://issues.apache.org/jira/browse/HADOOP-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331990#comment-14331990 ]

Arun Suresh commented on HADOOP-11620:
--------------------------------------

[~lmccay], you make very valid points. Let me try to address them.

bq. .. why is it that we often put the burden of the loadbalancing on the 
clients in Hadoop rather than put the servers behind a loadbalancing process or 
VIP ?
I believe that having the clients manage the load balancing makes the system
easier to deploy, test and manage, since there is one less component (the VIP)
to configure and operate. Also, since encrypted data cannot be accessed without
a KMS, I feel that basic high availability should be part of the core library
and deployable without external load balancers or VIPs. Administrators can then
decide to use a custom LB or the built-in client-side balancing, based on the
considerations of their specific deployment environment.
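To make "HA in the core client library" concrete, here is a minimal sketch of
client-side round-robin load balancing with failover across a fixed group of
KMS servers. `KmsEndpoint`, `RoundRobinFailoverClient` and `getKeyMaterial` are
hypothetical names for illustration only; this is not the code in the attached
patches and not the Hadoop KeyProvider API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for one KMS server; NOT the Hadoop KeyProvider API.
interface KmsEndpoint {
    String getKeyMaterial(String keyName) throws IOException;
}

// Sketch of client-side load balancing with failover across a fixed group of KMS servers.
class RoundRobinFailoverClient {
    private final List<KmsEndpoint> endpoints;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinFailoverClient(List<KmsEndpoint> endpoints) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("at least one KMS endpoint is required");
        }
        this.endpoints = new ArrayList<>(endpoints);
    }

    String getKeyMaterial(String keyName) throws IOException {
        int n = endpoints.size();
        // Rotate the starting server on every call (round-robin); floorMod keeps the
        // index non-negative even after the counter overflows.
        int start = Math.floorMod(counter.getAndIncrement(), n);
        IOException lastFailure = null;
        for (int i = 0; i < n; i++) {
            KmsEndpoint endpoint = endpoints.get((start + i) % n);
            try {
                return endpoint.getKeyMaterial(keyName);
            } catch (IOException e) {
                // This server is unreachable: remember the error and fail over to the next one.
                lastFailure = e;
            }
        }
        throw new IOException("all " + n + " KMS endpoints failed", lastFailure);
    }
}
```

The rotating start index spreads requests across the group, while the inner
loop gives HA: a request only fails if every KMS in the group is down, and no
external VIP is involved.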

bq. .. makes it difficult to provide elastic provisioning of server instances 
since all the clients config would need to be made aware of the changes.
I agree that my current implementation, and yes, even the core RM and NN
clients, do suffer from this. But I feel this can be fixed, considering that
the Apache Curator ZooKeeper client library is already used in hadoop-common,
and Curator comes with a robust library for
[Dynamic Service Scaling|http://curator.apache.org/curator-x-discovery/] (service discovery).
I was in fact planning on filing a follow-up JIRA for KMS (but yes, it looks
like it might have more general applicability).

The reason I did not want to introduce dynamic scaling in this patch is that I
was more interested in high availability than in improved read scalability: I
need at least one KMS up, otherwise encrypted data becomes unreadable. For
current deployment scenarios, this means I was not expecting more than 2 or 3
KMS instances to participate in the load-balancing group. Also, simple
round-robin load balancing allows the caches in all participating KMSs to be
warmed over time. As I mentioned, I do plan to work on dynamic scaling once I
hit the read-scalability wall.
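The cache-warming argument can be illustrated with a toy model, assuming each
KMS instance keeps a local cache of the key names it has already fetched from
the backing key store. `RoundRobinCacheWarming` and `countMisses` are
hypothetical names; the sketch only counts cold-cache misses when lookups are
dispatched round-robin.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (an assumption for illustration): each KMS instance keeps a local cache
// of key names it has already fetched from the backing key store.
class RoundRobinCacheWarming {
    // Dispatches the given key lookups round-robin over `servers` instances and
    // returns how many lookups missed the serving instance's cache.
    static int countMisses(int servers, List<String> lookups) {
        List<Set<String>> caches = new ArrayList<>();
        for (int i = 0; i < servers; i++) {
            caches.add(new HashSet<>());
        }
        int misses = 0;
        for (int r = 0; r < lookups.size(); r++) {
            Set<String> cache = caches.get(r % servers); // lookup r goes to server r mod N
            if (cache.add(lookups.get(r))) {
                misses++; // first time this server sees the key: cold miss, now cached
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        List<String> lookups = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            lookups.add("key1");
        }
        // With 3 servers, only the first lookup on each server misses.
        System.out.println(countMisses(3, lookups)); // prints 3
    }
}
```

For a repeatedly used key, each KMS pays at most one cold miss, so with 2 or 3
servers the whole group is warm after a handful of requests.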
  

> Add Support for Load Balancing across a group of KMS servers for HA
> -------------------------------------------------------------------
>
>                 Key: HADOOP-11620
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11620
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: kms
>    Affects Versions: 2.6.0
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: HADOOP-11620.1.patch, HADOOP-11620.2.patch
>
>
> This patch needs to add support for :
> * specification of multiple hostnames in the kms key provider uri
> * KMS client to load balance requests across the hosts specified in the kms 
> keyprovider uri.
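The description above does not pin down the multi-host URI syntax. Assuming the
hosts are separated by `;` and share a single port and path (for example
`kms://http@kms01.example.com;kms02.example.com:16000/kms`, with placeholder
hostnames), a minimal sketch of expanding such a URI into one URL per KMS host
could look like this. `KmsUriExpander` and `expand` are hypothetical names, and
the format itself is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands a multi-host KMS provider URI into one URL per host.
// Assumed format: kms://<protocol>@<host1>;<host2>;...:<port>/<path>
class KmsUriExpander {
    static List<String> expand(String kmsUri) {
        String prefix = "kms://";
        if (!kmsUri.startsWith(prefix)) {
            throw new IllegalArgumentException("not a kms:// URI: " + kmsUri);
        }
        String rest = kmsUri.substring(prefix.length());   // http@h1;h2:16000/kms
        int at = rest.indexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("missing <protocol>@ in: " + kmsUri);
        }
        String protocol = rest.substring(0, at);            // http
        String authorityAndPath = rest.substring(at + 1);   // h1;h2:16000/kms
        int slash = authorityAndPath.indexOf('/');
        String hostsAndPort = slash < 0 ? authorityAndPath : authorityAndPath.substring(0, slash);
        String path = slash < 0 ? "" : authorityAndPath.substring(slash);
        // The port (if any) follows the last ':' and is shared by every host.
        int colon = hostsAndPort.lastIndexOf(':');
        String hosts = colon < 0 ? hostsAndPort : hostsAndPort.substring(0, colon);
        String port = colon < 0 ? "" : hostsAndPort.substring(colon); // e.g. ":16000"
        List<String> urls = new ArrayList<>();
        for (String host : hosts.split(";")) {
            if (!host.isEmpty()) {
                urls.add(protocol + "://" + host + port + path);
            }
        }
        return urls;
    }
}
```

Each expanded URL would then back one member of the load-balancing group. This
sketch deliberately ignores IPv6 literals, whose colons would confuse the
last-':' port lookup.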



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
