[ 
https://issues.apache.org/jira/browse/GOBBLIN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Sen updated GOBBLIN-1398:
-----------------------------
    Description: 
GOBBLIN-1308 adds the ability to connect to a remote, secure Hive 
metastore, but it still requires hive-site.xml to be managed manually and 
provided as a container-local file. 

With multiple Hive clusters this manual approach does not work. A dedicated 
feature is needed to provide a system-specific hive-site.xml for each cluster 
without namespace collisions.

 

This ticket aims to do the following:

1) Define a way to provide remote system configuration (continuing to pass 
flat config is more cumbersome).

2) Based on the system config and a feature flag, copy the config files to a 
container-local path automatically.

3) When creating the metastoreClient, pick up the right config for the 
requested system (identified by the metastore URI).
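
Steps 2 and 3 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation proposed in the ticket: the class HiveSiteRegistry, its methods, and the one-directory-per-system layout are hypothetical names chosen here, and the sketch uses plain java.nio instead of any Gobblin or Hive API.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not part of Gobblin): one hive-site.xml per remote system. */
final class HiveSiteRegistry {
    private final Path localRoot;
    private final Map<String, Path> siteByMetastore = new HashMap<>();

    HiveSiteRegistry(Path localRoot) {
        this.localRoot = localRoot;
    }

    /**
     * Step 2: copy a system's hive-site.xml into its own sub-directory of the
     * container-local root, so files from different systems never collide.
     */
    Path register(String systemName, URI metastoreUri, Path sourceHiveSite) throws IOException {
        Path targetDir = Files.createDirectories(localRoot.resolve(systemName));
        Path target = targetDir.resolve("hive-site.xml");
        Files.copy(sourceHiveSite, target, StandardCopyOption.REPLACE_EXISTING);
        siteByMetastore.put(key(metastoreUri), target);
        return target;
    }

    /** Step 3: find the local config for the requested metastore URI, if registered. */
    Optional<Path> lookup(URI metastoreUri) {
        return Optional.ofNullable(siteByMetastore.get(key(metastoreUri)));
    }

    // Key on the authority (host:port). getAuthority() is used rather than
    // getHost() because getHost() returns null for host names containing '_'.
    private static String key(URI uri) {
        String authority = uri.getAuthority();
        if (authority == null) {
            throw new IllegalArgumentException("metastore URI has no host:port: " + uri);
        }
        return authority.toLowerCase(Locale.ROOT);
    }
}
```

A caller would register each remote system once at container start-up and then call lookup with the metastore URI when building the client. An empty Optional means no system-specific config was registered for that URI.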

 

 

  was:
Gobblin's Hadoop token / key management: 
 Problem: Gobblin only maintains tokens for the local cluster when key 
management is enabled, and it cannot manage tokens for remote Hadoop 
clusters. (Based on my conversations with several people here, the token files 
can be made available externally, but that would require an external system 
running on cron or similar.)

Solution: add remote cluster token management to Gobblin, so that remote 
clusters' keys are managed the same way as the local cluster's keys.

 

The config looks like the following 

(this changes the enable.key.management config to key.management.enabled):

 
{code:java}
gobblin.hadoop.key.management {
 enabled = true
 remote.clusters = [ ${gobblin_sync_systems.hadoop_cluster1}, 
${gobblin_sync_systems.hadoop_cluster2} ]
}

// These Gobblin platform configurations can be moved to a database for other 
// use-cases, but this layout helps keep the platform modular for each connector.
gobblin_sync_systems {
 hadoop_cluster1 {
 // If the Hadoop config path is specified, the FileSystem will be created from 
 // all the XML config provided there, which has all the required info.
 hadoop_config_path = "file:///etc/hadoop_cluster1/hadoop/config"
 // If the Hadoop config path is not specified, you can still specify the specific 
 // nodes for each type of token.
 namenode_uri = ["hdfs://nn1.hadoop_cluster1.example.com:8020", 
"hdfs://nn2.hadoop_cluster1.example.com:8020"]
 kms_nodes = [ "kms1.hadoop_cluster1.example.com:9292", 
"kms2.hadoop_cluster1.example.com:9292" ]
 }
 hadoop_cluster2 {
 hadoop_config_path = "file:///etc/hadoop_cluster2/hadoop/config"
 namenode_uri = ["hdfs://nn1.hadoop_cluster2.example.com:8020", 
"hdfs://nn2.hadoop_cluster2.example.com:8020"]
 kms_nodes = [ "kms1.hadoop_cluster2.example.com:9292", 
"kms2.hadoop_cluster2.example.com:9292" ]
 }
}{code}
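
The precedence described in the config comments (use the Hadoop config path when it is set, otherwise fall back to the explicitly listed nodes) could be modelled along these lines. This is a rough sketch only: RemoteClusterSettings and its methods are hypothetical names, and no Hadoop or Gobblin API is involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for one entry under gobblin_sync_systems. */
final class RemoteClusterSettings {
    final String hadoopConfigPath;   // may be null or empty when not configured
    final List<String> namenodeUris;
    final List<String> kmsNodes;

    RemoteClusterSettings(String hadoopConfigPath, List<String> namenodeUris, List<String> kmsNodes) {
        this.hadoopConfigPath = hadoopConfigPath;
        this.namenodeUris = new ArrayList<>(namenodeUris);
        this.kmsNodes = new ArrayList<>(kmsNodes);
    }

    /** True when the XML config directory should drive FileSystem creation. */
    boolean usesConfigPath() {
        return hadoopConfigPath != null && !hadoopConfigPath.isEmpty();
    }

    /** Explicit endpoints to request tokens from; empty when the config path is used. */
    List<String> explicitTokenEndpoints() {
        if (usesConfigPath()) {
            return Collections.emptyList();
        }
        List<String> endpoints = new ArrayList<>(namenodeUris);
        endpoints.addAll(kmsNodes);
        return endpoints;
    }
}
```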


> Gobblin kerberos token management for multiple remote hive metastores
> ---------------------------------------------------------------------
>
>                 Key: GOBBLIN-1398
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1398
>             Project: Apache Gobblin
>          Issue Type: Improvement
>    Affects Versions: 0.15.0
>            Reporter: Jay Sen
>            Priority: Major
>             Fix For: 0.16.0


