[
https://issues.apache.org/jira/browse/GOBBLIN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jay Sen updated GOBBLIN-1398:
-----------------------------
Description:
GOBBLIN-1308 added the ability to connect to a remote, secure Hive metastore,
but it still requires the hive-site.xml to be managed manually as a
container-local file.
With multiple Hive clusters this manual approach does not scale; it requires a
dedicated feature that provides each system's hive-site.xml without namespace
collisions.
This ticket aims to do the following:
1) Define a way to provide remote system configuration ( supplying flat config
keys is more cumbersome ).
2) Based on the system config and a feature flag, copy the config files to a
container-local path automatically.
3) When creating a metastore client, pick up the right config for the requested
system ( identified by its metastore URI ).
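Steps 2 and 3 could be sketched roughly as below, using only the JDK. All class and method names here are hypothetical illustrations, not the actual Gobblin API: each system's config files are staged into a directory namespaced by system name (avoiding hive-site.xml collisions), and the directory is looked back up by metastore URI when a client is created.
{code:java}
// Hypothetical sketch of steps 2 and 3; names are illustrative, not Gobblin's API.
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

public class RemoteHiveConfigManager {
  // metastore URI -> container-local dir holding that system's hive-site.xml
  private final Map<String, Path> configDirByUri = new HashMap<>();

  /** Step 2: copy sourceDir into localRoot/<systemName>/ so config file names never collide. */
  public Path stage(String systemName, String metastoreUri, Path sourceDir, Path localRoot)
      throws IOException {
    Path target = localRoot.resolve(systemName); // namespace by system name
    Files.createDirectories(target);
    try (DirectoryStream<Path> files = Files.newDirectoryStream(sourceDir)) {
      for (Path file : files) {
        Files.copy(file, target.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
      }
    }
    configDirByUri.put(metastoreUri, target);
    return target;
  }

  /** Step 3: pick the right config dir for the system identified by the metastore URI. */
  public Path resolve(String metastoreUri) {
    Path dir = configDirByUri.get(metastoreUri);
    if (dir == null) {
      throw new IllegalArgumentException("no config staged for " + metastoreUri);
    }
    return dir;
  }
}
{code}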
was:
Gobblin's Hadoop token/key management:
Problem: When key management is enabled, Gobblin only maintains tokens for the
local cluster; it has no capability to manage tokens for a remote Hadoop
cluster. ( Based on my conversations with many folks here, the token files can
be made available externally, but that would require an external system running
on cron or similar. )
Solution: Add remote cluster token management to Gobblin, so that remote
clusters' keys can be managed the same way it manages the local cluster's keys.
The config looks like the following
( changes the enable.key.management config to key.management.enabled ):
{code:java}
gobblin.hadoop.key.management {
  enabled = true
  remote.clusters = [ ${gobblin_sync_systems.hadoop_cluster1},
                      ${gobblin_sync_systems.hadoop_cluster2} ]
}

// These Gobblin platform configurations can be moved to a database for other
// use-cases, but this layout helps make the platform modular for each connector.
gobblin_sync_systems {
  hadoop_cluster1 {
    // If hadoop_config_path is specified, the FileSystem will be created based
    // on the xml configs provided there, which have all the required info.
    hadoop_config_path = "file:///etc/hadoop_cluster1/hadoop/config"
    // If hadoop_config_path is not specified, you can still specify the
    // specific nodes for the specific type of tokens.
    namenode_uri = [ "hdfs://nn1.hadoop_cluster1.example.com:8020",
                     "hdfs://nn2.hadoop_cluster1.example.com:8020" ]
    kms_nodes = [ "kms1.hadoop_cluster1.example.com:9292",
                  "kms2.hadoop_cluster1.example.com:9292" ]
  }
  hadoop_cluster2 {
    hadoop_config_path = "file:///etc/hadoop_cluster2/hadoop/config"
    namenode_uri = [ "hdfs://nn1.hadoop_cluster2.example.com:8020",
                     "hdfs://nn2.hadoop_cluster2.example.com:8020" ]
    kms_nodes = [ "kms1.hadoop_cluster2.example.com:9292",
                  "kms2.hadoop_cluster2.example.com:9292" ]
  }
}{code}
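The layout above could be consumed with the Typesafe Config (HOCON) library that Gobblin's configuration is built on. A minimal sketch, assuming the keys from the example; the class and method names are hypothetical:
{code:java}
// Hypothetical sketch: resolve the ${...} substitutions and enumerate the
// remote clusters whose tokens would be managed. Keys mirror the example above.
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;
import java.util.ArrayList;
import java.util.List;

public class RemoteClusterConfigReader {
  static final String EXAMPLE =
        "gobblin.hadoop.key.management {\n"
      + "  enabled = true\n"
      + "  remote.clusters = [ ${gobblin_sync_systems.hadoop_cluster1} ]\n"
      + "}\n"
      + "gobblin_sync_systems.hadoop_cluster1 {\n"
      + "  hadoop_config_path = \"file:///etc/hadoop_cluster1/hadoop/config\"\n"
      + "  namenode_uri = [ \"hdfs://nn1.hadoop_cluster1.example.com:8020\" ]\n"
      + "}\n";

  /** Returns the hadoop_config_path of every configured remote cluster. */
  public static List<String> hadoopConfigPaths(String hocon) {
    // resolve() expands the ${...} substitutions into the cluster objects
    Config config = ConfigFactory.parseString(hocon).resolve();
    List<String> paths = new ArrayList<>();
    if (config.getBoolean("gobblin.hadoop.key.management.enabled")) {
      for (Config cluster
          : config.getConfigList("gobblin.hadoop.key.management.remote.clusters")) {
        paths.add(cluster.getString("hadoop_config_path"));
      }
    }
    return paths;
  }
}
{code}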
> Gobblin kerberos token management for multiple remote hive metastores
> ---------------------------------------------------------------------
>
> Key: GOBBLIN-1398
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1398
> Project: Apache Gobblin
> Issue Type: Improvement
> Affects Versions: 0.15.0
> Reporter: Jay Sen
> Priority: Major
> Fix For: 0.16.0
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)