[ 
https://issues.apache.org/jira/browse/HIVE-24187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-24187:
--------------------------------
    Summary: Handle _files creation for HA config with same nameservice name on 
source and destination  (was: Handle _files creation for HA config with same 
nameservice on source and destination)

> Handle _files creation for HA config with same nameservice name on source and 
> destination
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-24187
>                 URL: https://issues.apache.org/jira/browse/HIVE-24187
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>
> Currently, HA is supported only when the source and destination clusters use
> different nameservice names. We need to add support for the same nameservice
> name on source and destination.
> The local nameservice will be passed as-is to the repl command.
> The remote nameservice will be referred to by an arbitrary name, with the
> corresponding HDFS configs defined for that name.
> Example:
> Clusters originally configured with the same HDFS nameservice:
> src: ns1
> target: ns1
> We can denote the remote nameservice with an arbitrary name, for example
> nsRemote. This is how each command sees the nameservices of source and target:
> Repl Dump: src: ns1, target: nsRemote
> Repl Load: src: nsRemote, target: ns1
> Entries in _files (for managed table data locations) will be written with
> nsRemote instead of ns1 (for src).
> Example: 
> hdfs://nsRemote/whLoc/dbName.db/table1:checksum:subDir:hdfs://nsRemote/cmroot
> Similarly, the list of external table data locations will be written using
> nsRemote instead of ns1 (for src).
> Two new configs control this behavior:
> *hive.repl.ha.datapath.replace.remote.nameservice = <boolean>*
> *hive.repl.ha.datapath.replace.remote.nameservice.name = <string>*
> Based on these configs, the source nameservice is replaced with the configured
> remote name.
> This also requires that 'hive.repl.rootdir' be passed accordingly during dump
> and load:
> ||Repl Operation||Repl Command||
> |*Staging on source cluster*|
> |Repl Dump|repl dump dbName with('hive.repl.rootdir'='hdfs://ns1/stagingLoc')|
> |Repl Load|repl load dbName into dbName with('hive.repl.rootdir'='hdfs://nsRemote/stagingLoc')|
> |*Staging on target cluster*|
> |Repl Dump|repl dump dbName with('hive.repl.rootdir'='hdfs://nsRemote/stagingLoc')|
> |Repl Load|repl load dbName into dbName with('hive.repl.rootdir'='hdfs://ns1/stagingLoc')|
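
The four rows of the dump/load table follow a single rule: 'hive.repl.rootdir' uses the local name ns1 when the staging directory lives on the cluster running the command (dump runs on the source, load runs on the target), and nsRemote otherwise. Below is a minimal Java sketch of that rule, assuming the ns1/nsRemote names and the /stagingLoc path from the example; the class, enum, and method names are hypothetical and not part of Hive.

```java
/**
 * Hypothetical helper (not part of Hive): chooses the nameservice for
 * 'hive.repl.rootdir' following the dump/load table. Dump runs on the
 * source cluster and load runs on the target cluster.
 */
public class ReplRootDirExample {

    enum Operation { DUMP, LOAD }

    enum StagingCluster { SOURCE, TARGET }

    // Both clusters call themselves ns1; nsRemote is the alias for "the other cluster".
    static final String LOCAL_NS = "ns1";
    static final String REMOTE_NS = "nsRemote";

    /** Local name if the staging dir is on the cluster running the command, else the alias. */
    static String rootDir(Operation op, StagingCluster staging, String stagingPath) {
        StagingCluster runsOn = (op == Operation.DUMP) ? StagingCluster.SOURCE : StagingCluster.TARGET;
        String ns = (runsOn == staging) ? LOCAL_NS : REMOTE_NS;
        return "hdfs://" + ns + stagingPath;
    }

    /** Builds the repl command text for database db, as written in the table. */
    static String command(Operation op, StagingCluster staging, String db) {
        String with = "with('hive.repl.rootdir'='" + rootDir(op, staging, "/stagingLoc") + "')";
        return (op == Operation.DUMP)
                ? "repl dump " + db + " " + with
                : "repl load " + db + " into " + db + " " + with;
    }

    public static void main(String[] args) {
        for (StagingCluster staging : StagingCluster.values()) {
            for (Operation op : Operation.values()) {
                System.out.println("Staging on " + staging + ", " + op + ": " + command(op, staging, "dbName"));
            }
        }
    }
}
```

Running main prints the four commands in the same order as the table rows. Deriving the name from "which cluster runs the command" rather than hard-coding four strings makes the symmetry between the two staging choices explicit.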

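For the remote alias to resolve, each cluster's HDFS client configuration needs HA entries for nsRemote that point at the other cluster's NameNodes. The sketch below only assembles the standard HDFS HA client keys (dfs.nameservices, dfs.ha.namenodes.*, dfs.namenode.rpc-address.*, dfs.client.failover.proxy.provider.*) into a map. It assumes two remote NameNodes; the NameNode IDs nn1/nn2, the hostnames, and the helper name are hypothetical. In practice these entries would go into hdfs-site.xml or the Hadoop configuration Hive uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: assembles standard HDFS HA client properties that
 * register the other cluster's NameNodes under the alias remoteNs.
 */
public class RemoteNameserviceConfig {

    static Map<String, String> remoteNsProperties(String localNs, String remoteNs,
                                                  List<String> remoteRpcAddresses) {
        Map<String, String> props = new LinkedHashMap<>();
        // The client must know about both the local nameservice and the alias.
        props.put("dfs.nameservices", localNs + "," + remoteNs);

        List<String> nnIds = new ArrayList<>();
        for (int i = 0; i < remoteRpcAddresses.size(); i++) {
            String nnId = "nn" + (i + 1); // hypothetical NameNode IDs
            nnIds.add(nnId);
            props.put("dfs.namenode.rpc-address." + remoteNs + "." + nnId, remoteRpcAddresses.get(i));
        }
        props.put("dfs.ha.namenodes." + remoteNs, String.join(",", nnIds));

        // Lets the client fail over between the remote NameNodes.
        props.put("dfs.client.failover.proxy.provider." + remoteNs,
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = remoteNsProperties("ns1", "nsRemote",
                List.of("remote-nn1.example.com:8020", "remote-nn2.example.com:8020"));
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The existing ns1 entries stay untouched; only the alias is added, so the same logical name (nsRemote) can be configured on both clusters, each pointing at the other side.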

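The nameservice substitution for _files entries and external table locations can be sketched as a swap of the URI authority, gated by the two new configs. This is a minimal illustration, not Hive's implementation: the configs are read from a plain Map rather than Hive's configuration object, the helper names are hypothetical, and the entry layout simply follows the dataPath:checksum:subDir:cmroot example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NameserviceRewrite {

    static final String REPLACE_KEY = "hive.repl.ha.datapath.replace.remote.nameservice";
    static final String NAME_KEY = "hive.repl.ha.datapath.replace.remote.nameservice.name";

    /** Swaps the authority (nameservice) of an hdfs:// URI when the configs enable it. */
    static String maybeReplaceNameservice(String location, Map<String, String> conf) {
        boolean replace = Boolean.parseBoolean(conf.getOrDefault(REPLACE_KEY, "false"));
        String remoteNs = conf.get(NAME_KEY);
        if (!replace || remoteNs == null || remoteNs.isEmpty()) {
            return location; // feature disabled: keep the path as-is
        }
        try {
            URI uri = new URI(location);
            if (!"hdfs".equals(uri.getScheme())) {
                return location; // only HDFS paths carry a nameservice
            }
            return new URI(uri.getScheme(), remoteNs, uri.getPath(), uri.getQuery(), uri.getFragment())
                    .toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + location, e);
        }
    }

    /** Builds a _files entry in the illustrative dataPath:checksum:subDir:cmroot layout. */
    static String filesEntry(String dataPath, String checksum, String subDir, String cmRoot,
                             Map<String, String> conf) {
        return String.join(":", maybeReplaceNameservice(dataPath, conf), checksum, subDir,
                maybeReplaceNameservice(cmRoot, conf));
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(REPLACE_KEY, "true", NAME_KEY, "nsRemote");

        // prints hdfs://nsRemote/whLoc/dbName.db/table1:checksum:subDir:hdfs://nsRemote/cmroot
        System.out.println(filesEntry("hdfs://ns1/whLoc/dbName.db/table1", "checksum", "subDir",
                "hdfs://ns1/cmroot", conf));

        // External table locations get the same treatment.
        List<String> externalLocations = List.of("hdfs://ns1/ext/loc1", "hdfs://ns1/ext/loc2");
        // prints [hdfs://nsRemote/ext/loc1, hdfs://nsRemote/ext/loc2]
        System.out.println(externalLocations.stream()
                .map(loc -> maybeReplaceNameservice(loc, conf))
                .collect(Collectors.toList()));
    }
}
```

Rebuilding the URI from its parts, rather than doing a string replace of "ns1", avoids accidentally rewriting a path segment that happens to contain the same text.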

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
