[ 
https://issues.apache.org/jira/browse/HIVE-21764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-21764:
---------------------------------------
    Description: 
* REPL DUMP takes two policy inputs in addition to the existing FROM and WITH clauses.
{code:java}
- REPL DUMP <current_repl_policy> [REPLACE <previous_repl_policy>] FROM 
<last_repl_id> WITH <key_values_list>;
- current_repl_policy and previous_repl_policy can be in any format mentioned in 
Point-4.
- The REPLACE clause takes the previous repl policy as input. If the 
REPLACE clause is absent, the policy remains unchanged.
- The rest of the format remains the same.{code}
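
To illustrate the optional REPLACE clause, here is a minimal Python sketch that assembles the statement from its parts. The helper name build_repl_dump is hypothetical and not part of Hive. The policy strings and the key/value list are treated as opaque text because their exact format (Point-4) is not given here, and the sketch assumes REPLACE is the only optional clause.

```python
from typing import Optional


def build_repl_dump(current_policy: str,
                    last_repl_id: int,
                    key_values: str,
                    previous_policy: Optional[str] = None) -> str:
    """Assemble a REPL DUMP statement (hypothetical helper, not a Hive API)."""
    parts = ["REPL DUMP", current_policy]
    # REPLACE is emitted only when a previous policy is supplied;
    # without it the policy is taken to be unchanged.
    if previous_policy is not None:
        parts += ["REPLACE", previous_policy]
    parts += ["FROM", str(last_repl_id), "WITH", key_values]
    return " ".join(parts) + ";"
```

Calling it without previous_policy yields the statement with no REPLACE clause, which corresponds to the "policy remains unchanged" case.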

 * Now, REPL DUMP on this DB will replicate the tables based on 
current_repl_policy.

 * Currently, single-table replication of the format <db_name>.t1 is not supported 
for table-level replication, so it will not be supported in the REPLACE clause 
either.

 * Any table that is added dynamically, either because the regular expression 
changed or because it was added to the include list, should be bootstrapped 
using an independent table-level replication policy.
{code:java}
- Hive will automatically determine the newly included tables by comparing the 
current_repl_policy and previous_repl_policy inputs, and will combine the 
bootstrap dump for the added tables with the incremental dump. A 
"_bootstrap" directory can be created in the dump dir to hold all tables to 
be bootstrapped.
- If a table is renamed, it may be dynamically added to or removed from 
replication based on the defined replication policy + include/exclude list. So, 
Hive will bootstrap a table that becomes included only after the rename.
- Tables added after the previous policy run and before the replace policy will be 
replicated using bootstrap if the table name satisfies inclusion in both 
policies. The events generated for those tables will be ignored while dumping the 
events.{code}
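
The policy comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the ReplPolicy class and the tables_to_bootstrap function are hypothetical names, a policy is simplified to one include regular expression minus an exclude list, and created_since_last_dump stands in for the tables added between the previous policy run and the replace policy.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ReplPolicy:
    """Hypothetical, simplified policy: include regex minus an exclude list."""
    include_pattern: str
    exclude: FrozenSet[str] = frozenset()

    def includes(self, table: str) -> bool:
        matched = re.fullmatch(self.include_pattern, table) is not None
        return matched and table not in self.exclude


def tables_to_bootstrap(previous: ReplPolicy,
                        current: ReplPolicy,
                        tables: Iterable[str],
                        created_since_last_dump: Iterable[str] = ()) -> List[str]:
    """Return the tables to bootstrap alongside the incremental dump."""
    created = set(created_since_last_dump)
    result = []
    for table in tables:
        if not current.includes(table):
            continue  # not replicated under the current policy
        newly_included = not previous.includes(table)
        # Tables created in the window are bootstrapped even if both
        # policies include them; their events would be skipped.
        if newly_included or table in created:
            result.append(table)
    return sorted(result)
```

For example, with a previous pattern of t[0-9]+ and a current pattern of t[0-9]+|orders, the table orders is newly included and would be bootstrapped, while a pre-existing t1 continues through the incremental events.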

 * REPL LOAD should check for changes in the REPL policy and drop the tables/views 
that the new policy excludes but the previous policy included. This should be done 
before performing the incremental and bootstrap load from the current dump. Both 
policies will be stored in the _bootstrap directory and used during REPL 
LOAD to drop the redundant tables.

 * REPL LOAD on an incremental dump should load the event directories first, then 
check for the "_bootstrap" directory and perform a bootstrap load of the tables in it.
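
The REPL LOAD ordering from the last two points (drop excluded tables, then load events, then load "_bootstrap") can be sketched in Python as below. This is a rough sketch, not Hive's implementation: plan_repl_load is a hypothetical name, the two policies are passed as plain predicates rather than read from the _bootstrap directory, and event directories are assumed to be sub-directories of the dump dir named by a numeric event id.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

BOOTSTRAP_DIR = "_bootstrap"  # directory name taken from the description


def plan_repl_load(dump_dir: str,
                   target_tables: Iterable[str],
                   in_previous: Callable[[str], object],
                   in_current: Callable[[str], object]) -> List[Tuple[str, str]]:
    """Return the ordered (action, name) steps for loading an incremental dump."""
    steps: List[Tuple[str, str]] = []

    # 1. Drop tables/views the previous policy included but the new one excludes.
    for table in sorted(target_tables):
        if in_previous(table) and not in_current(table):
            steps.append(("drop", table))

    subdirs = [p.name for p in Path(dump_dir).iterdir() if p.is_dir()]

    # 2. Load event directories first, in event-id order (assumed numeric names).
    for name in sorted((n for n in subdirs if n.isdigit()), key=int):
        steps.append(("load_event", name))

    # 3. Only then bootstrap whatever was dumped under "_bootstrap".
    if BOOTSTRAP_DIR in subdirs:
        steps.append(("load_bootstrap", BOOTSTRAP_DIR))

    return steps
```

Returning a plan instead of executing it keeps the ordering easy to inspect; a real loader would run each step against the target warehouse.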

Rename table is not in scope of this Jira.

  was:
REPL DUMP fetches the events from the NOTIFICATION_LOG table based on a regular 
expression + inclusion/exclusion list. So, for a rename table event, the 
event will be ignored if the old table doesn't match the pattern, but the new table 
should be bootstrapped. REPL DUMP should have a mechanism to detect such tables 
and automatically bootstrap them with incremental replication.
Also, if the renamed table is excluded from the replication policy, the old table 
needs to be dropped at the target as well.


> REPL DUMP should detect and bootstrap any rename table events where old table 
> was excluded but renamed table is included.
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21764
>                 URL: https://issues.apache.org/jira/browse/HIVE-21764
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>            Reporter: Sankar Hariappan
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: DR, Replication
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
