[
https://issues.apache.org/jira/browse/HIVE-21761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sankar Hariappan resolved HIVE-21761.
-------------------------------------
Resolution: Fixed
All patches committed to master.
> Support table level replication in Hive
> ---------------------------------------
>
> Key: HIVE-21761
> URL: https://issues.apache.org/jira/browse/HIVE-21761
> Project: Hive
> Issue Type: New Feature
> Components: repl
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Priority: Major
> Labels: DR, Replication
>
> *Requirements:*
> {code:java}
> - User needs to define a replication policy to replicate any specific table.
> This enables the user to replicate only the business-critical tables instead
> of replicating all tables, which may strain network bandwidth and storage and
> also slow down Hive replication.
> - User needs to define a replication policy using regular expressions (such as
> db.sales_*), include additional tables that do not match the given pattern,
> and exclude some tables that do match it.
> - User needs to dynamically add/remove tables to the list by manually
> changing the replication policy at run time.
> {code}
> *Design:*
> {code:java}
> 1. Hive continues to support the DB-level replication policy of format
> <db_name>, but logically we support the policy as <db_name>.'t1|t3| …'.'t*'.
> 2. Regular expressions can also be used as the replication policy. For
> example,
> a. <db_name>.'<prefix*>'
> b. <db_name>.'<*suffix>'
> c. <db_name>.'<prefix*suffix>'
> d. <db_name>.'<regex>'
> 3. User can provide include and exclude lists to specify the tables to be
> included in the replication policy.
> a. Include list specifies the tables to be included.
> b. Exclude list specifies the tables to be excluded even if they satisfy
> the expression in the include list.
> c. So the tables included in the policy are those matched by (a) minus those
> matched by (b).
> d. For backward compatibility, if no include or exclude list is given, then
> all the tables will be included in the policy.
> 4. The new format for the replication policy has 3 parts, all separated by a
> dot (.).
> a. First part is the DB name.
> b. Second part is the include list: a valid Java regex within single quotes.
> c. Third part is the exclude list: a valid Java regex within single quotes.
> - <db_name> -- Full DB replication which is currently supported
> - <db_name>.'.*?' -- Full DB replication
> - <db_name>.'t1|t3' -- DB replication with a static list of tables, t1 and
> t3, included.
> - <db_name>.'(t1*)|t2'.'t100' -- DB replication with all tables having
> prefix t1, plus table t2 (which doesn’t have prefix t1), excluding
> t100 (which has the prefix t1).
> 5. If the DB property “repl.source.for” is set, then by default all the
> tables in the DB will be enabled for replication and will continue to archive
> deleted data to CM path.
> 6. REPL DUMP takes 2 inputs along with the existing FROM and WITH clauses.
> a. REPL DUMP <current_repl_policy> [REPLACE <previous_repl_policy>] FROM
> <last_repl_id> WITH <key_values_list>;
> current_repl_policy and previous_repl_policy can be in any format mentioned
> in Point-4.
> b. REPLACE clause to be supported to take the previous repl policy as input.
> c. The rest of the format remains the same.
> 7. Now, REPL DUMP on this DB will replicate the tables based on
> current_repl_policy.
> 8. Single table replication of format <db_name>.t1 is not supported. User can
> provide the same with <db_name>.'t1' format.
> 9. Any table that is added dynamically, either due to a change in the regular
> expression or by being added to the include list, should be bootstrapped.
> a. Hive will automatically figure out the list of tables newly included in
> the list by comparing the current_repl_policy & previous_repl_policy inputs
> and combine bootstrap dump for added tables as part of incremental dump. As
> we can combine first incremental with bootstrap dump, it removes the current
> limitation of target DB being inconsistent after bootstrap unless we run
> first incremental replication.
> b. If any table is renamed, then it may get dynamically added/removed for
> replication based on the defined replication policy + include/exclude list.
> So, Hive will perform bootstrap for a table which is newly included after
> a rename.
> c. Also, if the renamed table is excluded from the replication policy, then
> the old table needs to be dropped at the target as well.
> 10. Only the initial bootstrap load expects the target DB to be empty, but the
> intermediate bootstrap on tables due to regex or inclusion/exclusion list
> change or renames doesn’t expect the target DB or table to be empty. If any
> table with the same name exists during such a bootstrap, the table will be
> overwritten, including its data.
> {code}
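The include/exclude rule from Point-3 and the "newly included tables" comparison from Point-9a can be sketched roughly as below. This is only an illustration, not Hive's actual implementation: the class ReplPolicySketch and its methods are hypothetical names, and it assumes both lists are matched as full Java regexes against the table name. Under that assumption, "all tables with prefix t1" has to be written as 't1.*', since the Java regex 't1*' only matches "t" followed by zero or more '1' characters.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical sketch of a table-level replication policy; not Hive code.
final class ReplPolicySketch {
    private final Pattern include; // null: no include list, so every table matches
    private final Pattern exclude; // null: no exclude list, so nothing is excluded

    ReplPolicySketch(String includeRegex, String excludeRegex) {
        this.include = includeRegex == null ? null : Pattern.compile(includeRegex);
        this.exclude = excludeRegex == null ? null : Pattern.compile(excludeRegex);
    }

    // Point-3c: a table is in the policy if it matches the include list
    // and does not match the exclude list (whole-name match assumed).
    boolean isIncluded(String table) {
        boolean in = include == null || include.matcher(table).matches();
        boolean out = exclude != null && exclude.matcher(table).matches();
        return in && !out;
    }

    // Point-9a: tables covered by the current policy but not by the previous
    // one are the ones that would need a bootstrap dump.
    static Set<String> tablesToBootstrap(ReplPolicySketch previous,
                                         ReplPolicySketch current,
                                         List<String> tables) {
        Set<String> added = new TreeSet<>();
        for (String table : tables) {
            if (current.isIncluded(table) && !previous.isIncluded(table)) {
                added.add(table);
            }
        }
        return added;
    }

    public static void main(String[] args) {
        List<String> tables = List.of("t1", "t11", "t100", "t2", "orders");
        ReplPolicySketch previous = new ReplPolicySketch("t1|t2", null);
        ReplPolicySketch current = new ReplPolicySketch("(t1.*)|t2", "t100");

        for (String table : tables) {
            System.out.println(table + " included: " + current.isIncluded(table));
        }
        System.out.println("bootstrap: " + tablesToBootstrap(previous, current, tables));
    }
}
```

With these sample inputs, t100 matches the include list but is dropped by the exclude list, and t11 is the only table that the current policy covers and the previous one did not, so it is the only bootstrap candidate.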
--
This message was sent by Atlassian Jira
(v8.3.4#803005)