[ https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sankar Hariappan updated HIVE-21029:
------------------------------------
Component/s: (was: HiveServer2)
> External table replication for existing deployments running incremental replication.
> ------------------------------------------------------------------------------------
>
> Key: HIVE-21029
> URL: https://issues.apache.org/jira/browse/HIVE-21029
> Project: Hive
> Issue Type: Bug
> Components: repl
> Affects Versions: 3.0.0, 3.1.0, 3.1.1
> Reporter: anishek
> Assignee: Sankar Hariappan
> Priority: Critical
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21029.01.patch, HIVE-21029.02.patch
>
>
> Existing deployments that use Hive replication do not replicate external
> tables. To enable external table replication, such deployments must set a
> specific switch that first bootstraps the external tables as part of Hive
> incremental replication. After that, incremental replication takes care of
> further changes to the external tables.
> The switch will be provided by an additional Hive configuration (for example,
> hive.repl.bootstrap.external.tables), passed in the {{WITH}} clause of the
> {{REPL DUMP}} command.
> In addition, the existing Hive config _hive.repl.include.external.tables_
> must always be set to "true" in the same clause.
> Proposed usage for enabling external table replication on an existing
> replication policy:
> 1. Consider an ongoing repl policy <db1> in the incremental phase.
> Enable hive.repl.include.external.tables=true and
> hive.repl.bootstrap.external.tables=true in the next incremental REPL DUMP.
> - The dump contains all events but skips events related to external tables.
> - Instead, it combines a bootstrap dump of all external tables under the
> “_bootstrap” directory.
> - It also includes the data locations file “_external_tables_info”.
> - The dump must not use a LIMIT or TO clause, so that all the latest events
> are dumped before the external tables are bootstrapped.
> 2. REPL LOAD on this dump first applies all the events, then copies the
> external tables' data, and then bootstraps the external tables (metadata).
> - The external tables (metadata) may not be point-in-time consistent with
> the rest of the tables.
> - However, they become eventually consistent once the next incremental load
> is applied.
> - This REPL LOAD is fault tolerant and can be retried if it fails.
> 3. All future REPL DUMPs on this repl policy should set
> hive.repl.bootstrap.external.tables=false.
> - Otherwise, the target might end up with an inconsistent set of external
> tables, because the bootstrap does not clean up external tables that have
> been dropped.
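>
> The three steps above could look roughly like the following HiveQL sequence
> for policy db1. This is a minimal sketch, not the patch's tested usage. It
> makes the following assumptions:
> - It uses the Hive 3.x syntax REPL DUMP <db> FROM <event-id> WITH (...) and
> REPL LOAD <db> FROM '<dir>'.
> - It uses the configuration name proposed above.
> - The event IDs (1000, 2000) and the dump directory path are hypothetical
> placeholders. Real values come from REPL STATUS on the target and from the
> output of the preceding REPL DUMP.
> {code:sql}
> -- Step 1 (source): next incremental dump of the existing policy db1.
> -- 1000 is a hypothetical last replicated event ID, e.g. from REPL STATUS db1
> -- on the target. There is no TO or LIMIT clause, so all events up to the
> -- latest one are dumped before the external tables are bootstrapped.
> REPL DUMP db1 FROM 1000
>   WITH ('hive.repl.include.external.tables'='true',
>         'hive.repl.bootstrap.external.tables'='true');
>
> -- Step 2 (target): load that dump. The path is a hypothetical placeholder
> -- for the dump directory returned by the REPL DUMP above. The same command
> -- can be rerun if the load fails.
> REPL LOAD db1 FROM '/hypothetical/repl/dump/dir1';
>
> -- Step 3 (source): every later incremental dump turns the bootstrap switch
> -- off. Keeping include.external.tables=true here is an assumption, so that
> -- incremental replication keeps covering external tables.
> -- 2000 is again a hypothetical event ID.
> REPL DUMP db1 FROM 2000
>   WITH ('hive.repl.include.external.tables'='true',
>         'hive.repl.bootstrap.external.tables'='false');
> {code}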
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)