mahesh kumar behera created HIVE-21773:
------------------------------------------
Summary: Supporting external table replication with partition filter.
Key: HIVE-21773
URL: https://issues.apache.org/jira/browse/HIVE-21773
Project: Hive
Issue Type: Sub-task
Components: HiveServer2, repl
Affects Versions: 4.0.0
Reporter: mahesh kumar behera
Assignee: mahesh kumar behera
Fix For: 4.0.0
Hive replicates external tables differently from managed tables. For an
external table, a list is built of the table and partition locations to be
replicated. If a partition location lies within the table location, it is not
added to the list. A partition whose location lies outside the table location
is added to the list. During an incremental dump, the data-related events are
ignored and only the metadata-related events are dumped. The list of locations
is prepared and used for replication. During load, the events are replayed and
then the distcp tasks are created, one for each location in the list.
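The location-list rule described above can be sketched as follows. This is only
an illustration of the rule, not Hive's actual code: the function names
is_within and locations_to_copy are hypothetical, and locations are treated as
plain path strings.

```python
from typing import List


def is_within(child: str, parent: str) -> bool:
    """Return True if `child` is `parent` itself or lies below it."""
    parent = parent.rstrip("/")
    child = child.rstrip("/")
    # Compare against "parent/" so that /ext/t10 is not treated as inside /ext/t1.
    return child == parent or child.startswith(parent + "/")


def locations_to_copy(table_location: str, partition_locations: List[str]) -> List[str]:
    """Build the list of locations to replicate for one external table.

    Hypothetical helper: the table location is always listed, and a partition
    location is listed only when it lies outside the table location.
    """
    locations = [table_location]
    for loc in partition_locations:
        if not is_within(loc, table_location) and loc not in locations:
            locations.append(loc)
    return locations
```

Under these assumptions, one distcp task would then be created per entry of the
returned list.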
With partition-level replication, not all partitions will be present in the
dump. So each partition location is added to the list, even if it lies within
the table location.
* If a where condition is present in the REPL DUMP command, add the location of
each partition that satisfies it, even if the partition location is within the
table location.
* If the table is not mentioned in the where clause, follow the older behavior.
* If the table is mentioned with a key but the key does not match any of the
partition columns, fail the REPL DUMP.
* If the table is mentioned with a key, add the location of each partition even
if all the partitions satisfy the filter condition. This avoids copying
partitions that are added using alter after the dump.
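A rough sketch of the four rules above, under stated assumptions. It is not
Hive's implementation: locations_for_dump, the (key, predicate) form of the
filter, and the (spec, location) form of a partition are all hypothetical. It
also assumes that, when a filter applies to the table, the table location
itself is left out of the list, which is how the last rule is read here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Partition = Tuple[Dict[str, str], str]          # (partition spec, location)
Filter = Tuple[str, Callable[[str], bool]]      # (partition key, predicate on its value)


def is_within(child: str, parent: str) -> bool:
    """Return True if `child` is `parent` itself or lies below it."""
    parent = parent.rstrip("/")
    child = child.rstrip("/")
    return child == parent or child.startswith(parent + "/")


def locations_for_dump(
    table_location: str,
    partition_columns: List[str],
    partitions: List[Partition],
    partition_filter: Optional[Filter] = None,
) -> List[str]:
    """Hypothetical helper: pick the locations to replicate for one table."""
    if partition_filter is None:
        # Table not mentioned in the where clause: older behavior.
        locations = [table_location]
        for _, loc in partitions:
            if not is_within(loc, table_location) and loc not in locations:
                locations.append(loc)
        return locations

    key, matches = partition_filter
    if key not in partition_columns:
        # Key does not match any partition column: fail the dump.
        raise ValueError(f"'{key}' is not a partition column")

    # Filter present: list every satisfying partition individually, even when
    # it lies inside the table location and even when all partitions match.
    locations = []
    for spec, loc in partitions:
        if matches(spec[key]) and loc not in locations:
            locations.append(loc)
    return locations
```

Listing partitions individually means a partition directory created under the
table location after the dump is not picked up by a table-level copy.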