anishek created HIVE-20911:
------------------------------
Summary: External Table Replication for Hive
Key: HIVE-20911
URL: https://issues.apache.org/jira/browse/HIVE-20911
Project: Hive
Issue Type: Bug
Components: HiveServer2
Affects Versions: 4.0.0
Reporter: anishek
Fix For: 4.0.0
External tables are currently not replicated as part of Hive replication. This
JIRA aims to enable that.
Approach:
* The target cluster will have a top-level base directory config under which all
data for external tables is copied. It will be provided via the *with* clause of
the *repl load* command. This base path will be prefixed to the source cluster
path of each external table to form that table's location on the target.
* Changes to an external table's directories can happen without Hive knowing
about them, so we can't capture the relevant events whenever data is added or
removed. Hence, every incremental replication run will have to copy the data of
external tables from the source path to the target path.
** This requires incremental *repl dump* to create an additional file,
*\_external\_tables\_info*, with data in the following form:
{code}
OperationType,tableName,base64Encoded(tableDataLocation)
{code}
where OperationType is either ADD or REMOVE.
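As a rough illustration of the proposed file format, here is a minimal Java sketch that writes and parses one *\_external\_tables\_info* line. The class and method names are hypothetical and not part of Hive; UTF-8 and the standard java.util.Base64 alphabet are assumptions, since the format above does not specify a charset or Base64 variant.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper (not Hive code) for the proposed line format:
//   OperationType,tableName,base64Encoded(tableDataLocation)
final class ExternalTableInfoLine {

  /** The two operation types proposed in this JIRA. */
  enum OperationType { ADD, REMOVE }

  final OperationType operation;
  final String tableName;
  final String dataLocation;

  ExternalTableInfoLine(OperationType operation, String tableName, String dataLocation) {
    this.operation = operation;
    this.tableName = tableName;
    this.dataLocation = dataLocation;
  }

  /** Serializes one entry, as incremental repl dump would write it (assumed UTF-8). */
  String encode() {
    String encodedLocation = Base64.getEncoder()
        .encodeToString(dataLocation.getBytes(StandardCharsets.UTF_8));
    return operation.name() + "," + tableName + "," + encodedLocation;
  }

  /** Parses one line, as repl load would read it. */
  static ExternalTableInfoLine decode(String line) {
    // -1 keeps trailing empty fields so malformed lines are detected.
    String[] parts = line.split(",", -1);
    if (parts.length != 3) {
      throw new IllegalArgumentException("Expected 3 comma-separated fields: " + line);
    }
    String location = new String(Base64.getDecoder().decode(parts[2]), StandardCharsets.UTF_8);
    return new ExternalTableInfoLine(OperationType.valueOf(parts[0]), parts[1], location);
  }

  public static void main(String[] args) {
    ExternalTableInfoLine entry = new ExternalTableInfoLine(
        OperationType.ADD, "db1.ext_sales", "hdfs://source-nn:8020/data/sales");
    String line = entry.encode();
    System.out.println(line);
    ExternalTableInfoLine parsed = decode(line);
    System.out.println(parsed.operation + " " + parsed.tableName + " " + parsed.dataLocation);
  }
}
```

Because the standard Base64 alphabet never contains a comma, encoding the location keeps the separator unambiguous even when the path itself contains commas; splitting on "," then yields exactly three fields, provided table names contain no commas (an assumption).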
** *repl load* will look up all the external tables on the target and remove
the tables listed with the REMOVE type in the above file.
** For the remaining tables, it will create tasks to copy the corresponding paths
from source to target, alongside the existing tasks for incremental load.
* New external tables will be created, with their data copied as part of the
regular tasks during incremental load, applying the base directory prefix.
* Bootstrap will also create and copy these external tables as part of its
regular workflow, applying the base directory prefix.
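The *repl load* side can be sketched in the same hedged way. The Java sketch below, with hypothetical names throughout, takes parsed entries plus the set of external tables already on the target, drops those marked REMOVE, and plans one copy per ADD entry, whose target location is the configured base directory prefixed to the source path. It assumes every external table still present on the source is listed as ADD in each incremental dump (consistent with copying data on every run) and that locations are valid URI strings; real Hive code would build actual DDL and copy tasks, which are not modelled here.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical planning sketch for repl load (not Hive's implementation).
final class ExternalTableLoadPlanner {

  enum Op { ADD, REMOVE }

  /** One parsed entry from the dump file (location already Base64-decoded). */
  static final class Entry {
    final Op op;
    final String tableName;
    final String location;

    Entry(Op op, String tableName, String location) {
      this.op = op;
      this.tableName = tableName;
      this.location = location;
    }
  }

  /** A planned copy of one table's data from source to target. */
  static final class CopyTask {
    final String tableName;
    final String sourceLocation;
    final String targetLocation;

    CopyTask(String tableName, String sourceLocation, String targetLocation) {
      this.tableName = tableName;
      this.sourceLocation = sourceLocation;
      this.targetLocation = targetLocation;
    }

    @Override
    public String toString() {
      return tableName + ": " + sourceLocation + " -> " + targetLocation;
    }
  }

  /** Result of planning: tables to drop and data to copy. */
  static final class Plan {
    final List<String> tablesToDrop;
    final List<CopyTask> copyTasks;

    Plan(List<String> tablesToDrop, List<CopyTask> copyTasks) {
      this.tablesToDrop = tablesToDrop;
      this.copyTasks = copyTasks;
    }
  }

  /** Prefixes the path part of the source location with the target base directory. */
  static String targetLocation(String baseDir, String sourceLocation) {
    String sourcePath = URI.create(sourceLocation).getPath(); // e.g. "/data/sales"
    String base = baseDir.endsWith("/") ? baseDir.substring(0, baseDir.length() - 1) : baseDir;
    return base + sourcePath;
  }

  static Plan plan(List<Entry> entries, Set<String> externalTablesOnTarget, String baseDir) {
    List<String> drops = new ArrayList<>();
    List<CopyTask> copies = new ArrayList<>();
    for (Entry e : entries) {
      if (e.op == Op.REMOVE) {
        // Only drop tables that actually exist on the target.
        if (externalTablesOnTarget.contains(e.tableName)) {
          drops.add(e.tableName);
        }
      } else {
        // Every ADD entry gets a copy task on every run, since external changes are not tracked as events.
        copies.add(new CopyTask(e.tableName, e.location, targetLocation(baseDir, e.location)));
      }
    }
    return new Plan(drops, copies);
  }

  public static void main(String[] args) {
    List<Entry> entries = List.of(
        new Entry(Op.ADD, "db1.sales", "hdfs://source-nn:8020/data/sales"),
        new Entry(Op.REMOVE, "db1.old_logs", "hdfs://source-nn:8020/data/old_logs"));
    Plan result = plan(entries, Set.of("db1.sales", "db1.old_logs"), "/replica/external");
    System.out.println("drop: " + result.tablesToDrop);
    result.copyTasks.forEach(System.out::println);
  }
}
```

Keeping planning separate from execution makes it easy to check, before any data moves, which tables would be dropped and where each one would land on the target.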
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)