[
https://issues.apache.org/jira/browse/HIVE-16676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sankar Hariappan updated HIVE-16676:
------------------------------------
Summary: Bootstrap REPL DUMP should ensure no data loss due to concurrent
RENAME operations. (was: Bootstrap REPL DUMP should ensure no data loss due to
concurrent operations.)
> Bootstrap REPL DUMP should ensure no data loss due to concurrent RENAME
> operations.
> -----------------------------------------------------------------------------------
>
> Key: HIVE-16676
> URL: https://issues.apache.org/jira/browse/HIVE-16676
> Project: Hive
> Issue Type: Sub-task
> Components: repl
> Affects Versions: 2.1.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
>
> For bootstrap dump, if a table is renamed after the table names are fetched,
> the table under its new name is missing from the dump, so the target database
> has neither the old table nor the new one. During incremental replication,
> later RENAME events are then no-ops because the old table doesn't exist in
> the target.
> To generalise the solution for this issue, the following logic is proposed.
> 1. Each table should store its CREATE event ID in the table parameters. If
> a table goes through a Create -> Drop -> Create sequence, the ID makes it easy
> to tell whether the table is the old one or the new one.
> 2. Bootstrap should add the delta changes to the dumpDir as an incremental
> dump.
> 3. After the bootstrap dump completes, traverse the events starting from
> bootDumpBeginReplId.
> - If a RENAME event is found, check the following:
>   - If the source table is dumped and the create event ID matches, dump the
>     RENAME event as is.
>   - If the source table is dumped but its create event ID is later than the
>     event, skip the event.
>   - If the source table doesn’t exist but the target table exists, skip the
>     event.
>   - If both the source and target tables are missing, dump the target table
>     to the bootstrap dumpDir.
> 4. For other events, dump the event according to the following logic.
> - CREATE: If the object exists, skip the event; otherwise dump it.
> - DROP: If the object doesn’t exist, skip the event; otherwise dump it.
> - ALTER: If the object exists and the create event ID matches, dump the
>   event; otherwise skip it.
> 5. When loading a RENAME event, check the following:
> - If the source table exists and its create event ID is the same, apply the
>   event; otherwise skip it.
> - If the source table doesn’t exist and the target table exists, skip the
>   event.
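
The decision rules in steps 3 to 5 can be sketched as a small decision table.
This is only an illustration under stated assumptions, not Hive code: the
Event fields, the Action names, and the dictionaries mapping a table name to
its stored CREATE event ID are all hypothetical. It also assumes that "exists"
in step 4 means "present in the bootstrap dump", and it returns UNSPECIFIED
for the combinations the proposal does not mention instead of guessing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Action(Enum):
    DUMP_EVENT = "dump event"
    DUMP_TARGET_TABLE = "dump target table into bootstrap dumpDir"
    APPLY = "apply event"
    SKIP = "skip event"
    UNSPECIFIED = "case not covered by the proposal"


@dataclass(frozen=True)
class Event:  # hypothetical shape, for illustration only
    event_id: int       # ID of this event in the event log
    kind: str           # "CREATE", "DROP", "ALTER" or "RENAME"
    table: str          # table acted on (the source name for RENAME)
    create_id: int      # CREATE event ID of the table the event refers to
    new_name: str = ""  # target name, used only by RENAME


def dump_action(ev: Event, dumped: Dict[str, int]) -> Action:
    """Steps 3 and 4. `dumped` maps a dumped table name to its CREATE event ID."""
    src_id = dumped.get(ev.table)
    if ev.kind == "RENAME":
        if src_id is not None:
            if src_id == ev.create_id:
                return Action.DUMP_EVENT       # same table incarnation
            if src_id > ev.event_id:
                return Action.SKIP             # table was re-created later
            return Action.UNSPECIFIED          # not described in the proposal
        if ev.new_name in dumped:
            return Action.SKIP                 # dump already has the new name
        return Action.DUMP_TARGET_TABLE        # neither name was dumped
    if ev.kind == "CREATE":
        return Action.SKIP if src_id is not None else Action.DUMP_EVENT
    if ev.kind == "DROP":
        return Action.SKIP if src_id is None else Action.DUMP_EVENT
    if ev.kind == "ALTER":
        return Action.DUMP_EVENT if src_id == ev.create_id else Action.SKIP
    return Action.UNSPECIFIED


def load_rename_action(ev: Event, target: Dict[str, int]) -> Action:
    """Step 5. `target` maps a table name in the target DB to its CREATE event ID."""
    src_id = target.get(ev.table)
    if src_id is not None:
        return Action.APPLY if src_id == ev.create_id else Action.SKIP
    if ev.new_name in target:
        return Action.SKIP
    return Action.UNSPECIFIED                  # not described in the proposal
```

For example, with dumped = {"t1": 10}, a RENAME of t1 to t2 that carries
create ID 10 is dumped as is, while a RENAME between two names that are both
absent from the dump causes the target table to be dumped instead.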