[ https://issues.apache.org/jira/browse/HIVE-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-18988:
------------------------------------
    Description: 
Bootstrapping ACID tables needs special handling to replicate a stable state 
of the data.
 - If the ACID feature is enabled, perform the bootstrap dump for ACID tables 
within a read txn.
 -> Dump table/partition metadata.
 -> Get the list of valid data files for a table using the same logic as a 
read txn.
 -> Dump the latest ValidWriteIdList as per the current read txn.
 - Set the valid last replication state so that it doesn't miss any open txn 
started after the bootstrap dump is triggered.
 - If any txns that were opened before the bootstrap dump was triggered are 
still on-going, it is not guaranteed that the open_txn event was captured for 
them. Also, if these txns were opened for streaming ingest, the dumped ACID 
table data may include data of open txns, which impacts snapshot isolation at 
the target. To avoid that, the bootstrap dump should wait for a timeout (new 
configuration: hive.repl.bootstrap.dump.open.txn.timeout). After the timeout, 
force abort those txns and continue.
 - If any force-aborted txn belongs to a streaming ingest case, the dumped 
ACID table data may contain aborted data too. So it is necessary to replicate 
the aborted write ids to the target to mark that data invalid for any readers.
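
The wait-then-abort step in the last two points can be sketched roughly as 
below. This is only an illustration in Python with an in-memory stand-in for 
the txn state; the names Txn and wait_then_abort_open_txns are hypothetical 
and are not Hive APIs, and the actual patch may be structured differently. 
timeout_s plays the role of hive.repl.bootstrap.dump.open.txn.timeout.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Txn:
    """Hypothetical in-memory stand-in for a txn known to the metastore."""
    txn_id: int
    write_ids: dict = field(default_factory=dict)  # table name -> write id
    state: str = "open"  # "open", "committed" or "aborted"


def wait_then_abort_open_txns(txns, timeout_s, poll_s=0.01,
                              clock=time.monotonic, sleep=time.sleep):
    """Wait up to timeout_s for txns opened before the dump to finish,
    then force abort the ones still open.

    Returns {table name: sorted write ids of the force-aborted txns}, i.e.
    the write ids that would have to be replicated as aborted.
    """
    deadline = clock() + timeout_s
    # Poll until every txn has finished or the timeout expires.
    while any(t.state == "open" for t in txns) and clock() < deadline:
        sleep(poll_s)

    forced = {}
    for t in txns:
        if t.state == "open":
            t.state = "aborted"  # force abort after the timeout
            for table, write_id in t.write_ids.items():
                forced.setdefault(table, []).append(write_id)
    return {table: sorted(ids) for table, ids in forced.items()}
```

The caller is assumed to pass only the txns that were already open when the 
bootstrap dump was triggered; txns opened later are expected to be covered by 
the replication state described above.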

  was:
Bootstrapping ACID tables needs special handling to replicate a stable state 
of the data.
 - If the ACID feature is enabled, perform the bootstrap dump for ACID tables 
within a read txn.
 -> Dump table/partition metadata.
 -> Get the list of valid data files for a table using the same logic as a 
read txn.
 -> Dump the latest ValidWriteIdList as per the current read txn.
 - Find the valid last replication state such that it points to the event ID 
of the open_txn event of the oldest on-going txn.


> Support bootstrap replication of ACID tables
> --------------------------------------------
>
>                 Key: HIVE-18988
>                 URL: https://issues.apache.org/jira/browse/HIVE-18988
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2, repl
>    Affects Versions: 3.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: ACID, DR, pull-request-available, replication
>             Fix For: 3.1.0
>
>         Attachments: HIVE-18988.01.patch, HIVE-18988.02.patch, 
> HIVE-18988.03.patch
>


