[ 
https://issues.apache.org/jira/browse/HIVE-21893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879131#comment-16879131
 ] 

Ashutosh Bapat commented on HIVE-21893:
---------------------------------------

[~sankarh], these two issues can happen even in the case of a normal bootstrap for a 
new policy, not just the one performed during the incremental phase. But anyway, 
here's my analysis of the problematic cases.

The key point here is following comment in 
org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask#getValidTxnListForReplDump()

 
{code:java}
// Key design point for REPL DUMP is to not have any txns older than current txn in which
// dump runs. This is needed to ensure that Repl dump doesn't copy any data files written by
// any open txns mainly for streaming ingest case where one delta file shall have data from
// committed/aborted/open txns. It may also have data inconsistency if the on-going txns
// doesn't have corresponding open/write events captured which means, catch-up incremental
// phase won't be able to replicate those txns. So, the logic is to wait for the given amount
// of time to see if all open txns < current txn is getting aborted/committed. If not, then
// we forcefully abort those txns just like AcidHouseKeeperService.{code}
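The wait-then-force-abort logic the comment describes can be sketched as below. This is a minimal, illustrative sketch: the TxnStore interface, method names, and polling shape are my assumptions for the example, not Hive's actual metastore API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: poll until all open txns older than the current
// (REPL DUMP) txn are committed/aborted, or the timeout expires; then
// force-abort whatever remains, like AcidHouseKeeperService does for
// timed-out txns. TxnStore here is illustrative, not Hive's real class.
public class OpenTxnWaiter {
    public interface TxnStore {
        List<Long> getOpenTxnsBelow(long currentTxnId);
        void abortTxn(long txnId);
    }

    // Waits up to timeoutMs for open txns below currentTxnId to finish;
    // returns the list of txns that had to be force-aborted.
    public static List<Long> waitOrAbort(TxnStore store, long currentTxnId,
                                         long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        List<Long> open = store.getOpenTxnsBelow(currentTxnId);
        while (!open.isEmpty() && System.currentTimeMillis() < deadline) {
            Thread.sleep(pollMs);
            open = store.getOpenTxnsBelow(currentTxnId);
        }
        List<Long> aborted = new ArrayList<>(open);
        for (long txnId : aborted) {
            store.abortTxn(txnId); // force-abort the stragglers
        }
        return aborted;
    }
}
```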
 

 Case 1
{quote}If Step-11 happens between Step-1 and Step-2. Also, Step-13 completes 
before we forcefully abort Tx2 from REPL DUMP thread T1. Also, assume Step-14 
is done after bootstrap is completed. In this case, bootstrap would replicate 
the data/writeId written by Tx2. But, the next incremental cycle would also 
replicate the open_txn, allocate_writeid and commit_txn events which would 
duplicate the data.
{quote}
If step-11 happens between step-1 and step-2, that by itself can cause multiple 
problems: the open-transaction event is replayed twice (once during bootstrap 
and once during the next incremental), causing the writeIds on the target to go 
out of sync with the source. A better solution would be to combine 
setLastReplIdForDump() and openTransaction() in Driver.compile() for the REPL 
DUMP case. We should let openTransaction() return the eventId of the REPL 
DUMP's own open-transaction event. That eventId would be set as the 
lastReplIdForDump(), and the next incremental dump would start from the events 
following this open-transaction event.
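The combined step proposed above could be sketched as follows. The class, the event log, and the method names are illustrative assumptions for the example, not Hive's actual Driver/txn-manager API; the point is only that recording the OPEN_TXN event and capturing lastReplIdForDump happen in one critical section, so no concurrent open-txn event can land between them.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: open the REPL DUMP txn and capture the dump's
// snapshot boundary atomically, so the next incremental starts exactly
// after the dump's own open-txn event.
public class ReplDumpSnapshot {
    private final AtomicLong eventIdSequence = new AtomicLong(0);
    private final Object txnLock = new Object();
    private long lastReplIdForDump = -1;

    // Simulates any operation appending an event to the event log.
    public long logEvent(String type) {
        return eventIdSequence.incrementAndGet();
    }

    // Logs the dump's OPEN_TXN event and records its eventId as
    // lastReplIdForDump in one critical section.
    public long openDumpTxnAndSetLastReplId() {
        synchronized (txnLock) {
            long openTxnEventId = logEvent("OPEN_TXN");
            lastReplIdForDump = openTxnEventId;
            return openTxnEventId;
        }
    }

    public long getLastReplIdForDump() {
        return lastReplIdForDump;
    }
}
```

With this ordering there is no window in which another thread's open-transaction event can fall between the snapshot boundary and the dump's own transaction.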

With that, we prohibit step-11 from happening between step-1 and step-2, so 
step-11 can happen either after step-2 or before step-1.
 # If it happens after step-2, it will not be recorded in the snapshot of the 
DUMP, so the changes within that transaction will not be replicated during 
bootstrap. The next incremental will replicate the events.

 # If step-11 happens before step-1 and the transaction commits before we start 
the dump, its changes will be replicated during bootstrap, since that 
transaction is visible to the REPL DUMP transaction. If the alloc_writeid event 
is idempotent for a given transaction on the source, then once the 
open-transaction event has been replicated as part of bootstrap, the same 
writeId will be allocated however many times the alloc_writeid event is 
replayed, keeping the writeIds on source and target in sync. Any files written 
will be marked with the same writeId, so copying them multiple times will not 
duplicate data. So there is no correctness issue in this case either.
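The idempotence argument in point 2 above can be made concrete with a small sketch. WriteIdAllocator and its method names are hypothetical, not Hive's actual TxnHandler API; the example only shows that if allocation is keyed on (txnId, table), replaying the same alloc_writeid event never advances the writeId on the target.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of idempotent writeId allocation: repeated
// alloc_writeid replays for the same (txnId, table) pair return the same
// writeId, so the target stays in sync with the source no matter how
// many times the event is replayed.
public class WriteIdAllocator {
    private final Map<String, Long> allocated = new HashMap<>();
    private long nextWriteId = 1;

    public synchronized long allocateWriteId(long txnId, String table) {
        String key = txnId + ":" + table;
        Long existing = allocated.get(key);
        if (existing != null) {
            return existing; // replayed event: same writeId as before
        }
        long id = nextWriteId++;
        allocated.put(key, id);
        return id;
    }
}
```

Because files are marked with the writeId rather than a copy count, re-copying them under the same writeId is harmless.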

Case 2
{quote}If Step-11 to Step-14 in Thread T2 happens after Step-1 in REPL DUMP 
thread T1. In this case, table is not bootstrapped but the corresponding 
open_txn, allocate_writeid, commit_txn and drop events would be replicated in 
next cycle. During next cycle, REPL LOAD would fail on commitTxn event as table 
is dropped or event is missing.
{quote}
If step-11 to step-14 happen before step-1, they will be covered by bootstrap 
itself and will not appear in the incremental. I think you meant that step-14 
happens before step-4, so the table is not bootstrapped, but any events after 
the open-transaction event are part of the next incremental.

This case is covered by test 
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTables#testAcidTablesBootstrapWithConcurrentDropTable().

In this case, the ALTER TABLE events created by the INSERT operation are 
converted to CreateTable on the target, so at the time of commit the table 
exists; it is then dropped by the subsequent drop event. So there is no 
correctness issue here either.

> Handle concurrent write + drop when ACID tables are getting bootstrapped.
> -------------------------------------------------------------------------
>
>                 Key: HIVE-21893
>                 URL: https://issues.apache.org/jira/browse/HIVE-21893
>             Project: Hive
>          Issue Type: Bug
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Ashutosh Bapat
>            Priority: Major
>              Labels: DR, Replication
>
> ACID tables will be bootstrapped during incremental phase in couple of cases. 
> 1. hive.repl.bootstrap.acid.tables is set to true in WITH clause of REPL DUMP.
> 2. If replication policy is changed using REPLACE clause in REPL DUMP where 
> the ACID table is matching new policy but not old policy.
> REPL DUMP performed below sequence of operations. Let's say Thread (T1)
> 1. Get Last Repl ID (lastId)
> 2. Open Transaction (Tx1)
> 3. Dump events until lastId.
> 4. Get the list of tables in the given DB.
> 5. If table matches current policy, then bootstrap dump it.
> Let's say, concurrently another thread  (let's say T2) is running as follows.
> 11. Open Transaction (Tx2).
> 12. Insert into ACID table Tbl1.
> 13. Commit Transaction (Tx2)
> 14. Drop table (Tbl1) --> Not necessarily same thread, may be from different 
> thread as well.
> *Problematic Use-cases:*
> 1. If Step-11 happens between Step-1 and Step-2. Also, Step-13 completes 
> before we forcefully abort Tx2 from REPL DUMP thread T1. Also, assume Step-14 
> is done after bootstrap is completed. In this case, bootstrap would replicate 
> the data/writeId written by Tx2. But, the next incremental cycle would also 
> replicate the open_txn, allocate_writeid and commit_txn events which would 
> duplicate the data.
> 2. If Step-11 to Step-14 in Thread T2 happens after Step-1 in REPL DUMP 
> thread T1. In this case, table is not bootstrapped but the corresponding 
> open_txn, allocate_writeid, commit_txn and drop events would be replicated in 
> next cycle. During next cycle, REPL LOAD would fail on commitTxn event as 
> table is dropped or event is missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
