[
https://issues.apache.org/jira/browse/HIVE-21893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879131#comment-16879131
]
Ashutosh Bapat commented on HIVE-21893:
---------------------------------------
[~sankarh], these two issues can happen even in the case of a normal bootstrap
for a new policy, not just the bootstrap performed during the incremental
phase. Anyway, here's my analysis of the problematic cases.
The key point here is the following comment in
org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask#getValidTxnListForReplDump():
{code:java}
// Key design point for REPL DUMP is to not have any txns older than current txn in which
// dump runs. This is needed to ensure that Repl dump doesn't copy any data files written by
// any open txns mainly for streaming ingest case where one delta file shall have data from
// committed/aborted/open txns. It may also have data inconsistency if the on-going txns
// doesn't have corresponding open/write events captured which means, catch-up incremental
// phase won't be able to replicate those txns. So, the logic is to wait for the given amount
// of time to see if all open txns < current txn is getting aborted/committed. If not, then
// we forcefully abort those txns just like AcidHouseKeeperService.{code}
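The wait-then-abort policy described in that comment can be modelled with a toy sketch (the class and method names below are illustrative, not Hive's actual API; real code would sleep between retries and go through the metastore's TxnHandler):

{code:java}
import java.util.HashSet;
import java.util.Set;

// Illustrative model of ReplDumpTask's policy: retry a bounded number of
// times waiting for txns older than the current one to finish, then
// forcefully abort the stragglers, as AcidHouseKeeperService would.
public class OpenTxnTracker {
    private final Set<Long> openTxns = new HashSet<>();

    public void open(long txnId) { openTxns.add(txnId); }
    public void commit(long txnId) { openTxns.remove(txnId); }

    // Returns the txns that had to be forcefully aborted, i.e. all txns
    // still open after maxRetries that are older than currentTxn.
    public Set<Long> waitOrAbort(long currentTxn, int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            boolean anyOlder = false;
            for (long t : openTxns) if (t < currentTxn) anyOlder = true;
            if (!anyOlder) return new HashSet<>();
            // the real code sleeps for the configured timeout slice here
        }
        Set<Long> aborted = new HashSet<>();
        for (long t : openTxns) if (t < currentTxn) aborted.add(t);
        openTxns.removeAll(aborted);
        return aborted;
    }
}
{code}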
Case 1
{quote}If Step-11 happens between Step-1 and Step-2. Also, Step-13 completes
before we forcefully abort Tx2 from REPL DUMP thread T1. Also, assume Step-14
is done after bootstrap is completed. In this case, bootstrap would replicate
the data/writeId written by Tx2. But, the next incremental cycle would also
replicate the open_txn, allocate_writeid and commit_txn events which would
duplicate the data.
{quote}
If step-11 happens between step-1 and step-2, that itself can cause multiple
problems, as the open transaction event is replayed twice (once during
bootstrap and once during the next incremental), causing the writeIds on the
target to go out of sync with the source. A better solution would be to
combine setLastReplIdForDump() and openTransaction() in Driver.compile() for
the REPL DUMP case. We should let openTransaction() return the eventId of the
open transaction event of the REPL DUMP. This eventId would be set as the
lastReplIdForDump(), and the next incremental dump would start from the events
following this open transaction event.
With that we prohibit step-11 from happening between step-1 and step-2, so
step-11 can happen either after step-2 or before step-1.
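The combined step can be sketched with a toy event-log model (the names ReplDumpSnapshot, appendEvent, etc. are hypothetical, not Hive's actual Driver/ReplDumpTask API):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposal: opening the REPL DUMP transaction
// itself appends an OPEN_TXN event and returns that event's id, which
// doubles as lastReplIdForDump.
public class ReplDumpSnapshot {
    private final List<String> eventLog = new ArrayList<>();

    // Appends an event and returns its 1-based, monotonically
    // increasing event id.
    public long appendEvent(String ev) {
        eventLog.add(ev);
        return eventLog.size();
    }

    // Combined "open transaction + set last repl id" step: no other
    // open-txn event can fall between reading the last repl id and
    // opening the dump's own transaction, because they are one event.
    public long openTxnAndGetLastReplId() {
        return appendEvent("OPEN_TXN(repl-dump)");
    }

    // The next incremental dump starts right after lastReplIdForDump.
    public long nextIncrementalStart(long lastReplId) {
        return lastReplId + 1;
    }
}
{code}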
# If it happens after step-2, it will not be recorded in the snapshot of the
DUMP, and thus the changes within that transaction will not be replicated
during bootstrap. The next incremental will replicate those events.
# If step-11 happens before step-1 and the transaction commits before we start
the dump, its changes will be replicated during bootstrap, since that
transaction will be considered visible to the REPL DUMP transaction. If the
alloc_writeId event is idempotent for a given transaction on the source, then
once the open transaction event has been replicated as part of bootstrap, the
same writeId will be allocated however many times the alloc_writeId event is
replayed, keeping the writeIds on the source and target in sync. Any files
written will be marked with the same writeId, so copying them multiple times
will not duplicate data. So there is no correctness issue in this case either.
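The idempotency property this argument relies on can be stated as a minimal sketch (an assumed property, not verified against Hive's TxnHandler; the allocator class here is illustrative):

{code:java}
import java.util.HashMap;
import java.util.Map;

// Sketch of the assumed property: replaying alloc_writeId for the same
// transaction any number of times must yield the same writeId, so
// repeated replication of the event cannot desynchronize writeIds.
public class WriteIdAllocator {
    private final Map<Long, Long> txnToWriteId = new HashMap<>();
    private long nextWriteId = 0;

    public synchronized long allocate(long txnId) {
        return txnToWriteId.computeIfAbsent(txnId, t -> ++nextWriteId);
    }
}
{code}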
Case 2
{quote}If Step-11 to Step-14 in Thread T2 happens after Step-1 in REPL DUMP
thread T1. In this case, table is not bootstrapped but the corresponding
open_txn, allocate_writeid, commit_txn and drop events would be replicated in
next cycle. During next cycle, REPL LOAD would fail on commitTxn event as table
is dropped or event is missing.
{quote}
If step-11 to step-14 happen before step-1, they will be covered by the
bootstrap itself and will not appear in the incremental. I think you meant
that step-14 happens before step-4, so the table is not bootstrapped, but any
events after the open transaction are part of the next incremental.
This case is covered by test
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTables#testAcidTablesBootstrapWithConcurrentDropTable().
In that test, the ALTER TABLE events created by the INSERT operation are
converted to CreateTable on the target, so at the time of commit the table is
visible; it is then dropped by the subsequent drop event. So there is no
correctness issue here either.
> Handle concurrent write + drop when ACID tables are getting bootstrapped.
> -------------------------------------------------------------------------
>
> Key: HIVE-21893
> URL: https://issues.apache.org/jira/browse/HIVE-21893
> Project: Hive
> Issue Type: Bug
> Components: repl
> Affects Versions: 4.0.0
> Reporter: Sankar Hariappan
> Assignee: Ashutosh Bapat
> Priority: Major
> Labels: DR, Replication
>
> ACID tables will be bootstrapped during incremental phase in couple of cases.
> 1. hive.repl.bootstrap.acid.tables is set to true in WITH clause of REPL DUMP.
> 2. If replication policy is changed using REPLACE clause in REPL DUMP where
> the ACID table is matching new policy but not old policy.
> REPL DUMP performed below sequence of operations. Let's say Thread (T1)
> 1. Get Last Repl ID (lastId)
> 2. Open Transaction (Tx1)
> 3. Dump events until lastId.
> 4. Get the list of tables in the given DB.
> 5. If table matches current policy, then bootstrap dump it.
> Let's say, concurrently another thread (let's say T2) is running as follows.
> 11. Open Transaction (Tx2).
> 12. Insert into ACID table Tbl1.
> 13. Commit Transaction (Tx2)
> 14. Drop table (Tbl1) --> Not necessarily same thread, may be from different
> thread as well.
> *Problematic Use-cases:*
> 1. If Step-11 happens between Step-1 and Step-2. Also, Step-13 completes
> before we forcefully abort Tx2 from REPL DUMP thread T1. Also, assume Step-14
> is done after bootstrap is completed. In this case, bootstrap would replicate
> the data/writeId written by Tx2. But, the next incremental cycle would also
> replicate the open_txn, allocate_writeid and commit_txn events which would
> duplicate the data.
> 2. If Step-11 to Step-14 in Thread T2 happens after Step-1 in REPL DUMP
> thread T1. In this case, table is not bootstrapped but the corresponding
> open_txn, allocate_writeid, commit_txn and drop events would be replicated in
> next cycle. During next cycle, REPL LOAD would fail on commitTxn event as
> table is dropped or event is missing.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)