[
https://issues.apache.org/jira/browse/HIVE-21530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sankar Hariappan reassigned HIVE-21530:
---------------------------------------
Assignee: Pravin Sinha (was: Sankar Hariappan)
> Replicate Streaming ingestion with transactions batch size greater than 1.
> --------------------------------------------------------------------------
>
> Key: HIVE-21530
> URL: https://issues.apache.org/jira/browse/HIVE-21530
> Project: Hive
> Issue Type: Bug
> Components: repl, Transactions
> Affects Versions: 4.0.0
> Reporter: Sankar Hariappan
> Assignee: Pravin Sinha
> Priority: Major
> Labels: DR, Replication
> Attachments: Hive ACID Replication_ Streaming Ingest Tables.pdf
>
>
> Implement replication of Hive streaming ingest tables as per [^Hive ACID Replication_ Streaming Ingest Tables.pdf].
> Change the txn_commit event to include information about the transaction batch.
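> A rough sketch of the batch metadata such a commit event might additionally carry; the class and field names below are hypothetical, not existing Hive APIs:
> {code:java}
> // Hypothetical holder for the transaction-batch details that a
> // txn_commit event would carry so the replica can group the batch.
> public class TxnBatchInfo {
>   private final long minTxnId;   // first transaction id allocated to the batch
>   private final long maxTxnId;   // last transaction id allocated to the batch
>   private final int batchSize;   // number of transactions in the batch
>
>   public TxnBatchInfo(long minTxnId, long maxTxnId, int batchSize) {
>     this.minTxnId = minTxnId;
>     this.maxTxnId = maxTxnId;
>     this.batchSize = batchSize;
>   }
>
>   public long getMinTxnId() { return minTxnId; }
>   public long getMaxTxnId() { return maxTxnId; }
>   public int getBatchSize() { return batchSize; }
> }
> {code}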
> Change the copy task to copy a file only when its size or checksum differs from the
> existing copy on the target. This seems specific to transaction batches and should not
> be applied to normal transactions.
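> A minimal sketch of the size/checksum check using the Hadoop FileSystem API; the helper
> name and where it would hook into the copy task are assumptions:
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.fs.FileChecksum;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public final class CopyCheck {
>   // True only when the target file is missing or differs from the source
>   // in length or checksum, i.e. when a copy is actually needed.
>   public static boolean needsCopy(FileSystem srcFs, Path src,
>                                   FileSystem dstFs, Path dst) throws IOException {
>     if (!dstFs.exists(dst)) {
>       return true;
>     }
>     if (srcFs.getFileStatus(src).getLen() != dstFs.getFileStatus(dst).getLen()) {
>       return true;
>     }
>     // Checksums can be null (e.g. on a local file system); copy in that case.
>     FileChecksum srcSum = srcFs.getFileChecksum(src);
>     FileChecksum dstSum = dstFs.getFileChecksum(dst);
>     return srcSum == null || dstSum == null || !srcSum.equals(dstSum);
>   }
> }
> {code}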
> Copy files in the correct sequence with respect to the data file and its side file.
> Remove side files (which appear to be suffixed with _flush in the file names) when the
> batch is committed.
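> A sketch of that cleanup, assuming the side files can be identified purely by the
> _flush suffix mentioned above; the suffix constant and the directory passed in are
> assumptions:
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public final class SideFileCleanup {
>   // Assumed suffix of streaming-ingest side files, per the description above.
>   private static final String SIDE_FILE_SUFFIX = "_flush";
>
>   // Deletes the side files under a delta directory once the transaction
>   // batch that produced them has been committed.
>   public static void removeSideFiles(FileSystem fs, Path deltaDir) throws IOException {
>     for (FileStatus status : fs.listStatus(deltaDir)) {
>       if (status.isFile() && status.getPath().getName().endsWith(SIDE_FILE_SUFFIX)) {
>         fs.delete(status.getPath(), false);
>       }
>     }
>   }
> }
> {code}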
> Determine how the idempotent nature of these events is established: on replay, update
> the corresponding table and partition without copying a new version of the file.
> Validate that partially copied data files are handled on the target warehouse, given a
> correct side file. Can we leave the side file in place forever? If the primary
> warehouse fails during a transaction batch copy, after some transactions have already
> been copied over, we will not be able to remove the _flush file; do we have to handle
> this on failover?
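> To illustrate how the target could bound reads on a partially copied data file, a
> sketch that reads the last length recorded in the side file; it assumes the side file
> is a sequence of 8-byte length records, which needs to be verified against the
> streaming-ingest format:
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public final class SideFileLength {
>   // Returns the last length recorded in the side file (the assumed valid
>   // length of the data file), or -1 if no complete record is present.
>   public static long lastFlushedLength(FileSystem fs, Path sideFile) throws IOException {
>     if (!fs.exists(sideFile)) {
>       return -1L;
>     }
>     long records = fs.getFileStatus(sideFile).getLen() / Long.BYTES;
>     if (records == 0) {
>       return -1L;
>     }
>     try (FSDataInputStream in = fs.open(sideFile)) {
>       in.seek((records - 1) * Long.BYTES);
>       return in.readLong();
>     }
>   }
> }
> {code}
> Data beyond that offset on the target would then be treated as an incomplete copy
> rather than committed rows.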
--
This message was sent by Atlassian Jira
(v8.3.4#803005)