[ 
https://issues.apache.org/jira/browse/HIVE-21530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan reassigned HIVE-21530:
---------------------------------------

    Assignee: Pravin Sinha  (was: Sankar Hariappan)

> Replicate Streaming ingestion with transactions batch size greater than 1.
> --------------------------------------------------------------------------
>
>                 Key: HIVE-21530
>                 URL: https://issues.apache.org/jira/browse/HIVE-21530
>             Project: Hive
>          Issue Type: Bug
>          Components: repl, Transactions
>    Affects Versions: 4.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Pravin Sinha
>            Priority: Major
>              Labels: DR, Replication
>         Attachments: Hive ACID Replication_ Streaming Ingest Tables.pdf
>
>
> Implement replication of Hive streaming ingest for tables as per [^Hive ACID 
> Replication_ Streaming Ingest Tables.pdf].
> Change txn_commit to include information about the transaction batch.
> Change the copy task to copy a file only when its size or checksum differs; 
> this seems specific to transaction batches and shouldn't be used for normal 
> transactions.
> Copy files in the correct sequence with respect to the data file and its 
> side file.
> Remove side files (which appear to be suffixed with _flush in the file 
> names) when the batch is committed.
> Determine how to make these events idempotent: update the corresponding 
> table and partition without copying a new version of the file.
> Validate that partially copied data files are handled on the target 
> warehouse given a correct side file. Can we leave the side file in place 
> forever? If the primary warehouse fails mid-batch, after some transactions 
> have already been copied over, we won't be able to remove the _flush file; 
> do we have to handle this on failover?
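The "copy only when size or checksum differs" check in the description could look roughly like the sketch below. This is a hypothetical standalone illustration, not the actual Hive copy task: the class and method names are invented, and it compares in-memory bytes with a CRC32 rather than going through the Hadoop FileSystem API.

```java
import java.util.zip.CRC32;

// Hypothetical sketch (not Hive's copy task): skip copying a file when both
// its length and checksum match the copy already present on the target.
public class DiffCopyCheck {
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    /** Copy if the target is missing, the sizes differ, or the sizes match
     *  but the checksums differ. The cheap length check runs first. */
    static boolean shouldCopy(byte[] source, byte[] target) {
        if (target == null) return true;                 // nothing on target yet
        if (source.length != target.length) return true; // size mismatch
        return crc32(source) != crc32(target);           // fall back to checksum
    }

    public static void main(String[] args) {
        byte[] committed = "row1\nrow2\n".getBytes();
        byte[] partial   = "row1\n".getBytes();
        System.out.println(shouldCopy(committed, committed)); // false: identical
        System.out.println(shouldCopy(committed, partial));   // true: sizes differ
    }
}
```

In a real implementation the length would come from a FileStatus and the checksum from the filesystem, but the ordering shown (size first, checksum only on a size match) is the point of the optimization.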
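On the idempotency question, one common approach is to track the last event id applied per table/partition and skip replays at or below that mark. The sketch below is hypothetical (all names invented) and only illustrates the bookkeeping, not Hive's actual replication state handling.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of idempotent event application: remember the last
// event id applied per table, and skip any event at or below it on replay.
public class IdempotentApply {
    private final Map<String, Long> lastApplied = new HashMap<>();

    /** Returns true if the event was applied, false if it was a replay. */
    boolean apply(String table, long eventId, Runnable action) {
        long seen = lastApplied.getOrDefault(table, -1L);
        if (eventId <= seen) {
            return false;            // already applied: do not copy a new file version
        }
        action.run();                // e.g., update the table/partition, copy the file
        lastApplied.put(table, eventId);
        return true;
    }
}
```

With this scheme, re-delivering the same commit event after a retry updates nothing, which is what the description asks for ("update the corresponding table + partition and not copy new version of the file").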
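For validating partially copied data files against a side file, the idea is that the side file records how many bytes of the data file are committed, so anything past that point can be discarded. The sketch below assumes, hypothetically, that the side file is a run of 8-byte big-endian lengths whose last complete entry is the valid data length; the real side-file layout should be confirmed against the Hive source before relying on this.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical sketch: recover the committed length of a data file from its
// side file, then trim a partially copied data file to that length.
public class SideFileTrim {
    /** Assumed layout: the side file is a sequence of 8-byte lengths; the
     *  last complete entry is the number of valid bytes in the data file. */
    static long lastFlushLength(byte[] sideFile) throws IOException {
        long last = 0;
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(sideFile))) {
            while (in.available() >= 8) {
                last = in.readLong();  // keep reading; retain the final entry
            }
        }
        return last;
    }

    /** Drop any bytes past the last flushed length (a partial final write). */
    static byte[] trimToCommitted(byte[] dataFile, byte[] sideFile)
            throws IOException {
        long valid = lastFlushLength(sideFile);
        return Arrays.copyOf(dataFile, (int) Math.min(valid, dataFile.length));
    }
}
```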



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
