Thanks, everyone, for reviewing and replying here! I will reply on the Google
doc, and I will also reply to the emails here one by one.

Hi Manu,

Thanks for your questions! The answers are below.
1. In many cases, people still replicate across data centers or even
cloud providers even though cloud storage has high durability. If replication
hits a hiccup, we will not fail the primary table's transaction.
2. For rewrite_data_files, we still treat it as a transaction and
replicate the new data files. We might want to stay consistent across primary
and secondary, including both the data content and the file layout.
3. Yes, creating new data files is the current solution. Yufei also
mentioned that in the comments.
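
To make point 1 concrete, here is a rough sketch of the behavior, not the
proposal's actual API. The names commit_with_replication, commit_primary,
replicate, and CommitResult are all hypothetical, and it assumes replication
runs after the primary commit and that failed copies are queued for a later
retry.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CommitResult:
    committed: bool
    # Files whose copy to the secondary failed and should be retried later.
    pending_replication: List[str] = field(default_factory=list)


def commit_with_replication(
    new_files: List[str],
    commit_primary: Callable[[List[str]], None],  # hypothetical hook
    replicate: Callable[[str], None],             # hypothetical hook
) -> CommitResult:
    # The primary commit happens first; if it raises, nothing is replicated.
    commit_primary(new_files)

    pending = []
    for path in new_files:
        try:
            replicate(path)
        except OSError:
            # A replication hiccup is recorded for retry; it does not
            # fail the primary transaction, which is already committed.
            pending.append(path)
    return CommitResult(committed=True, pending_replication=pending)


def flaky_replicate(path: str) -> None:
    # Simulates a cross-datacenter network failure for one file.
    if path.endswith("b.parquet"):
        raise ConnectionError("cross-datacenter link down")


result = commit_with_replication(
    ["a.parquet", "b.parquet"], lambda files: None, flaky_replicate
)
```

In this sketch the commit still succeeds and b.parquet is left in
pending_replication for a background retry.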

Xinli

On Wed, Oct 1, 2025 at 9:26 AM Manu Zhang <[email protected]> wrote:

> Thanks for the proposal Xinli. I have also thought through Iceberg table
> replication before and have some doubts over this approach.
>
> 1. Will synchronous replication be useful since underlying
> distributed file systems like S3 already provide high durability? On the
> other hand, a cross-datacenter network hiccup would fail the commit. It
> might involve an oncall to disable the option for a commit to succeed if
> the network issue lasts for a while. IMO, replication for disaster recovery
> should be transparent and have no impact on users' applications.
>
> 2. How about commits from rewrite_data_files? Will it replicate the entire
> table if all files of a table have been rewritten? In this case, there's
> actually no "changes" to the table and I think only "changes" are needed to
> replicate.
>
> 3. Metadata replication is not easy to get right. We've seen such issues
> [1] with rewrite_table_path that not updating sizes in manifest lists could
> lead to correctness problems. How about creating new metadata files for
> replicated data files?
>
> [1] https://github.com/apache/iceberg/issues/13719
>
> Best,
> Manu
>
> On Wed, Oct 1, 2025 at 6:43 AM Chao Sun <[email protected]> wrote:
>
>> Thanks for the proposal Xinli! It sounds very useful and I also just left
>> some comments.
>>
>> On Mon, Sep 29, 2025 at 8:42 PM Gang Wu <[email protected]> wrote:
>>
>>> This thread was accidentally in my spam folder.
>>>
>>> I have left some comments with regard to the implication on the Iceberg
>>> rest catalog side.
>>>
>>> Best,
>>> Gang
>>>
>>> On Tue, Sep 30, 2025 at 5:44 AM Huaxin Gao <[email protected]> wrote:
>>>
>>>> Thanks for the proposal. I think it's in the right direction. I left
>>>> some comments and will take another look when time allows.
>>>>
>>>> Huaxin
>>>>
>>>> On 2025/09/27 17:27:29 Xinli shang wrote:
>>>> > Hi all,
>>>> >
>>>> > I’d like to propose adding *native incremental replication* to Iceberg
>>>> > tables.
>>>> >
>>>> > *Motivation:* Many production deployments require cross–data center
>>>> > backup and data locality. Today this is usually handled by external
>>>> > services, which adds operational overhead and introduces failure modes
>>>> > outside Iceberg’s transactional boundary. Integrating replication into
>>>> > the commit workflow would simplify operations and improve consistency.
>>>> >
>>>> > *Proposal:* An optional replication phase in the commit process would
>>>> > automatically copy data files and metadata to one or more targets
>>>> > (e.g., S3, HDFS, GCS, Azure). Replication is configured via table
>>>> > properties and supports both synchronous (immediate consistency, higher
>>>> > latency) and asynchronous (background retries, eventual consistency)
>>>> > modes. This provides built-in disaster recovery, data locality
>>>> > optimization, and cross-region analytics without external tools.
>>>> >
>>>> > Full draft proposal with design details is here:
>>>> > 👉 Incremental Iceberg Replication Proposal
>>>> > <https://docs.google.com/document/d/1yrVLs0CQyIHs9WbBVx_EK6ad419Adsl9xHozpmQEMrs/edit?tab=t.0#heading=h.aa5ph23raz9l>
>>>> >
>>>> > Thanks,
>>>> > Xinli
>>>> >
>>>>
>>>

-- 
Xinli Shang
