Hi Reo,

I agree with Walaa; the major concern is that the proposal requires a table spec change and write-time logging.

We try to avoid table spec changes so that the feature can work on existing table formats:

1. Users don't have to wait for the new table spec, which may take a while.
2. Users don't have to upgrade their tables, which is usually costly.

We also try to avoid write-time logging. An Iceberg table doesn't depend on any specific engine, so write-time logging would require every client (Spark, Flink, Trino, custom clients) to follow the same logging format:

1. It is a non-trivial effort to make that change in all of them.
2. Avoiding it means a write from an arbitrary client won't break CDC record generation.

In today's community meeting, we discussed the solution we are working on, which produces CDC records without any table spec change or write-time logging. Will post the design doc soon.
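To give a rough idea of the direction in the meantime (this is only a toy sketch to illustrate the principle, with made-up names; it is not the actual design in the doc): the v2 metadata of each snapshot already determines which rows are live, since the data files minus the position/equality delete files define the row set. So change records can, in principle, be computed at read time by diffing the row sets of two snapshots, with an update surfacing as a DELETE plus an INSERT:

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ChangelogSketch {

    enum ChangeType { INSERT, DELETE }

    // One emitted CDC record; "row" stands in for a full Iceberg row.
    record ChangeRecord(ChangeType type, String row) {}

    // Each snapshot is modeled here as the set of rows it exposes. In a
    // real v2 table this set is implied by the snapshot's data files
    // minus the rows masked by its position/equality delete files.
    static List<ChangeRecord> diff(Set<String> fromSnapshot, Set<String> toSnapshot) {
        List<ChangeRecord> changes = new ArrayList<>();
        for (String row : fromSnapshot) {
            if (!toSnapshot.contains(row)) {
                changes.add(new ChangeRecord(ChangeType.DELETE, row));
            }
        }
        for (String row : toSnapshot) {
            if (!fromSnapshot.contains(row)) {
                changes.add(new ChangeRecord(ChangeType.INSERT, row));
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // Updating row "a" to "c" between snapshots surfaces as
        // DELETE(a) followed by INSERT(c).
        Set<String> snapshot1 = Set.of("a", "b");
        Set<String> snapshot2 = Set.of("b", "c");
        diff(snapshot1, snapshot2).forEach(System.out::println);
    }
}

A practical implementation would presumably work at the file and manifest level instead of materializing whole row sets, but the read-time diff idea is the same.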
Here is another issue thread: https://github.com/apache/iceberg/issues/3941

Best,
Yufei

`This is not a contribution`

On Wed, Feb 9, 2022 at 10:43 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:

> Hi Reo,
>
> I am not sure if I am reading the proposal correctly, but does the
> proposal suggest changing the data file format/schema to support the
> operation type? I think one of the Iceberg principles is not to change the
> open data file formats (Avro, ORC, Parquet, etc.) or their semantics in an
> Iceberg-specific way.
>
> Also, there is a similar discussion here [1], so we may combine the
> discussions in the same thread.
>
> [1] https://lists.apache.org/thread/w3nm6ydc702o1kjr5l3t8d6j01kwjqmz
>
> Thanks,
> Walaa.
>
>
> On Wed, Feb 9, 2022 at 7:05 AM Reo Lei <leinuo...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> As v2 tables become more and more popular, more and more users want to
>> use Flink and Iceberg to build quasi-real-time data warehouses.
>> But currently Iceberg doesn't support incremental reading of v2 tables
>> via Flink, so I drafted a design document
>> <https://docs.google.com/document/d/1zEpNYcA5Tf5ysdoj3jO425A1QRI-3OMb_Fy8EG_9DD4/edit?usp=sharing>
>> to support this. The document mainly discusses the type of data stream that
>> needs to be returned when incrementally reading v2 tables, and how to save
>> and read the changelog.
>>
>> Please have a look; any feedback would be appreciated!
>>
>> Best Regards,
>> Reo Lei