Hey Steven,

That sounds very exciting! I'm not a heavy Flink user, but I don't see any
issues with enabling it on Flink 1.20. We should make the change explicit in
the changelog and, if possible, give some hints on how to drain the Flink jobs.

Kind regards,
Fokko

On Mon, 12 Aug 2024 at 04:57, Steven Wu <stevenz...@gmail.com> wrote:

>
> *What*
>
> For the next Iceberg 1.7 release, which adds Flink 1.20 support [1], I
> am proposing the following changes for *Flink 1.20 only*.
>
> 1. Mark the old `FlinkSource` as deprecated and redirect users to the
> FLIP-27 `IcebergSource` in the Javadoc.
>
> 2. Make the FLIP-27 source the default for Flink SQL. Users can still
> switch back to the old source via config if needed. Because the source
> implementation and checkpoint state change, users won't be able to
> restore from a checkpoint/savepoint when upgrading to Flink 1.20 and
> Iceberg 1.7. Since Flink doesn't guarantee state compatibility across
> major.minor version upgrades, e.g. from 1.19 to 1.20 [12], this should be
> acceptable to Flink SQL users. We should clearly call out the change and
> the state incompatibility in the release notes.
>
> *Why*
>
> FLIP-27 is the new source interface introduced by Flink in early 2021. The
> new FLIP-27 `IcebergSource` implementation [2] was added to Iceberg
> around mid-2022. It was initially added as @Experimental, and switching
> to the new API requires a code change. For Flink SQL jobs, the default is
> still the old `FlinkSource` implementation, and a config change is
> required to opt in to the FLIP-27 `IcebergSource`.
>
> It has been two years since the FLIP-27 source implementation was first
> introduced in Iceberg. Now is probably a good time to switch the
> default to the FLIP-27 source.
>
> 1. The community has continued to improve the FLIP-27 source, with a JSON
> serializer for FileScanTask [3], split discovery throttling [4], watermark
> alignment [5], split enumerator monitoring metrics [6], metadata table
> reading [8], and speculative execution [9]. Those improvements are not
> available in the old source implementation.
> 2. We have recently closed the remaining gaps, such as limit pushdown [10]
> and inferring source parallelism [11] for batch execution, to achieve
> feature parity between the old source and the new FLIP-27 source.
> 3. The FLIP-27 source has been used by many users in production
> environments for almost two years now. It has been battle tested.
> 4. The old SourceFunction interface has been marked as deprecated since
> Flink 1.18 in Aug 2023 [7].
>
>
> *References*
> [1] https://github.com/apache/iceberg/pull/10881
> [2] https://github.com/apache/iceberg/projects/23
> [3] https://github.com/apache/iceberg/issues/1698
> [4] https://github.com/apache/iceberg/pull/6299
> [5] https://github.com/apache/iceberg/pull/8553
> [6] https://github.com/apache/iceberg/pull/9524
> [7] https://issues.apache.org/jira/browse/FLINK-28046
> [8] https://github.com/apache/iceberg/pull/6222
> [9] https://github.com/apache/iceberg/pull/10548
> [10] https://github.com/apache/iceberg/pull/10748
> [11] https://github.com/apache/iceberg/pull/10832
> [12] https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/upgrading/#table-api--sql
>
>
