twalthr commented on code in PR #28123: URL: https://github.com/apache/flink/pull/28123#discussion_r3202097486
########## docs/content/release-notes/flink-2.3.md: ########## @@ -27,19 +27,282 @@ These release notes discuss important aspects, such as configuration, behavior o that changed between Flink 2.2 and Flink 2.3. Please read these notes carefully if you are planning to upgrade your Flink version to 2.3. +### Table SQL / API -### Core +#### FROM_CHANGELOG and TO_CHANGELOG built-in PTFs -#### Set security.ssl.algorithms default value to modern cipher suite +##### [FLINK-39258](https://issues.apache.org/jira/browse/FLINK-39258) (FLIP-564) -### [FLINK-39022](https://issues.apache.org/jira/browse/FLINK-39022) +The DataStream API has long offered `toChangelogStream()` and `fromChangelogStream()` for working +with changelog streams; Flink 2.3 brings equivalent functionality to SQL via two new built-in +Process Table Functions: -A JDK update (affecting JDK 11.0.30+, 17.0.18+, 21.0.10+, and 24+) disabled `TLS_RSA_*` cipher suites. -This was done to support forward-secrecy (RFC 9325) and comply with the IETF Draft on *Deprecating Obsolete Key Exchange Methods in TLS*. +- `FROM_CHANGELOG` converts an append-only stream that carries an operation column (and optional + before/after row descriptors) into a dynamic table. A configurable `op_mapping` makes it + straightforward to plug in custom CDC formats (e.g. Debezium-style `c`/`u`/`d` codes), and + `invalid_op_handling` (`FAIL`/`LOG`/`SKIP`) controls how rows with unmapped operation codes + are treated. +- `TO_CHANGELOG` is the inverse: it materializes a dynamic table back into an append-only + changelog stream. This is the first SQL-level operator that lets users convert + retract/upsert streams into append form, which is useful for archival, audit and writing to + append-only sinks. `produces_full_deletes` controls whether `-D` records carry the full row. -To support these and future JDK versions, the default value for the Flink configuration option `security.ssl.algorithms` has been changed to a modern, widely available cipher suite: +The two PTFs are designed to be symmetric, so `FROM_CHANGELOG(TO_CHANGELOG(table))` round-trips +correctly. Both support `PARTITION BY` for parallel execution and `uid` for query evolution, and +they expose a `state_ttl` parameter for state retention. Review Comment: The FLIP is not fully implemented yet, only some signatures made it to the release. @gustavodemorais could you write short summary what works in 2.3? ########## docs/content/release-notes/flink-2.3.md: ########## @@ -27,19 +27,282 @@ These release notes discuss important aspects, such as configuration, behavior o that changed between Flink 2.2 and Flink 2.3. Please read these notes carefully if you are planning to upgrade your Flink version to 2.3. +### Table SQL / API -### Core +#### FROM_CHANGELOG and TO_CHANGELOG built-in PTFs -#### Set security.ssl.algorithms default value to modern cipher suite +##### [FLINK-39258](https://issues.apache.org/jira/browse/FLINK-39258) (FLIP-564) -### [FLINK-39022](https://issues.apache.org/jira/browse/FLINK-39022) +The DataStream API has long offered `toChangelogStream()` and `fromChangelogStream()` for working +with changelog streams; Flink 2.3 brings equivalent functionality to SQL via two new built-in +Process Table Functions: -A JDK update (affecting JDK 11.0.30+, 17.0.18+, 21.0.10+, and 24+) disabled `TLS_RSA_*` cipher suites. -This was done to support forward-secrecy (RFC 9325) and comply with the IETF Draft on *Deprecating Obsolete Key Exchange Methods in TLS*. +- `FROM_CHANGELOG` converts an append-only stream that carries an operation column (and optional + before/after row descriptors) into a dynamic table. A configurable `op_mapping` makes it + straightforward to plug in custom CDC formats (e.g. Debezium-style `c`/`u`/`d` codes), and + `invalid_op_handling` (`FAIL`/`LOG`/`SKIP`) controls how rows with unmapped operation codes + are treated. +- `TO_CHANGELOG` is the inverse: it materializes a dynamic table back into an append-only + changelog stream. This is the first SQL-level operator that lets users convert + retract/upsert streams into append form, which is useful for archival, audit and writing to + append-only sinks. `produces_full_deletes` controls whether `-D` records carry the full row. -To support these and future JDK versions, the default value for the Flink configuration option `security.ssl.algorithms` has been changed to a modern, widely available cipher suite: +The two PTFs are designed to be symmetric, so `FROM_CHANGELOG(TO_CHANGELOG(table))` round-trips +correctly. Both support `PARTITION BY` for parallel execution and `uid` for query evolution, and +they expose a `state_ttl` parameter for state retention. -`TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384` +#### CREATE/ALTER for MATERIALIZED TABLE aligned with TABLE -This default provides strong security and wide compatibility. You can customize the cipher suites using the `security.ssl.algorithms` configuration option if your environment has different requirements. -If these cipher suites are not supported on your setup, you will see that Flink processes will not be able to connect to each other. +##### [FLINK-38673](https://issues.apache.org/jira/browse/FLINK-38673) (FLIP-550) + +The DDL surface of `MATERIALIZED TABLE` is brought to parity with regular tables. `CREATE +MATERIALIZED TABLE` now accepts an explicit column list (including watermarks and primary keys) +in front of the defining `AS` query. `ALTER MATERIALIZED TABLE` gains `ADD`, `MODIFY` and `DROP` +operations on metadata and computed columns, plus `RENAME TO`, allowing materialized tables to +evolve through the same workflow already used for regular Flink tables. + +#### Granular control over data reprocessing during materialized table evolution + +##### [FLINK-39301](https://issues.apache.org/jira/browse/FLINK-39301) (FLIP-557) + +When a materialized table's defining query is changed, Flink would previously always reprocess +historical data from the beginning. Flink 2.3 introduces an optional `START_MODE` clause on +`CREATE [OR ALTER]` and `ALTER MATERIALIZED TABLE`, letting users start the refresh pipeline +`FROM_BEGINNING`, `FROM_NOW[(interval)]`, `FROM_TIMESTAMP(timestamp)`, or resume from previous +offsets when available (`RESUME_OR_FROM_BEGINNING`/`RESUME_OR_FROM_NOW`/`RESUME_OR_FROM_TIMESTAMP`). +The default remains `FROM_BEGINNING` for backward compatibility. + +#### ARTIFACT keyword in CREATE FUNCTION + +##### [FLINK-39081](https://issues.apache.org/jira/browse/FLINK-39081) (FLIP-559) + +The `USING` clause of `CREATE FUNCTION` accepts a new `ARTIFACT` keyword as an alternative to +`JAR`. `ARTIFACT` is intentionally generic so that future ecosystem assets (Python wheels, etc.) +can be referenced through the same syntax. Both keywords are interchangeable and may even be +mixed within a single statement; existing `USING JAR` syntax continues to work unchanged. + +```sql +CREATE FUNCTION my_func AS 'com.example.MyUdf' + USING ARTIFACT 's3://bucket/path/my-udf.jar'; +``` + +#### SinkUpsertMaterializer improvements and changelog disorder handling + +##### [FLINK-38926](https://issues.apache.org/jira/browse/FLINK-38926) (FLIP-558) + +Flink 2.3 reworks how `SinkUpsertMaterializer` handles the case where a query's upsert key +differs from the sink's primary key. Previously this required maintaining the full history of +records and could blow up state. Two changes address this: + +- A new `ON CONFLICT` clause with `DO NOTHING`, `DO ERROR` and `DO DEDUPLICATE` strategies makes + the behavior on key conflict explicit. By default, planning now fails when the upsert and + primary keys differ, requiring the user to choose a conflict strategy. +- Watermark-based record compaction is introduced to fix internal changelog disorder. The + trigger and frequency of compaction are controlled by: + - `table.exec.sink.upserts.compaction-mode` (default: `WATERMARK`) — `WATERMARK` or + `CHECKPOINT`. + - `table.exec.sink.upserts.compaction-interval` — optional fallback interval for emitting + watermarks when none arrive naturally. + +#### Process Table Function enhancements + +##### [FLINK-39254](https://issues.apache.org/jira/browse/FLINK-39254) (FLIP-565) + +Process Table Functions (PTFs), introduced in Flink 2.1, gain several capabilities aligning them +with the DataStream API: + +- **Late data handling**: late records are no longer silently dropped; PTFs can react to them. +- **`ORDER BY` on table arguments**: `MyPtf(input => TABLE t PARTITION BY k ORDER BY ts)` lets a + PTF receive partitioned rows in deterministic temporal order. +- **`ValueView`**: a new lazy single-value state primitive (`value()`, `update()`, `isEmpty()`, + `clear()`) joins the existing `MapView` and `ListView` for working with single-element state + efficiently. +- **Broadcast state**: PTFs support broadcast state through the new + `@ArgumentHint(ArgumentTrait.BROADCAST_SEMANTIC_TABLE)` and `@StateHint(StateKind.BROADCAST)` + annotations, plus `@ArgumentHint(ArgumentTrait.NOTIFY_STATEFUL_SETS)` for re-evaluating keys + when broadcast state changes. Review Comment: This has not made it to 2.3 yet. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
