rmetzger commented on a change in pull request #352:
URL: https://github.com/apache/flink-web/pull/352#discussion_r448846715



##########
File path: _posts/2020-06-29-release-1.11.0.md
##########
@@ -0,0 +1,299 @@
+---
+layout: post 
+title:  "Apache Flink 1.11.0 Release Announcement" 
+date: 2020-06-29T08:00:00.000Z
+categories: news
+authors:
+- morsapaes:
+  name: "Marta Paes"
+  twitter: "morsapaes"
+
+excerpt: The Apache Flink community is proud to announce the release of Flink 
1.11.0! This release marks the first milestone in realizing a new vision for 
fault tolerance in Flink, and adds a handful of new features that simplify (and 
unify) Flink handling across the API stack. In particular for users of the 
Table API/SQL, this release introduces significant improvements to usability 
and opens up completely new use cases, including the much-anticipated support 
for Change Data Capture (CDC)! A great deal of effort has also gone into 
optimizing PyFlink and ensuring that its functionality is available to a 
broader set of Flink users.
+---
+
+The Apache Flink community is proud to announce the release of Flink 1.11.0! 
This release marks the first milestone in realizing a new vision for fault 
tolerance in Flink, and adds a handful of new features that simplify (and 
unify) Flink handling across the API stack. In particular for users of the 
Table API/SQL, this release introduces significant improvements to usability 
and opens up completely new use cases, including the much-anticipated support 
for Change Data Capture (CDC)! A great deal of effort has also gone into 
optimizing PyFlink and ensuring that its functionality is available to a 
broader set of Flink users.
+
+This blog post describes all major new features and improvements, important 
changes to be aware of and what to expect moving forward.
+
+{% toc %}
+
+The binary distribution and source artifacts are now available on the updated 
[Downloads page]({{ site.baseurl }}/downloads.html) of the Flink website, and 
the most recent distribution of PyFlink is available on 
[PyPI](https://pypi.org/project/apache-flink/). For more details, check the 
complete [release 
changelog](https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12346364&styleName=Html&projectId=12315522)
 and the [updated documentation]({{ site.DOCS_BASE_URL }}flink-docs-release-1.11/).
+
+We encourage you to download the release and share your feedback with the 
community through the [Flink mailing 
lists](https://flink.apache.org/community.html#mailing-lists) or 
[JIRA](https://issues.apache.org/jira/projects/FLINK/summary).
+
+## New Features and Improvements
+
+### Unaligned Checkpoints (Beta)
+
+Triggering a checkpoint in Flink will cause a [checkpoint barrier]({{ 
site.DOCS_BASE_URL 
}}flink-docs-release-1.11/internals/stream_checkpointing.html#barriers) to flow 
from the sources of your topology all the way towards the sinks. For operators 
that receive more than one input stream, the barriers flowing through each 
channel need to be aligned before the operator can snapshot its state and 
forward the checkpoint barrier — typically, this alignment will take just a few 
milliseconds to complete, but it can become a bottleneck in backpressured 
pipelines as:
+
+ * Checkpoint barriers will flow much slower through backpressured channels, 
effectively blocking the remaining channels and their upstream operators during 
checkpointing;
+
 * Slow checkpoint barrier propagation leads to longer checkpointing times and can, in the worst case, result in little to no progress in the application.
+
+To improve the performance of checkpointing under backpressure scenarios, the 
community is rolling out the first iteration of unaligned checkpoints 
([FLIP-76](https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints))
 with Flink 1.11. Compared to the original checkpointing mechanism (Fig. 1), 
this approach doesn’t wait for barrier alignment across input channels, instead 
allowing barriers to overtake in-flight records and forwarding them downstream 
before the synchronous part of the checkpoint takes place (Fig. 2).
+
+<div style="line-height:60%;">
+    <br>
+</div>
+
+<div class="row">
+  <div class="col-lg-6">
+    <div class="text-center">
+      <figure>
+               <img src="{{ site.baseurl 
}}/img/blog/2020-06-29-release-1.11.0/image1.gif" width="600px" alt="Aligned 
Checkpoints"/>
+               <br/><br/>
+               <figcaption><i><b>Fig.1:</b> Aligned 
Checkpoints</i></figcaption>
+         </figure>
+    </div>
+  </div>
+  <div class="col-lg-6">
+    <div class="text-center">
+      <figure>
+               <img src="{{ site.baseurl 
}}/img/blog/2020-06-29-release-1.11.0/image2.png" width="600px" alt="Unaligned 
Checkpoints"/>
+               <br/><br/>
+               <figcaption><i><b>Fig.2:</b> Unaligned 
Checkpoints</i></figcaption>
+         </figure>
+    </div>
+  </div>
+</div>
+
+<div style="line-height:150%;">
+    <br>
+</div>
+
+Because in-flight records have to be persisted as part of the snapshot, unaligned checkpoints will lead to increased checkpoint sizes. On the upside,
**checkpointing times are heavily reduced**, so users will see more progress 
(even in unstable environments) as more up-to-date checkpoints will lighten the 
recovery process. You can learn more about the current limitations of unaligned 
checkpoints in the [documentation]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/concepts/stateful-stream-processing.html#unaligned-checkpointing),
 and track the improvement work planned for this feature in 
[FLINK-14551](https://issues.apache.org/jira/browse/FLINK-14551). 
+
+As with any beta feature, we appreciate early feedback that you might want to 
share with the community after giving unaligned checkpoints a try!
+
+<span class="label label-info">Info</span> To enable this feature, you need to 
configure the [``enableUnalignedCheckpoints``]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/api/java/org/apache/flink/streaming/api/environment/CheckpointConfig.html)
 option in your [checkpoint config]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/dev/stream/state/checkpointing.html#enabling-and-configuring-checkpointing).
 Please note that unaligned checkpoints can only be enabled if 
``checkpointingMode`` is set to ``CheckpointingMode.EXACTLY_ONCE``.
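+
+As a minimal sketch (the checkpoint interval is just an example value), enabling the feature programmatically in the DataStream API could look like this:
+
+```java
+import org.apache.flink.streaming.api.CheckpointingMode;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+
+// Unaligned checkpoints require exactly-once checkpointing mode.
+env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
+env.getCheckpointConfig().enableUnalignedCheckpoints();
+```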
+
+### Unified Watermark Generators
+
+So far, watermark generation (previously also called _assignment_) relied on two different interfaces: [``AssignerWithPunctuatedWatermarks``]({{ site.DOCS_BASE_URL }}flink-docs-release-1.11/api/java/org/apache/flink/streaming/api/functions/AssignerWithPunctuatedWatermarks.html) and [``AssignerWithPeriodicWatermarks``]({{ site.DOCS_BASE_URL }}flink-docs-release-1.11/api/java/org/apache/flink/streaming/api/functions/AssignerWithPeriodicWatermarks.html), which were closely intertwined with timestamp extraction. This made it difficult to implement long-requested features like support for idleness detection, besides leading to code duplication and a maintenance burden. With [FLIP-126](https://cwiki.apache.org/confluence/display/FLINK/FLIP-126%3A+Unify+%28and+separate%29+Watermark+Assigners), the legacy watermark assigners are unified into a single interface, the [``WatermarkGenerator``]({{ site.DOCS_BASE_URL }}flink-docs-release-1.11/api/java/org/apache/flink/api/common/eventtime/WatermarkGenerator.html), which is also detached from the [``TimestampAssigner``]({{ site.DOCS_BASE_URL }}flink-docs-release-1.11/api/java/org/apache/flink/api/common/eventtime/TimestampAssigner.html).
+
+This gives users more control over watermark emission and simplifies the 
implementation of new connectors that need to support watermark assignment and 
timestamp extraction at the source (see _[New Data Source 
API](#new-data-source-api-beta)_). Multiple [strategies for watermarking]({{ 
site.DOCS_BASE_URL 
}}flink-docs-release-1.11/dev/event_timestamps_watermarks.html#introduction-to-watermark-strategies)
 are available out-of-the-box as convenience methods in Flink 1.11 (e.g. 
[``forBoundedOutOfOrderness``]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/api/java/org/apache/flink/api/common/eventtime/WatermarkStrategy.html#forBoundedOutOfOrderness-java.time.Duration-),
 [``forMonotonousTimestamps``]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/api/java/org/apache/flink/api/common/eventtime/WatermarkStrategy.html#forMonotonousTimestamps--)),
 though you can also choose to customize your own.
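+
+As a quick sketch of the new API (assuming an input ``DataStream<MyEvent>`` called ``events``, where ``MyEvent`` and its timestamp getter are placeholders), one of the built-in strategies can be combined with a timestamp assigner and applied directly to a stream:
+
+```java
+import java.time.Duration;
+
+import org.apache.flink.api.common.eventtime.WatermarkStrategy;
+import org.apache.flink.streaming.api.datastream.DataStream;
+
+DataStream<MyEvent> withTimestampsAndWatermarks = events
+    .assignTimestampsAndWatermarks(
+        WatermarkStrategy
+            .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(20))
+            // Extract the event-time timestamp from each record.
+            .withTimestampAssigner((event, recordTimestamp) -> event.getEventTime()));
+```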
+
+**Support for Watermark Idleness Detection**
+
+The [``WatermarkStrategy.withIdleness()``]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/api/java/org/apache/flink/api/common/eventtime/WatermarkStrategy.html#withIdleness-java.time.Duration-)
 method allows you to mark a stream as idle if no events arrive within a 
configured time (i.e. a timeout duration), which in turn allows handling event 
time skew properly and preventing idle partitions from holding back the event 
time progress of the entire application. Users can already benefit from 
**per-partition idleness detection** in the Kafka connector, which has been 
adapted to use the new interfaces 
([FLINK-17669](https://issues.apache.org/jira/browse/FLINK-17669)).
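+
+Building on the sketch above, declaring the input idle after one minute without events is a single extra call on the strategy:
+
+```java
+WatermarkStrategy
+    .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(20))
+    // If no events arrive for 1 minute, the stream is marked idle and
+    // no longer holds back the event time progress of downstream operators.
+    .withIdleness(Duration.ofMinutes(1));
+```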
+
+<span class="label label-info">Note</span> This FLIP introduces no breaking 
changes. We recommend that users give preference to the new 
``WatermarkGenerator`` interface moving forward, in preparation for the 
deprecation of the legacy watermark assigners in future releases.
+
+### New Data Source API (Beta)
+
+Up to this point, writing a production-grade source connector for Flink was a 
non-trivial task that required users to be somewhat familiar with Flink 
internals and account for implementation details like event time assignment, 
watermark generation or idleness detection in their code. Flink 1.11 introduces 
a new Data Source API 
([FLIP-27](https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface))
 to overcome these limitations, as well as the need to rewrite separate code 
for batch and streaming execution.
+
+<center>
+       <figure>
+       <img src="{{ site.baseurl 
}}/img/blog/2020-06-29-release-1.11.0/image3.png" width="600px" alt="Data 
Source API"/>
+       </figure>
+</center>
+
+<div style="line-height:150%;">
+    <br>
+</div>
+
+Separating the work of split discovery from the actual reading of the consumed data (i.e. the [_splits_]({{ site.DOCS_BASE_URL }}flink-docs-release-1.11/dev/stream/sources.html#data-source-concepts)) into different components (the ``SplitEnumerator`` and ``SourceReader``, respectively) allows mixing and matching different enumeration strategies and split readers.
+
+As an example, the existing Kafka connector has multiple strategies for 
partition discovery that are intermingled with the rest of the code. With the 
new interfaces in place, it would only need a single reader implementation and 
there could be several split enumerators for the different partition discovery 
strategies.
+
+**Batch and Streaming Unification**
+
+Source connectors implemented using the Data Source API will be able to work 
both as a bounded (_batch_) and unbounded (_streaming_) source. The difference 
between both cases is minimal: for bounded input, the ``SplitEnumerator`` will 
generate a fixed set of splits and each split is finite; for unbounded input, 
either the splits are not finite or the ``SplitEnumerator`` keeps generating 
new splits.
+
+**Implicit Watermark and Event Time Handling**
+
+The ``TimestampAssigner`` and ``WatermarkGenerator`` run transparently as part 
of the ``SourceReader`` component, so users also don’t have to implement any 
timestamp extraction or watermark generation code.
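+
+To illustrate (``MySource``, ``MyRecord`` and ``env`` are placeholders, with ``MySource`` standing in for a connector built on the new ``Source`` interface), attaching such a source to a job only requires handing over a ``WatermarkStrategy`` along with it:
+
+```java
+import org.apache.flink.api.common.eventtime.WatermarkStrategy;
+import org.apache.flink.streaming.api.datastream.DataStream;
+
+// The WatermarkStrategy is passed to the source; timestamp extraction and
+// watermark generation then happen inside the SourceReader.
+DataStream<MyRecord> records = env.fromSource(
+    new MySource(),
+    WatermarkStrategy.<MyRecord>forMonotonousTimestamps(),
+    "my-source");
+```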
+
+<span class="label label-info">Note</span> The existing source connectors have 
not yet been reimplemented using the Data Source API — this is planned for 
upcoming releases. If you’re looking to implement a new source, please refer to 
the [Data Source documentation]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/dev/stream/sources.html#data-sources) and [the tips 
on source development]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/dev/stream/sources.html#the-data-source-api).
+
+### Application Mode Deployments
+
+Prior to Flink 1.11, jobs in a Flink application could either be submitted to 
a long-running [Flink Session Cluster]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/concepts/flink-architecture.html#flink-session-cluster)
 (_session mode_) or a dedicated [Flink Job Cluster]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/concepts/flink-architecture.html#flink-job-cluster) 
(_job mode_). For both these modes, the ``main()`` method of user programs runs 
on the _client_ side. This presents some challenges: on one hand, if the client 
is part of a large installation, it can easily become a bottleneck for 
``JobGraph`` generation; and on the other, it’s not a good fit for 
containerized environments like Docker or Kubernetes.
+
+From this release on, Flink gets an additional deployment mode: [Application 
Mode]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/ops/deployment/#application-mode) 
 ([FLIP-85](https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode)), in which the ``main()`` method runs on the cluster rather than on the client. The
job submission becomes a one-step process: you package your application logic 
and dependencies into an executable job JAR and the cluster entrypoint 
([``ApplicationClusterEntryPoint``]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/api/java/org/apache/flink/client/deployment/application/ApplicationClusterEntryPoint.html))
 is responsible for calling the ``main()`` method to extract the ``JobGraph``. 
+
+In Flink 1.11, the community worked to bring support for _application mode_ to Kubernetes ([FLINK-10934](https://issues.apache.org/jira/browse/FLINK-10934)).
+
+### Other Improvements
+
+**Unified Memory Configuration for JobManagers 
([FLIP-116](https://jira.apache.org/jira/browse/FLINK-16614))**
+
+Following the work started in Flink 1.10 to improve memory management and 
configuration, this release introduces a new memory model that aligns the 
[JobManagers’ configuration options]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/ops/memory/mem_setup_master.html) and terminology 
with that introduced in 
[FLIP-49](https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors)
 for TaskManagers. This affects all deployment types: standalone, YARN, Mesos 
and the new active Kubernetes integration. 
+
+<span class="label label-danger">Attention</span> Reusing a previous Flink 
configuration without any adjustments can result in differently computed memory 
parameters for the JVM and, as a result, performance changes or even failures. 
Make sure to check the [migration 
guide](https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_migration.html#migrate-job-manager-memory-configuration)
 if you’re planning to update to the latest version.
+
+**Improvements to the Flink WebUI 
([FLIP-75](https://cwiki.apache.org/confluence/display/FLINK/FLIP-75%3A+Flink+Web+UI+Improvement+Proposal))**
+
+In Flink 1.11, the community kicked off a series of improvements to the Flink 
WebUI. The first improvements to roll out are a better TaskManager and JobManager log display ([FLIP-103](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427143)) and a new thread dump utility ([FLINK-14816](https://issues.apache.org/jira/browse/FLINK-14816)). Some
additional work planned for upcoming releases includes better backpressure 
detection, more flexible and configurable exception display and support for 
displaying the history of subtask failure attempts.
+
+**Docker Image Unification 
([FLIP-111](https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification))**
+
+With this release, all Docker-related resources have been consolidated into 
[apache/flink-docker](https://github.com/apache/flink-docker) and the entry 
point script has been extended to allow users to run the default Docker image 
in different modes without the need to create a custom image. The [updated 
documentation](https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#customize-flink-image)
 describes in detail how to use and customize the official Flink Docker image 
for different environments and deployment modes.
+
+<hr>
+
+### Table API/SQL: Support for Change Data Capture (CDC)
+
+Change Data Capture (CDC) has become a popular pattern to capture committed 
changes from a database and propagate those changes to downstream consumers, 
for example to keep multiple datastores in sync and avoid common pitfalls such 
as [dual writes](https://thorben-janssen.com/dual-writes/). Being able to 
easily ingest and interpret these changelogs into the Table API/SQL has been a 
highly demanded feature in the Flink community — and it’s now possible with 
Flink 1.11.
+
+To extend the scope of the Table API/SQL to use cases like CDC, Flink 1.11 
introduces new table source and sink interfaces with **changelog mode** (see 
_[New TableSource and TableSink 
Interfaces](#other-improvements-to-the-table-apisql)_) and support for the 
[Debezium](https://debezium.io/) and [Canal](https://github.com/alibaba/canal) 
formats 
([FLIP-105](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427289)).
 This means that dynamic table sources are no longer limited to append-only
operations and can ingest these external changelogs (``INSERT`` events), 
interpret them into change operations (``INSERT``, ``UPDATE``, ``DELETE`` 
events) and emit them downstream with the change type.
+
+<center>
+       <figure>
+       <img src="{{ site.baseurl 
}}/img/blog/2020-06-29-release-1.11.0/image4.png" width="500px" alt="CDC"/>
+       </figure>
+</center>
+
+<div style="line-height:150%;">
+    <br>
+</div>
+
+Users have to specify either ``'format' = 'debezium-json'`` or ``'format' = 'canal-json'`` in their ``CREATE TABLE`` statement to consume changelogs using SQL DDL.
+
+```sql
+CREATE TABLE my_table (
+  ...
+) WITH (
+  'connector'='...', -- e.g. 'kafka'
+  'format'='debezium-json',
+  'debezium-json.schema-include'='true', -- default: false (Debezium can be configured to include or exclude the message schema)
+  'debezium-json.ignore-parse-errors'='true' -- default: false
+);
+```
+
+Flink 1.11 only supports Kafka as a changelog source out-of-the-box, and only JSON-encoded changelogs, with Avro (Debezium) and Protobuf (Canal) planned for
future releases. There are also plans to support MySQL binlogs and Kafka 
compacted topics as sources, as well as to extend changelog support to batch 
execution.
+
+### Table API/SQL: JDBC Catalog Interface and Postgres Catalog
+
+Flink 1.11 introduces a generic JDBC catalog interface 
([FLIP-93](https://cwiki.apache.org/confluence/display/FLINK/FLIP-93%3A+JDBC+catalog+and+Postgres+catalog))
 that enables users of the Table API/SQL to **derive table schemas 
automatically** from connections to relational databases over [JDBC]({{ 
site.DOCS_BASE_URL 
}}flink-docs-release-1.11/dev/table/connect.html#jdbc-connector). This 
eliminates the previous need for manual schema definition and type conversion, and also allows checking for schema errors at compile time instead of at runtime.
+
+The first implementation, rolling out with the new release, is the [Postgres 
catalog]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/dev/table/catalogs.html#postgrescatalog).
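+
+As a sketch (connection settings and the table name are placeholders, ``tableEnv`` is an existing ``TableEnvironment``, and the example assumes the ``JdbcCatalog`` class shipped in the ``flink-connector-jdbc`` module), registering and using the new catalog could look like this:
+
+```java
+import org.apache.flink.connector.jdbc.catalog.JdbcCatalog;
+
+// Placeholder connection settings for a local Postgres instance.
+String name = "mypg";
+String defaultDatabase = "postgres";
+String username = "flink";
+String password = "secret";
+String baseUrl = "jdbc:postgresql://localhost:5432/";
+
+JdbcCatalog catalog = new JdbcCatalog(name, defaultDatabase, username, password, baseUrl);
+tableEnv.registerCatalog(name, catalog);
+tableEnv.useCatalog(name);
+
+// Tables in the Postgres database can now be queried without re-declaring their schema in Flink.
+tableEnv.executeSql("SELECT * FROM my_postgres_table");
+```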
+
+### Table API/SQL: FileSystem Connector with Support for Avro, ORC and Parquet
+
+To improve the user experience for end-to-end streaming ETL use cases, the 
Flink community worked on a new FileSystem Connector for the Table API/SQL 
([FLIP-115](https://cwiki.apache.org/confluence/display/FLINK/FLIP-115%3A+Filesystem+connector+in+Table)).
 The implementation is based on Flink’s [FileSystem abstraction]({{ 
site.DOCS_BASE_URL }}flink-docs-release-1.11/ops/filesystems/index.html) and 
reuses [StreamingFileSink]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/dev/connectors/streamfile_sink.html) to ensure the 
same set of capabilities and consistent behaviour with the DataStream API. 
+
+StreamingFileSink has also been extended to support a wider range of 
[bulk-encoded formats]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/dev/connectors/streamfile_sink.html#bulk-encoded-formats),
 including Avro 
([FLINK-11395](https://issues.apache.org/jira/browse/FLINK-11395)), (Proto) 
Parquet ([FLINK-11427](https://issues.apache.org/jira/browse/FLINK-11427)) and 
ORC ([FLINK-10114](https://issues.apache.org/jira/browse/FLINK-10114)).
+
+```sql
+CREATE TABLE my_table (
+  column_name1 INT,
+  column_name2 STRING,
+  ...
+  part_name1 INT,
+  part_name2 STRING
+) PARTITIONED BY (part_name1, part_name2) WITH (
+  'connector' = 'filesystem',         
+  'path' = 'file:///path/to/file',
+  'format' = '...',  -- supported formats: Avro, ORC, Parquet, CSV, JSON       
  
+  ...
+);
+```
+
+The new all-rounder FileSystem Connector transparently handles batch and 
streaming execution, provides exactly-once guarantees and has full partition 
support, greatly expanding the scope of usage of the [legacy connector]({{ 
site.DOCS_BASE_URL 
}}flink-docs-release-1.11/dev/table/connect.html#file-system-connector). This 
allows users to easily implement common use cases like **directly streaming 
data from Kafka to Hive**. 
+
+You can track the upcoming improvements to the FileSystem Connector in 
[FLINK-17778](https://issues.apache.org/jira/browse/FLINK-17778).
+
+### Table API/SQL: Support for Python UDFs
+
+Prior to this release, users of the Table API/SQL were limited to defining 
UDFs in either Java or Scala. In Flink 1.11, the community worked on expanding 
the usage scope of the Python language beyond PyFlink and providing support for 
Python UDFs in the SQL DDL syntax 
([FLIP-106](https://cwiki.apache.org/confluence/display/FLINK/FLIP-106%3A+Support+Python+UDF+in+SQL+Function+DDL)),
 as well as the SQL Client 
([FLIP-114](https://cwiki.apache.org/confluence/display/FLINK/FLIP-114%3A+Support+Python+UDF+in+SQL+Client)).
 Users can also register Python UDFs in the system catalog via SQL DDL or the 
Java Catalog API, so that functions can be shared between jobs.
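+
+For example, a Python function could be registered through SQL DDL from a Java ``TableEnvironment`` along these lines (the function, module and table names are placeholders, and the job's Python environment and dependencies must be configured separately):
+
+```java
+// Register a Python UDF via SQL DDL; 'my_udfs.parse_name' stands in for a
+// function defined in a Python module shipped with the job.
+tableEnv.executeSql(
+    "CREATE TEMPORARY SYSTEM FUNCTION parse_name AS 'my_udfs.parse_name' LANGUAGE PYTHON");
+
+// The function can then be used like any other UDF.
+tableEnv.executeSql("SELECT parse_name(raw_name) FROM input_table");
+```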
+
+### Other Improvements to the Table API/SQL
+
+**DDL and DML Compatibility for the Hive Connector 
([FLIP-123](https://cwiki.apache.org/confluence/display/FLINK/FLIP-123%3A+DDL+and+DML+compatibility+for+Hive+connector))**
+
+Starting from Flink 1.11, users can write SQL statements directly using Hive 
syntax (HiveQL) in the Table API/SQL and the SQL Client. For this purpose, an 
additional dialect was introduced and users can now dynamically switch between 
Flink (``default``) and Hive (``hive``) on a per-statement basis. For a 
complete list of supported DDL and DML statements, check the Hive dialect 
[documentation]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/dev/table/hive/hive_dialect.html#hive-dialect).
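+
+In the Table API, the dialect can be switched through the table configuration, as sketched below (the table definition is a placeholder, and most Hive-dialect DDL also expects a ``HiveCatalog`` to be in use):
+
+```java
+import org.apache.flink.table.api.SqlDialect;
+
+// Switch to the Hive dialect to run statements written in HiveQL...
+tableEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
+tableEnv.executeSql("CREATE TABLE logs (line STRING) STORED AS ORC");
+
+// ...and back to the default (Flink) dialect afterwards.
+tableEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);
+```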
+
+**Extensions and Improvements to the Flink SQL Syntax**
+
+* Flink 1.11 introduces the concept of [primary key constraints]({{ 
site.DOCS_BASE_URL 
}}flink-docs-release-1.11/dev/table/sql/create.html#create-table) to leverage 
runtime optimizations in Flink SQL DDL 
([FLIP-87](https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API));
+
+* View objects are now fully supported in SQL DDL using the 
``CREATE``/``ALTER``/``DROP VIEW`` statements 
([FLIP-71](https://cwiki.apache.org/confluence/display/FLINK/FLIP-71%3A+E2E+View+support+in+FLINK+SQL));
+
+* Users can now specify or override table options in their DQL/DML statements using [dynamic table options](https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/hints.html#dynamic-table-options) ([FLIP-113](https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL)), as shown in the sketch after this list;
+
+* To make connector properties less verbose and improve exception handling, 
some key properties have been refactored 
([FLIP-122](https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory)).
 This change does not break compatibility, so users can still use the old 
property keys.
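+
+The sketch below ties these extensions together (connector settings, tables and option values are placeholders; note that dynamic table options are disabled by default and have to be switched on explicitly):
+
+```java
+// Primary key constraints (declared as NOT ENFORCED) and the new, shorter property keys:
+tableEnv.executeSql(
+    "CREATE TABLE users (" +
+    "  user_id BIGINT," +
+    "  user_name STRING," +
+    "  PRIMARY KEY (user_id) NOT ENFORCED" +
+    ") WITH (" +
+    "  'connector' = 'jdbc'," +
+    "  'url' = 'jdbc:postgresql://localhost:5432/db'," +
+    "  'table-name' = 'users'" +
+    ")");
+
+// Views are now fully supported in SQL DDL.
+tableEnv.executeSql("CREATE VIEW named_users AS SELECT user_id, user_name FROM users");
+
+// Dynamic table options (hints) can override table options on a per-statement basis.
+tableEnv.getConfig().getConfiguration()
+    .setBoolean("table.dynamic-table-options.enabled", true);
+tableEnv.executeSql(
+    "SELECT * FROM orders /*+ OPTIONS('scan.startup.mode'='earliest-offset') */");
+```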
+
+**New TableSource and TableSink Interfaces 
([FLIP-95](https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces))**
+
+Flink 1.11 introduces new table source and sink interfaces (resp. 
[``DynamicTableSource``]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/api/java/org/apache/flink/table/connector/source/DynamicTableSource.html)
 and [``DynamicTableSink``]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/api/java/org/apache/flink/table/connector/sink/DynamicTableSink.html))
 that unify batch and streaming execution, provide more efficient data 
processing with the Blink planner and offer support for handling changelogs 
(see _[Support for Change Data Capture 
(CDC)](#table-apisql-support-for-change-data-capture-cdc)_). The new interfaces 
also make it easier for users to [implement custom connectors]({{ 
site.DOCS_BASE_URL 
}}flink-docs-release-1.11/dev/table/sourceSinks.html#full-stack-example) or 
modify existing ones. For an end-to-end example on how to implement a custom 
scan table source with a decoding format that supports changelog semantics, 
check out the [documentation]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/dev/table/sourceSinks.html#full-stack-example).
+
+<span class="label label-info">Note</span> Although compatibility is not 
immediately affected, we recommend that Table API/SQL users update any sources 
and sinks to the new interface stack.
+
+**Refactored TableEnvironment Interface 
([FLIP-84](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878))**
+
+The semantics to describe similar behaviours in the ``TableEnvironment`` and 
``Table`` interfaces have diverged over time, leading to an inconsistent and 
sometimes unclear user experience. To improve this and make programming more 
fluent in the Table API/SQL, Flink 1.11 introduces new methods that unify 
behaviours like execution triggering (e.g. ``executeSql()``) and result 
representation (e.g. [``print()``]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/api/java/org/apache/flink/table/api/TableResult.html#print--),
 [``collect()``]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/api/java/org/apache/flink/table/api/TableResult.html#collect--)),
 and also lay the groundwork for important features like [multi-statement 
execution 
support](https://lists.apache.org/thread.html/r076e63bf6c8ed42d1b9ed2b406029696274a3a90cc360bc3a03e65d2%40%3Cdev.flink.apache.org%3E)
 in future releases.
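+
+A small sketch of the unified methods (the query and table are placeholders):
+
+```java
+import org.apache.flink.table.api.EnvironmentSettings;
+import org.apache.flink.table.api.TableEnvironment;
+import org.apache.flink.table.api.TableResult;
+
+TableEnvironment tableEnv = TableEnvironment.create(
+    EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());
+
+// executeSql() triggers execution directly and returns a TableResult...
+TableResult result = tableEnv.executeSql(
+    "SELECT word, SUM(cnt) AS total FROM word_counts GROUP BY word");
+
+// ...whose rows can be printed or collected on the client.
+result.print();
+```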
+
+<span class="label label-info">Note</span> The methods deprecated with FLIP-84 
will not be immediately removed, but we recommend that users adopt the newly 
introduced methods. For a complete list of new and deprecated methods, check 
the “Summary” section of 
[FLIP-84](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878).
+
+**New Type Inference for Table API UDFs 
([FLIP-65](https://cwiki.apache.org/confluence/display/FLINK/FLIP-65%3A+New+type+inference+for+Table+API+UDFs))**
+
+In Flink 1.9, the community started working on a [new data type system]({{ 
site.DOCS_BASE_URL }}flink-docs-release-1.11/dev/table/types.html#data-types) 
for the Table API to improve its compliance with the SQL standard 
([FLIP-37](https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System)).
 This work is now close to being completed in Flink 1.11, with the exposure of 
Table API UDFs to the new type system (scalar and table functions, with 
aggregate functions planned for the next release).
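+
+As an illustration (the function, table and field names are placeholders), a scalar function written against the new type system no longer needs any manual type information; argument and result types are extracted from the ``eval`` signature:
+
+```java
+import org.apache.flink.table.functions.ScalarFunction;
+
+public static class ToFahrenheit extends ScalarFunction {
+  // Argument and result data types are extracted automatically from this signature.
+  public Double eval(Double celsius) {
+    return celsius == null ? null : celsius * 9.0 / 5.0 + 32.0;
+  }
+}
+
+// Register the function and use it from SQL.
+tableEnv.createTemporarySystemFunction("to_fahrenheit", ToFahrenheit.class);
+tableEnv.executeSql("SELECT to_fahrenheit(temperature) FROM sensor_readings");
+```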
+
+<hr>
+
+### PyFlink: Support for Pandas UDFs
+
+Up to this release, Python UDFs in PyFlink only supported scalar values of 
standard Python types. This presented some limitations:
+
+ * High serialization/deserialization overhead in the process of transferring 
data between the JVM and the Python processes;
+
+ * Hard to integrate with common Python libraries for high-performance 
numerical processing like pandas and NumPy.
+
+To overcome these limitations, the community introduced support for (scalar) 
**vectorized Python UDFs** based on 
[pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/overview.html)
 in Flink 1.11 
([FLIP-97](https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink)).
 The performance of vectorized UDFs is usually much higher, as the serialization/deserialization overhead is minimized by leveraging [Apache Arrow](https://arrow.apache.org/), and handling ``pandas.Series`` as input/output makes it possible to take full advantage of the pandas and NumPy libraries.
This makes Pandas UDFs a popular solution to parallelize Machine Learning and 
other large-scale, distributed data science workloads (e.g. feature 
engineering, distributed model application).
+
+```python
+@udf(input_types=[DataTypes.BIGINT(), DataTypes.BIGINT()], 
result_type=DataTypes.BIGINT(), udf_type="pandas")
+def add(i, j):
+  return i + j
+```
+
+To mark a UDF as a Pandas UDF, you only need to add an extra parameter 
``udf_type="pandas"`` in the ``udf`` decorator, as described in the
[documentation](https://ci.apache.org/projects/flink/flink-docs-master/dev/table/python/vectorized_python_udfs.html#vectorized-user-defined-functions).
+
+### Other Improvements to PyFlink
+
+**Conversion fromPandas/toPandas 
([FLIP-120](https://cwiki.apache.org/confluence/display/FLINK/FLIP-120%3A+Support+conversion+between+PyFlink+Table+and+Pandas+DataFrame))**
+
+Arrow is also supported as an optimization to convert between PyFlink tables 
and 
[``pandas.DataFrames``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html),
 enabling users to switch processing engines seamlessly without the need for an 
intermediate connector. For examples on how to use the new ``fromPandas()`` and 
``toPandas()`` methods in PyFlink, check out the [documentation]({{ 
site.DOCS_BASE_URL 
}}flink-docs-release-1.11/dev/table/python/conversion_of_pandas.html#conversions-between-pyflink-table-and-pandas-dataframe).
+
+**Support for User-defined Table Functions (UDTFs) 
([FLINK-14500](https://jira.apache.org/jira/browse/FLINK-14500))**
+
+From Flink 1.11, you can define and register custom [UDTFs]({{ 
site.DOCS_BASE_URL 
}}flink-docs-release-1.11/dev/table/python/python_udfs.html#table-functions) in 
PyFlink. Similar to a Python UDF, a UDTF takes zero, one or multiple scalar 
values as input, but can return an arbitrary number of rows as output instead 
of a single value.
+
+**Cython Performance Optimization for UDFs 
([FLIP-121](https://cwiki.apache.org/confluence/display/FLINK/FLIP-121%3A+Support+Cython+Optimizing+Python+User+Defined+Function))**
+
+[Cython](https://cython.readthedocs.io/en/latest/src/quickstart/cythonize.html)
 is a compiled superset of the Python language that is often used to improve 
the performance of large-scale numeric processing in Python, as it optimizes 
execution to machine code-level speed and pairs well with popular C-based 
libraries like NumPy. From Flink 1.11, you can build [PyFlink with Cython 
support]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/flinkDev/building.html#build-pyflink) and “Cythonize” 
your Python UDFs to substantially improve code execution speed (up to 30x 
faster, compared to Python UDFs in Flink 1.10).
+
+**User-defined Metrics in Python UDFs 
([FLIP-112](https://cwiki.apache.org/confluence/display/FLINK/FLIP-112%3A+Support+User-Defined+Metrics+in++Python+UDF))**
+
+To make it easier for users to monitor and debug the execution of Python UDFs, 
PyFlink now allows gathering and exposing metrics to external systems, as well 
as defining user scopes and variables. You can access the metrics system from a 
UDF by calling ``function_context.get_metric_group()`` in the open method, as 
described in the 
[documentation](https://ci.apache.org/projects/flink/flink-docs-master/dev/table/python/metrics.html#registering-metrics).
+
+<hr>
+
+## Important Changes
+
+ * [[FLINK-17339](https://jira.apache.org/jira/browse/FLINK-17339)] The Blink 
planner is the **default** in the Table API/SQL starting from Flink 1.11. This 
was already the case for the SQL Client since Flink 1.10. The old Flink planner 
is still supported, but not actively developed.
+
+ * [[FLINK-5763](https://issues.apache.org/jira/browse/FLINK-5763)] Savepoints 
now contain all their state inside a single directory (both metadata and 
program state). This makes it straightforward to figure out which files make up 
the state of a savepoint and allows users to **relocate savepoints** by simply 
moving a directory.
+
+ * [[FLINK-16408](https://issues.apache.org/jira/browse/FLINK-16408)] To 
reduce pressure on the JVM metaspace, the user code class loader is being 
reused by a ``TaskExecutor`` as long as there is at least a single slot 
allocated for the respective job. This changes Flink's recovery behaviour 
slightly, so that it will not reload static fields.
+
+* [[FLINK-11086](https://issues.apache.org/jira/browse/FLINK-11086)] Flink now 
supports Hadoop versions above **Hadoop 3.0.0**. Note that the Flink project 
does not provide any updated "flink-shaded-hadoop-\*" jars. Users need to 
provide Hadoop dependencies through the ``HADOOP_CLASSPATH`` environment 
variable (recommended) or the lib/ folder.
+
+* [[FLINK-16963](https://issues.apache.org/jira/browse/FLINK-16963)] All 
``MetricReporters`` that come with Flink have been converted to plugins. These 
should no longer be placed into ``/lib`` (which may result in dependency 
conflicts), but ``/plugins/<some_directory>`` instead.
+
+* [[FLINK-12639](https://issues.apache.org/jira/browse/FLINK-12639)] The Flink 
**documentation** is undergoing some **rework**, so you might notice that the 
navigation and organization of content look slightly different starting from 
Flink 1.11.
+
+
+## Release Notes
+
+Please review the [release notes]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/release-notes/flink-1.11.html) carefully for a 
detailed list of changes and new features if you plan to upgrade your setup to 
Flink 1.11. This version is API-compatible with previous 1.x releases for APIs 
annotated with the ``@Public`` annotation.

Review comment:
       I would propose adding a link to the release notes in the section at the top where we mention the Downloads page, release changelog and updated docs.
   Given Thomas' experience during the release testing, it seems that this document is important for existing users!



