MarkSfik commented on a change in pull request #438:
URL: https://github.com/apache/flink-web/pull/438#discussion_r624998785
##########
File path: _posts/2021-04-22-release-1.13.0.md
##########
@@ -0,0 +1,590 @@
+---
+layout: post
+title: "Apache Flink 1.13.0 Release Announcement"
+date: 2021-04-22T08:00:00.000Z
+categories: news
+authors:
+- stephan:
+ name: "Stephan Ewen"
+ twitter: "StephanEwen"
+- dwysakowicz:
+ name: "Dawid Wysakowicz"
+ twitter: "dwysakowicz"
+
+excerpt: The Apache Flink community is excited to announce the release of
Flink 1.13.0! Around 200 contributors worked on over 1,000 issues to bring
significant improvements to usability and observability as well as new features
that improve elasticity of Flink’s Application-style deployments.
+---
+
+
+The Apache Flink community is excited to announce the release of Flink 1.13.0!
More than 200
+contributors worked on over 1,000 issues for this new version.
+
+The release brings us a big step forward in one of our major efforts: **Making
Stream Processing
+Applications as natural and as simple to manage as any other application.**
The new *reactive scaling*
+mode means that scaling streaming applications in and out now works like in
any other application,
+by just changing the number of parallel processes.
+
+The release also prominently features a **series of improvements that help
users better understand the performance of
+applications.** When the streams don't flow as fast as you'd hope, these can
help you to understand
+why: Load and *backpressure visualization* to identify bottlenecks, *CPU flame
graphs* to identify hot
+code paths in your application, and *State Access Latencies* to see how the
State Backends are keeping
+up.
+
+Beyond those features, the Flink community has added a ton of improvements all
over the system,
+some of which we discuss in this article. We hope you enjoy the new release
and features.
+Towards the end of the article, we describe changes to be aware of when
upgrading
+from earlier versions of Apache Flink.
+
+{% toc %}
+
+We encourage you to [download the
release](https://flink.apache.org/downloads.html) and share your
+feedback with the community through
+the [Flink mailing
lists](https://flink.apache.org/community.html#mailing-lists)
+or [JIRA](https://issues.apache.org/jira/projects/FLINK/summary).
+
+----
+
+# Notable Features
+
+## Reactive Scaling
+
+Reactive Scaling is the latest piece in Flink's initiative to make Stream
Processing
+Applications as natural and as simple to manage as any other application.
+
+Flink has a dual nature when it comes to resource management and deployments:
You can deploy
+Flink applications onto resource orchestrators like Kubernetes or Yarn in such
a way that Flink actively manages
+the resources, and allocates and releases workers as needed. That is
especially useful for jobs and
+applications that rapidly change their required resources, like batch
applications and ad-hoc SQL
+queries. The application parallelism rules, the number of workers follows. In
the context of Flink
+applications, we call this *active scaling*.
+
+For long-running streaming applications, it is often a nicer model to just deploy them like any
+other long-running application: The application doesn't really need to know that it runs on K8s,
+EKS, Yarn, etc. and doesn't try to acquire a specific number of workers; instead, it simply uses the
+number of workers given to it. The number of workers rules, the application parallelism
+adjusts to that. In the context of Flink, we call that *reactive scaling*.
+
+The [Application Deployment Mode]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.13/docs/concepts/flink-architecture/#flink-application-execution)
+started this effort, making deployments more application-like (by avoiding two separate deployment
+steps: (1) start a cluster and (2) submit an application). The reactive scaling mode completes this,
+and you no longer have to use extra tools (scripts or a K8s operator) to keep the number
+of workers and the application parallelism settings in sync.
+
+You can now put an autoscaler around Flink applications much like around other typical applications,
+as long as you are mindful of the cost of rescaling when configuring the autoscaler: Stateful
+streaming applications must move state around when scaling.
+
+To try the reactive-scaling mode, add the `scheduler-mode: reactive` config
entry and deploy
+an application cluster ([standalone]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.13/docs/deployment/resource-providers/standalone/overview/#application-mode)
or [Kubernetes]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.13/docs/deployment/resource-providers/standalone/kubernetes/#deploy-application-cluster)).
Check out [the reactive scaling docs]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.13/docs/deployment/elastic_scaling/#reactive-mode) for
more details.
+
+
+## Analyzing Application Performance
+
+Like for any application, analyzing and understanding the performance of a Flink application
+is critical. Often even more so, because Flink applications are typically data-intensive
+(processing high volumes of data) while at the same time being expected to provide results within
+(near-) real-time latencies.
+
+When an application doesn't keep up with the data rate any more, or an
application takes more
+resources than you'd expect it would, these new tools can help you track down
the causes:
+
+**Bottleneck detection, Back Pressure Monitoring**
+
+The first question during performance analysis is often: Which operation is
the bottleneck?
+
+To help answer that, Flink exposes metrics about the degree to which tasks are *busy* (doing work)
+and *back-pressured* (have the capacity to do work but cannot, because their successor operators
+cannot accept more results). Candidates for bottlenecks are the busy operators whose predecessors
+are back-pressured.
+
+Flink 1.13 brings an improved back pressure metric system (using task mailbox
timings, rather than
+thread stack sampling) and a reworked graphical representation of the job's
dataflow with color coding
+and ratios for busyness and back pressure.
+
+<figure style="align-content: center">
+ <img src="{{ site.baseurl
}}/img/blog/2021-04-xx-release-1.13.0/bottleneck.png" style="width: 900px"/>
+</figure>
+
+**CPU flame graphs in Web UI**
+
+The next question during performance analysis is typically: What part of the work in the
+bottlenecked operator is expensive?
+
+*Flame Graphs* are a visually effective means to investigate that. They help answer questions like:
+ - Which methods are currently consuming CPU resources?
+ - How does one method's CPU consumption compare to other methods?
+ - Which series of calls on the stack led to executing a particular method?
+
+Flame Graphs are constructed by repeatedly sampling the thread stack traces.
Every method call is
+represented by a bar, where the length of the bar is proportional to the
number of times it is present
+in the samples. When enabled, the graphs are shown in a new UI component for
the selected operator.
+
+<figure style="align-content: center">
+ <img src="{{ site.baseurl }}/img/blog/2021-04-xx-release-1.13.0/7.png"
style="display: block; margin-left: auto; margin-right: auto; width: 600px"/>
+</figure>
+
+Flame graphs are expensive to create: They may cause processing overhead and can put a heavy load
+on Flink's metric system. Because of that, users need to explicitly enable them in the configuration.
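+
+As a minimal sketch, one way to switch them on for local experimentation is via the `rest.flamegraph.enabled` option (in a real deployment you would set the same option in `flink-conf.yaml`):
+
+```java
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+Configuration conf = new Configuration();
+// flame graph sampling is off by default because of its overhead
+conf.setString("rest.flamegraph.enabled", "true");
+
+StreamExecutionEnvironment env =
+        StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);
+```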
+
+**Access Latency Metrics for State**
+
+Another possible performance bottleneck can be the state backend, especially when your state is larger
+than the main memory available to Flink and you are using the [RocksDB state
backend](
+{{ site.DOCS_BASE_URL
}}flink-docs-release-1.13/docs/ops/state/state_backends/#the-embeddedrocksdbstatebackend).
+
+That is not to say RocksDB is slow (we love RocksDB!), but it has some requirements to achieve
+good performance. For example, it is easy to accidentally [starve RocksDB's
demand for IOPs on cloud setups with
+the wrong type of disk
resources](https://www.ververica.com/blog/the-impact-of-disks-on-rocksdb-state-backend-in-flink-a-case-study).
+
+On top of the CPU flame graphs, the new *state backend latency metrics* can
help you understand whether
+your state backend is responsive. For example, if you see that RocksDB state
accesses start to take
+milliseconds, you probably need to look into your memory and I/O configuration.
+These metrics can be activated by setting the
`state.backend.rocksdb.latency-track-enabled` option.
+The metrics are sampled and their collection should have a marginal impact on
the RocksDB state
+backend performance.
+
+## Switching State Backend with savepoints
+
+You can now change the state backend of a Flink application when resuming from
a savepoint.
+That means the application's state is no longer locked into the state backend
that was used when
+the application was initially started.
+
+This makes it possible, for example, to initially start with the HashMap State
Backend (pure
+in-memory in JVM Heap) and later switch to the RocksDB State Backend, once the
state grows
+too large.
+
+Under the hood, Flink now has a canonical savepoint format, which all state
backends use when
+creating a data snapshot for a savepoint.
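+
+As a minimal sketch, assuming the job is resumed from an existing savepoint (for example via `flink run -s <savepointPath> ...`), the application can simply configure a different state backend than the one it was originally started with:
+
+```java
+import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+// the savepoint may have been taken with the HashMapStateBackend;
+// since 1.13 the job can resume from it with RocksDB instead (or vice versa)
+env.setStateBackend(new EmbeddedRocksDBStateBackend());
+```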
+
+## User-specified pod templates for Kubernetes Deployments
+
+The [native Kubernetes deployment]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.13/docs/deployment/resource-providers/native_kubernetes/)
+(where Flink actively talks to K8s to start and stop pods) now supports
*custom pod templates*.
+
+With those templates, users can set up and configure the JobManager and TaskManager pods in a
+Kubernetes-y way, with flexibility beyond the configuration options that are
directly built into
+Flink's Kubernetes integration.
+
+## Unaligned Checkpoints - Production Ready
+
+Unaligned Checkpoints have matured to the point where we encourage all users
to try them out,
+if they see issues with their application under backpressure.
+
+In particular, these changes make Unaligned Checkpoints more easily usable:
+
+ - You can now rescale applications from an unaligned checkpoint. This comes in handy if your
+   application needs to be scaled from a retained checkpoint, because you cannot (afford to) create
+   a savepoint.
+
+ - Enabling unaligned checkpoints is cheaper for applications that are not back-pressured.
+   Unaligned checkpoints can now trigger adaptively with a timeout, meaning a checkpoint starts
+   as an aligned checkpoint (not storing any in-flight events) and falls back to an unaligned
+   checkpoint (storing some in-flight events) if the alignment phase takes longer than a certain
+   time.
+
+Find out more about how to enable unaligned checkpoints in the [Checkpointing
Documentation]({{ site.DOCS_BASE_URL
}}flink-docs-release-1.13/docs/ops/state/checkpoints/#unaligned-checkpoints).
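+
+For illustration, a minimal sketch of enabling unaligned checkpoints for a job looks as follows (the adaptive alignment timeout described above is a separate configuration option, see the linked documentation):
+
+```java
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+// checkpoint every 60 seconds
+env.enableCheckpointing(60_000);
+// let checkpoint barriers overtake buffered in-flight records under backpressure
+env.getCheckpointConfig().enableUnalignedCheckpoints();
+```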
+
+## Machine Learning Library moving to a separate Repository
+
+To accelerate the development of Flink's Machine Learning efforts (streaming, batch, and
+unified machine learning), the effort has moved to the new repository [flink-ml](https://github.com/apache/flink-ml)
+under the Flink project. We follow a similar approach here as with the *Stateful Functions* effort,
+where a separate repository has helped to speed up development by allowing for more light-weight
+contribution workflows and separate release cycles.
+
+Stay tuned for more updates in the Machine Learning efforts, like the interplay with
+[ALink](https://github.com/alibaba/Alink) (a suite of many common Machine Learning algorithms on Flink)
+or the [Flink & TensorFlow integration](https://github.com/alibaba/flink-ai-extended).
+
+
+# Notable SQL & Table API Improvements
+
+Like in previous releases, SQL and the Table API remain an area of major development.
+
+## Windows via Table-valued Functions
+
+Defining time windows is one of the most frequent operations in streaming SQL
queries.
+Flink 1.13 introduces a new way to define windows: via so-called *Table-valued
Functions*.
+This approach is both more expressive (it lets you define new types of windows) and fully
+in line with the SQL standard.
+
+Flink 1.13 already supports *TUMBLE* and *HOP* windows in the new syntax; *SESSION* windows will
+follow in a subsequent release. To demonstrate the increased expressiveness,
consider the two examples
+below.
+
+A new *CUMULATE* window function that assigns windows with an expanding step size until the maximum
+window size is reached:
+
+```sql
+SELECT window_time, window_start, window_end, SUM(price) AS total_price
+  FROM TABLE(CUMULATE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '2' MINUTES, INTERVAL '10' MINUTES))
+GROUP BY window_start, window_end, window_time;
+```
+
+You can reference the window start and window end time of the table-valued
window functions,
+making new types of constructs possible. Beyond regular windowed aggregations
and windowed joins,
+you can, for example, now express windowed Top-K aggregations:
+
+```sql
+SELECT window_time, ...
+  FROM (
+    SELECT *, ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY total_price DESC)
+      as rank
+    FROM t
+  ) WHERE rank <= 100;
+```
+
+## Improved interoperability between DataStream API and Table API/SQL
+
+This release radically simplifies mixing DataStream API and Table API programs.
+
+The Table API is a great way to develop applications, with its declarative nature and its
+many built-in functions. But sometimes you need to *escape* to the DataStream
API for its
+expressiveness, flexibility, and explicit control over state.
+
+The new methods `StreamTableEnvironment.toDataStream()/.fromDataStream()` can
model
+a `DataStream` from the DataStream API as a table source or sink. Types are automatically
+converted, and event-time and watermarks carry across. In addition, the `Row`
class (representing
+row events from the Table API) has received a major overhaul (improving the
behavior of
+`toString()`/`hashCode()`/`equals()` methods) and now supports accessing
fields by name, with
+support for sparse representations.
+
+```java
+Table table = tableEnv.fromDataStream(
+    dataStream,
+    Schema.newBuilder()
+        .columnByMetadata("rowtime", "TIMESTAMP(3)")
+        .watermark("rowtime", "SOURCE_WATERMARK()")
+        .build());
+
+DataStream<Row> dataStream = tableEnv.toDataStream(table)
+    .keyBy(r -> r.getField("user"))
+    .window(...);
+```
+
+## SQL Client: Init Scripts and Statement Sets
+
+The SQL Client is a convenient way to run and deploy SQL streaming and batch
jobs directly,
+without writing any code, from the command line, or as part of a CI/CD
workflow.
+
+This release vastly improves the functionality of the SQL Client. Almost all operations that
+are available to Java applications (when programmatically launching queries from the
+`TableEnvironment`) are now supported in the SQL Client and as SQL scripts.
+That means SQL users need much less glue code for their SQL deployments.
+
+**Easier Configuration and Code Sharing**
+
+Support for configuring the SQL Client via YAML files will be discontinued. Instead, the client
+accepts one or more *initialization scripts* to configure a session, before
the main SQL script
+gets executed.
+
+These init scripts would typically be shared across teams/deployments and
could be used for
+loading common catalogs, applying common configuration settings, or defining
standard views.
+
+```
+./sql-client.sh -i init1.sql init2.sql -f sqljob.sql
+```
+
+**More config options**
+
+A greater set of recognized config options and improved `SET`/`RESET` commands make it easier to
+define and control the execution from within the SQL Client and SQL scripts.
+
+**Multi-query Support with Statement Sets**
+
+Multi-query execution lets you execute multiple SQL queries (or statements) as
a single Flink job.
+This is particularly useful for streaming SQL queries that run indefinitely.
+
+*Statement Sets* are the mechanism to group together the queries that should be executed as one job.
+
+The following is an example of a SQL script that can be run via the SQL Client. It sets up and
+configures the environment and executes multiple queries. The script captures the queries
+end-to-end, together with all environment setup and configuration work, making it a self-contained
+deployment artifact.
+
+```sql
+-- set up a catalog
+CREATE CATALOG hive_catalog WITH ('type' = 'hive');
+USE CATALOG hive_catalog;
+
+-- or use temporary objects
+CREATE TEMPORARY TABLE clicks (
+  user_id BIGINT,
+  page_id BIGINT,
+  viewtime TIMESTAMP
+) WITH (
+  'connector' = 'kafka',
+  'topic' = 'clicks',
+  'properties.bootstrap.servers' = '...',
+  'format' = 'avro'
+);
+
+-- set the execution mode for jobs
+SET execution.runtime-mode=streaming;
+
+-- set the sync/async mode for INSERT INTOs
+SET table.dml-sync=false;
+
+-- set the job's parallelism
+SET parallelism.default=10;
+
+-- set the job name
+SET pipeline.name = my_flink_job;
+
+-- restore state from the specific savepoint path
+SET execution.savepoint.path=/tmp/flink-savepoints/savepoint-bb0dab;
+
+BEGIN STATEMENT SET;
+
+INSERT INTO pageview_pv_sink
+SELECT page_id, count(1) FROM clicks GROUP BY page_id;
+
+INSERT INTO pageview_uv_sink
+SELECT page_id, count(distinct user_id) FROM clicks GROUP BY page_id;
+
+END;
+```
+
+## Hive Query syntax compatibility
+
+You can now write SQL queries against Flink using the Hive SQL syntax.
+In addition to Hive's DDL dialect, Flink now also accepts the commonly-used
Hive DML and DQL
+dialects.
+
+To use the Hive SQL dialect, set `table.sql-dialect` to `hive`, and load the
`HiveModule`.
+The latter is important, because Hive's built-in functions are required for
proper syntax and
+semantics compatibility. The following example illustrates that:
+
+```sql
+CREATE CATALOG myhive WITH ('type' = 'hive'); -- setup HiveCatalog
+USE CATALOG myhive;
+LOAD MODULE hive; -- setup HiveModule
+USE MODULES hive,core;
+SET table.sql-dialect = hive; -- enable Hive dialect
+SELECT key, value FROM src CLUSTER BY key; -- run some Hive queries
+```
+
+Please note that the Hive dialect no longer supports Flink's SQL syntax for
DML and DQL statements.
+Switch back to the `default` dialect for Flink's syntax.
+
+## Improved behavior of SQL time functions
+
+Working with time is a crucial element of any data processing. At the same time, handling different
+time zones, dates, and times is an [incredibly delicate task](https://xkcd.com/1883/) when working with data.
+
+In Flink 1.13, we put a lot of effort into simplifying the usage of time-related functions.
+We adjusted (made more specific) the return types of functions such as `PROCTIME()`,
+`CURRENT_TIMESTAMP`, and `NOW()`.
+
+Moreover, you can now also define an event time attribute on a *TIMESTAMP_LTZ* column, to gracefully
+support window processing across Daylight Saving Time changes.
+
+Please see the release notes for a complete list of changes.
+
+---
+
+# Notable PyFlink Improvements
+
+The general theme of this release in PyFlink is to bring the Python DataStream
API and Table API
+closer to feature parity with the Java/Scala APIs.
+
+### Stateful Operations in the Python DataStream API
+
+With Flink 1.13, Python programmers now also get to enjoy the full potential
of Apache Flink's
+stateful stream processing APIs. The rearchitected Python DataStream API,
introduced in Flink 1.12,
+now has full stateful capabilities, allowing users to remember information
from events in state
+and act on it later.
+
+That stateful processing capability is the basis of many of the more
sophisticated processing
+operations, which need to remember information across individual events (for
example Windowing
+Operations).
+
+This example shows a custom counting window implementation, using state:
+
+```python
+class CountWindowAverage(FlatMapFunction):
+    def __init__(self, window_size):
+        self.window_size = window_size
+
+    def open(self, runtime_context: RuntimeContext):
+        descriptor = ValueStateDescriptor("average", Types.TUPLE([Types.LONG(), Types.LONG()]))
+        self.sum = runtime_context.get_state(descriptor)
+
+    def flat_map(self, value):
+        current_sum = self.sum.value()
+        if current_sum is None:
+            current_sum = (0, 0)
+        # update the count
+        current_sum = (current_sum[0] + 1, current_sum[1] + value[1])
+        # if the count reaches window_size, emit the average and clear the state
+        if current_sum[0] >= self.window_size:
+            self.sum.clear()
+            yield value[0], current_sum[1] // current_sum[0]
+        else:
+            self.sum.update(current_sum)
+
+ds = ...  # type: DataStream
+ds.key_by(lambda row: row[0]) \
+    .flat_map(CountWindowAverage(5))
+```
+
+### User-defined Windows in the PyFlink DataStream API
+
+Flink 1.13 adds support for user-defined windows to the PyFlink DataStream
API. Programs can now use
+windows beyond the standard window definitions.
+
+Because windows are at the heart of all programs that process unbounded
streams (by splitting the
+stream into "buckets" of bounded size), this greatly increases the expressiveness of the API.
+
+### Row-based operation in the PyFlink Table API
+
+The Python Table API now supports row-based operations, i.e., custom
transformation functions on rows.
+These functions are an easy way to apply data transformations on tables,
beyond the built-in functions.
+
+This is an example of using a `map()` operation in Python Table API:
+```python
+@udf(result_type=DataTypes.ROW(
+    [DataTypes.FIELD("c1", DataTypes.BIGINT()),
+     DataTypes.FIELD("c2", DataTypes.STRING())]))
+def increment_column(r: Row) -> Row:
+    return Row(r[0] + 1, r[1])
+
+table = ... # type: Table
+mapped_result = table.map(increment_column)
+```
+
+In addition to `map()`, the API also supports `flat_map()`, `aggregate()`,
`flat_aggregate()`,
+and other row-based operations. This brings the Python Table API a big step
closer to feature
+parity with the Java Table API.
+
+### Batch execution mode for PyFlink DataStream programs
+
+The PyFlink DataStream API now also supports the batch execution mode for
bounded streams,
+which was introduced for the Java DataStream API in Flink 1.12.
+
+The batch execution mode simplifies operations and improves performance of
programs on bounded streams,
+by exploiting the bounded stream nature to bypass state backends and
checkpoints.
+
+# Other improvements
+
+**Flink Documentation via Hugo**
+
+The Flink Documentation has been migrated from Jekyll to Hugo. If you find
something missing, please let us know.
+We are also curious to hear if you like the new look & feel.
+
+**Exception histories in the Web UI**
+
+The Flink Web UI now presents up to the last *n* exceptions that caused a job to fail.
+That helps to debug scenarios where a root failure caused subsequent failures.
The root failure
+cause can be found in the exception history.
+
+**Better exception / failure-cause reporting for unsuccessful checkpoints**
+
+Flink now provides statistics for checkpoints that failed or were aborted, to
make it easier
+to determine the failure cause without having to analyze the logs.
+
+Prior versions of Flink reported metrics (e.g., size of persisted data, trigger time)
+only if a checkpoint succeeded.
+
+**Exactly-once JDBC sink**
+
+Starting with 1.13, the JDBC sink can guarantee exactly-once delivery of results for XA-compliant databases,
+by transactionally committing results on checkpoints. The target database must
have (or be linked
+to) an XA Transaction Manager.
+
+The connector is currently available only for the *DataStream API*, and can be created through the
+`JdbcSink.exactlyOnceSink(...)` method (or by instantiating the
`JdbcXaSinkFunction` directly).
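+
+As a rough sketch of the wiring (the SQL statement, the `Order` type, and the PostgreSQL `PGXADataSource` used here are illustrative and not part of Flink itself):
+
+```java
+import org.apache.flink.connector.jdbc.JdbcExactlyOnceOptions;
+import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
+import org.apache.flink.connector.jdbc.JdbcSink;
+import org.postgresql.xa.PGXADataSource;
+
+DataStream<Order> orders = ...;
+
+orders.addSink(JdbcSink.exactlyOnceSink(
+    "INSERT INTO orders (id, total) VALUES (?, ?)",
+    (statement, order) -> {
+        statement.setLong(1, order.id);
+        statement.setDouble(2, order.total);
+    },
+    JdbcExecutionOptions.builder().build(),
+    JdbcExactlyOnceOptions.defaults(),
+    () -> {
+        // supplier of an XADataSource for the target database
+        PGXADataSource ds = new PGXADataSource();
+        ds.setUrl("jdbc:postgresql://localhost:5432/shop");
+        return ds;
+    }));
+```
+
+Note that exactly-once delivery also requires checkpointing to be enabled for the job.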
+
+**PyFlink Table API supports User-Defined Aggregate Functions in Group
Windows**
+
+Group Windows in PyFlink's Table API now support both general Python User-defined Aggregate
+Functions (UDAFs) and Pandas UDAFs. Such functions are critical to many
analysis- and ML training
+programs.
+
+Flink 1.13 improves upon previous releases, where these functions were only supported
+in unbounded Group-By aggregations.
+
+**Improved Sort-Merge Shuffle for Batch Execution**
+
+Flink 1.13 improves the memory stability and performance of the *sort-merge
blocking shuffle*
+for batch-executed programs, originally introduced in Flink 1.12 via
[FLIP-148](https://cwiki.apache.org/confluence/display/FLINK/FLIP-148%3A+Introduce+Sort-Merge+Based+Blocking+Shuffle+to+Flink).
+
+Programs with higher parallelism (1000s) should no longer frequently trigger
*OutOfMemoryError: Direct Memory*.
+The performance (especially on spinning disks) is improved through better I/O
scheduling
+and broadcast optimizations.
+
+**HBase connector supports async lookup and lookup cache**
+
+The HBase Lookup Table Source now supports an *async lookup mode* and a lookup
cache.
+This greatly benefits the performance of Table/SQL jobs with lookup joins
against HBase, while
+reducing the I/O requests to HBase in the common case.
+
+In prior versions, the HBase Lookup Source only communicated synchronously,
resulting in lower
+pipeline utilization and throughput.
+
+# Change to consider when upgrading to Flink 1.13
Review comment:
```suggestion
# Changes to consider when upgrading to Flink 1.13
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]