sjwiesman commented on a change in pull request #468:
URL: https://github.com/apache/flink-web/pull/468#discussion_r714896998
##########
File path: _posts/2021-09-21-release-1.14.0.md
##########
@@ -0,0 +1,274 @@
+---
+layout: post
+title: "Apache Flink 1.14.0 Release Announcement"
+date: 2021-09-21T08:00:00.000Z
+categories: news
+authors:
+- joemoe:
+ name: "Johannes Moser"
+
+excerpt: The Apache Flink community is excited to announce the release of
Flink 1.14.0! More than 200 contributors worked on over 1,000 issues for this
new version.
+---
+
+Just a couple of days ago the Apache Software Foundation announced its annual
report and Apache
+Flink was again in the Top 5 of the most active projects in all relevant
categories. This remarkable
+activity is also reflected in this new 1.14.0 release. Once again, more than
200 contributors worked on
+over 1,000 issues. We are proud of how this community is consistently moving
the project forward.
+
+The release brings many cool improvements across SQL, connectors,
checkpointing, and PyFlink.
+A big area of changes in this release is the integrated streaming & batch
experience. We believe
+that unbounded stream processing goes hand-in-hand with bounded and batch
processing tasks in practice.
+Data exploration when developing new applications, bootstrapping state for new
applications, training
+models to be applied in the streaming application, re-processing data after
fixes/upgrades, and many
+other use cases require processing historic data from various sources next to
the streaming data.
+
+In Flink 1.14, we finally made it possible to **mix bounded and unbounded
streams in an application**:
+Flink can now take checkpoints of applications that are partially running and
partially finished (some
+operators reached the end of their bounded inputs). Additionally, **bounded
streams now take a final checkpoint**
+when reaching their end to ensure smooth committing of results in sinks.
+The **batch execution mode now works for programs that mix DataStream &
Table/SQL** (previously only
+pure Table/SQL or DataStream programs). The unified Source and Sink APIs have
made strides ahead;
+we started **consolidating the connector ecosystem around the unified APIs**
and added a **hybrid source**
+that can bridge between multiple storage systems, for example starting to read
old data from S3 and switching over
+to Kafka later.
+
+Furthermore, this release takes another step in our initiative of making Flink
more self-tuning and
+requiring less stream-processor-specific knowledge to operate. We started
that initiative in the previous
+release with [Reactive
Scaling](/news/2021/05/03/release-1.13.0.html#reactive-scaling) and are now
+adding **automatic network memory tuning** (*a.k.a. Buffer Debloating*). This
feature speeds up checkpoints
+under load without sacrificing performance or increasing checkpoint size, by
continuously adjusting the
+network buffering to ensure best throughput while having minimal in-flight
data. See the
+[Buffer Debloating section](#buffer-debloating) for details.
+
+There are many more improvements and new additions throughout various
components, as we discuss below.
+We also had to say goodbye to some features that have been superseded by newer
ones in recent releases;
+most prominently, we are **removing the old SQL execution engine**.
+
+We hope you like the new release, and we are eager to learn about your
experience with it: which
+previously unsolved problems it solves, and what new use cases it unlocks for
you.
+
+{% toc %}
+
+# Unified Batch and Stream Processing Experience
+
+One of Flink's unique characteristics is how it integrates streaming and batch
processing,
+using common unified APIs and a runtime that supports multiple execution
paradigms.
+
+As motivated in the introduction, we believe Streaming and Batch always go
hand in hand. This quote from
+a [report on Facebook's streaming
infrastructure](https://research.fb.com/wp-content/uploads/2016/11/realtime_data_processing_at_facebook.pdf)
+echoes this sentiment nicely.
+
+> Streaming versus batch processing is not an either/or decision. Originally,
all data warehouse
+> processing at Facebook was batch processing. We began developing Puma and
Swift about five years
+> ago. As we showed in Section [...], using a mix of streaming and batch
processing can speed up
+> long pipelines by hours.
+
+Having both the real-time and the historic computations in the same engine
also ensures consistency
+between semantics and makes results easily comparable. Here is an [article by
Alibaba](https://www.ververica.com/blog/apache-flinks-stream-batch-unification-powers-alibabas-11.11-in-2020)
+about unifying business reporting with Apache Flink and getting consistent
reports that way.
+
+While unified streaming & batch processing was already possible in earlier
versions, this release brings
+some features that unlock new use cases, as well as a series of
quality-of-life improvements.
+
+## Checkpointing and Bounded Streams
+
+Flink's checkpointing mechanism could originally only draw checkpoints when
all tasks in an application's
+DAG were running. That meant that mixed bounded/unbounded applications were
not really possible.
+In addition, applications on bounded inputs that were executed in a streaming
way (not in a batch way)
+stopped checkpointing towards the end, once some tasks had finished. The
result was lingering data for exactly-once
+sinks, because in the absence of checkpoints, the latest output
data was not committed.
+
+With
[FLIP-147](https://cwiki.apache.org/confluence/display/FLINK/FLIP-147%3A+Support+Checkpoints+After+Tasks+Finished)
+Flink now supports checkpoints after tasks are finished, and takes a final
checkpoint at the end of a
+bounded stream, ensuring that all sink data is committed before the job ends
(similar to how
+*stop-with-savepoint* behaves).
+
+To activate this feature, add
`execution.checkpointing.checkpoints-after-tasks-finish.enabled: true`
+to your configuration. Keeping with the tradition of making big new features
opt-in for the first release,
+the feature is not active by default in Flink 1.14. We expect it to become the
default in the next release.
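+
+If you prefer setting this in code, here is a minimal sketch (assuming the
standard `Configuration`-based environment setup; the option key is the one
quoted above):
+
+```java
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+Configuration conf = new Configuration();
+// opt in to checkpoints that continue after some tasks have finished
+conf.setBoolean("execution.checkpointing.checkpoints-after-tasks-finish.enabled", true);
+
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
+env.enableCheckpointing(10_000); // take a checkpoint every 10 seconds
+```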
+
+As a side note: Why would we even want to execute applications over bounded
streams in a streaming way,
+rather than in a batch way? There can be many reasons, like (a) the sink
that you write to needs to be
+written to in a streaming way (like the Kafka Sink), or (b) you want to
exploit the
+streaming-inherent quasi-ordering-by-time, such as in the [Kappa+
Architecture](https://youtu.be/4qSlsYogALo?t=666).
+
+## Mixing Batch DataStream API and Table API/SQL
+
+SQL and the Table API are becoming the default starting points for new
projects. The declarative
+nature and richness of built-in types and operations make it easy to develop
applications fast.
+It is not uncommon, though, for developers to eventually hit the limit of
SQL's expressiveness for
+certain types of event-driven business logic (or hit the point when it becomes
grotesque to express
+that logic in SQL).
+
+At that point, the natural step is to blend in a piece of stateful DataStream
API logic, before
+switching back to SQL again.
+
+In Flink 1.14, bounded batch-executed SQL/Table programs can convert their
intermediate
+Tables to a DataStream, apply some DataStream API operations, and convert the
stream back to a Table. Flink builds
+a dataflow DAG that mixes declarative, optimized SQL execution with
batch-executed DataStream logic.
+Check out the [relevant
docs](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/data_stream_api/#converting-between-datastream-and-table)
for details.
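+
+As a rough sketch of what this looks like (table contents and field names are
hypothetical, using the conversion methods from the docs linked above):
+
+```java
+import org.apache.flink.api.common.RuntimeExecutionMode;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.types.Row;
+
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+env.setRuntimeMode(RuntimeExecutionMode.BATCH); // bounded, batch-executed
+StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
+
+// declarative part: a bounded table (here simply built from inline VALUES)
+Table orders = tableEnv.sqlQuery(
+    "SELECT * FROM (VALUES ('alice', 12), ('bob', 7)) AS t(name, amount)");
+
+// switch to the DataStream API for logic that is awkward to express in SQL
+DataStream<Row> rows = tableEnv.toDataStream(orders);
+DataStream<Row> bigOrders = rows.filter(row -> (int) row.getField("amount") > 10);
+
+// and back to a Table, continuing with relational operations
+Table result = tableEnv.fromDataStream(bigOrders);
+result.execute().print();
+```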
+
+## Hybrid Source
+
+The new [Hybrid
Source](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/hybridsource/)
+produces a combined stream from other sources, by reading those other sources
one after the other,
+seamlessly switching over from one to the other.
+
+With the Hybrid Source, you can, for example, read from tiered storage setups
as if there were one
+stream that spans everything. Consider the example below: New data lands in
Kafka and is eventually
+migrated to S3 (typically in a compressed columnar format, for cost efficiency
and performance).
+The Hybrid Source can read a stream that starts all the way in the past on S3
and transitions over
+to the real-time data in Kafka.
+
+<figure style="align-content: center">
+ <img src="{{ site.baseurl
}}/img/blog/2021-09-25-release-1.14.0/hybrid_source.png" style="display: block;
margin-left: auto; margin-right: auto; width: 600px"/>
+</figure>
+
+We believe that this is an exciting step in realizing the full promise of logs
and the *Kappa Architecture*.
+Even if older parts of a log of events are physically migrated to different
storage
+(because it is cheaper, compresses better, or is faster to read), you can
still treat and process it as one
+contiguous log.
+
+The Hybrid Source foundation is in Flink 1.14. Over the next releases, we
expect to add more
+utilities and patterns for typical switching strategies.
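+
+For illustration, a minimal sketch of wiring one up, following the pattern
from the docs linked above (bucket, topic, broker, and switch timestamp are
hypothetical):
+
+```java
+import org.apache.flink.api.common.serialization.SimpleStringSchema;
+import org.apache.flink.connector.base.source.hybrid.HybridSource;
+import org.apache.flink.connector.file.src.FileSource;
+import org.apache.flink.connector.file.src.reader.TextLineFormat;
+import org.apache.flink.connector.kafka.source.KafkaSource;
+import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
+import org.apache.flink.core.fs.Path;
+
+long switchTimestamp = 1632441600000L; // hypothetical point where the S3 data ends
+
+// bounded source over the historical data
+FileSource<String> fileSource = FileSource
+    .forRecordStreamFormat(new TextLineFormat(), new Path("s3://bucket/events/"))
+    .build();
+
+// unbounded source that picks up right after the switch timestamp
+KafkaSource<String> kafkaSource = KafkaSource.<String>builder()
+    .setBootstrapServers("broker:9092")
+    .setTopics("events")
+    .setStartingOffsets(OffsetsInitializer.timestamp(switchTimestamp + 1))
+    .setValueOnlyDeserializer(new SimpleStringSchema())
+    .build();
+
+// one contiguous stream: first the files, then Kafka
+HybridSource<String> hybridSource = HybridSource.builder(fileSource)
+    .addSource(kafkaSource)
+    .build();
+```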
+
+## Consolidating Sources and Sinks
+
+With the new unified (streaming/batch) source and sink APIs becoming stable,
we started the
+big effort to consolidate all connectors around those APIs. At the same time,
we are
+better aligning connectors between the DataStream and SQL/Table APIs. First up
are the *Kafka* and
+*File* Sources and Sinks for the DataStream API.
+
+The result of this effort (which we expect to span at least 1-2 further
releases) will be a much
+smoother and more consistent experience for Flink users when connecting to
external systems;
+something where we need to acknowledge that there is room for improvement in
Flink.
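+
+As a taste of the new unified style, here is a sketch of the new `KafkaSink`
from this release (broker address and topic are hypothetical):
+
+```java
+import org.apache.flink.api.common.serialization.SimpleStringSchema;
+import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
+import org.apache.flink.connector.kafka.sink.KafkaSink;
+
+KafkaSink<String> sink = KafkaSink.<String>builder()
+    .setBootstrapServers("broker:9092")
+    .setRecordSerializer(
+        KafkaRecordSerializationSchema.builder()
+            .setTopic("output-topic")
+            .setValueSerializationSchema(new SimpleStringSchema())
+            .build())
+    .build();
+
+// attach it to any DataStream<String> via stream.sinkTo(sink)
+```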
+
+# Improvements to Operations
+
+## Buffer debloating
+
+*Buffer Debloating* is a new technology in Flink that automatically tunes the
usage
+of network memory to ensure high throughput, while minimizing in-flight data
and thus minimizing
+checkpoint latency and cost.
+
+Apache Flink buffers a certain amount of data in its network stack to be able
to utilize the
+bandwidth of fast networks. A Flink application running under high trhoughput
uses some of
Review comment:
```suggestion
bandwidth of fast networks. A Flink application running under high
throughput uses some of
```
##########
File path: _posts/2021-09-21-release-1.14.0.md
##########
@@ -0,0 +1,274 @@
+all of that memory. Aligned checkpoints flow with the data through the network
buffers in milliseconds.
+
+When a Flink application becomes (temporarily) backpressured (for example when
being backpressured
+by an external system, or when hitting skewed records), there is typically a
lot more data in
+the network buffers than is necessary for the network to support the
application's current throughput
+(which is lowered due to backpressure). There is even an adverse effect: more
buffered data means
+the checkpoints need to do more work. Aligned checkpoint barriers need to wait
for more data to be
+processed, and unaligned checkpoints need to persist more in-flight data.
+
+This observation that under backpressure there is more data in the network
buffers than necessary is
+where *Buffer Debloating* starts: It changes the network stack from keeping up
to X bytes of data
+to keeping data that is worth X milliseconds of receiver computing time. With
the default setting
+of 1000 milliseconds, that means the network stack will buffer as much data as
the receiving task can
+process in 1000 milliseconds. This value is constantly measured and adjusted,
so the system keeps
+this characteristic even under varying conditions. As a result, Flink
can now provide
+stable and predictable alignment times for aligned checkpoints under
backpressure, and can vastly
+reduce the amount of in-flight data stored in unaligned checkpoints under
backpressure.
+
+<figure style="align-content: center">
+ <img src="{{ site.baseurl
}}/img/blog/2021-09-25-release-1.14.0/buffer_debloating.png" style="display:
block; margin-left: auto; margin-right: auto; width: 600px"/>
+</figure>
+
+Buffer Debloating acts as a complementary feature, or even an alternative, to
unaligned checkpoints.
+Check out [the
docs](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/memory/network_mem_tuning/#the-buffer-debloating-mechanism)
+to see how to activate this feature.
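+
+Assuming the configuration keys described in those docs, activation boils down
to a couple of `flink-conf.yaml` entries along these lines:
+
+```yaml
+# enable automatic network buffer sizing (opt-in in Flink 1.14)
+taskmanager.network.memory.buffer-debloat.enabled: true
+# target time for a receiving task to work through its in-flight data
+# (1 s, matching the 1000 ms default described above)
+taskmanager.network.memory.buffer-debloat.target: 1s
+```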
+
+
+## Fine-grained Resource Management
+
+The new *Fine-grained Resource Management* is an advanced feature to increase
the resource
+utilization of large shared clusters.
+
+Flink clusters execute various data processing workloads, and different data
processing steps typically need
+different resources: compute resources, memory, etc. For example, a filter in
SQL needs different
+resources than a join. By default, Flink manages resources in coarse-grained
+units called 'slots', which are slices of a TaskManager's resources. A slot is
then filled with tasks
+that use those resources. Through *'slot sharing groups'*, users can influence
how tasks are put
+into slots.
+
+With fine-grained resource management, TaskManager slots now dynamically
sized. Transformations and Operators
Review comment:
```suggestion
With fine-grained resource management, TaskManager slots are now dynamically
sized. Transformations and Operators
```
##########
File path: _posts/2021-09-21-release-1.14.0.md
##########
@@ -0,0 +1,274 @@
+specify what resource profiles they would like, and Flink's Resource Manager
and TaskManagers slice
+off that specific part of a TaskManager's total resources. Think of it as a
minimal, lightweight resource
+orchestration layer within Flink. Why would we even do that? Is this not
better handled directly by
+Kubernetes or Yarn? For most cases, that is in fact true. But we have seen
that in large shared clusters,
+resource requirements change so fast (especially in SQL jobs where each
operator needs different resources)
+that a lot of efficiency gets lost waiting for Yarn and K8s to fulfill the
resource requests (taking seconds,
+where Flink internally does it in milliseconds).
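+
+For illustration, a sketch of declaring such a resource profile on the
DataStream API (group name and sizes are hypothetical; see the docs linked
below for the full feature):
+
+```java
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.common.operators.SlotSharingGroup;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+
+// a fine-grained resource profile for one group of operators
+SlotSharingGroup heavyGroup = SlotSharingGroup.newBuilder("heavy")
+    .setCpuCores(2.0)
+    .setTaskHeapMemoryMB(512)
+    .build();
+
+// tasks in this group run in a slot dynamically sized to that profile
+env.fromElements("a", "b", "c")
+    .map(new MapFunction<String, String>() {
+        @Override
+        public String map(String s) { return s.toUpperCase(); }
+    })
+    .slotSharingGroup(heavyGroup)
+    .print();
+```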
+
+This feature is mainly useful for large platform deployments where Flink runs
many batch
+jobs of varying kinds and manages the resources actively (dynamically
requesting and releasing resources
+from Kubernetes or Yarn, as opposed to opaque reactive deployments). Please
refer to the
+[Fine-grained Resource Management
Docs](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/finegrained_resource/)
+for details on how to use this feature.
+
+<figure style="align-content: center">
+ <img src="{{ site.baseurl
}}/img/blog/2021-09-25-release-1.14.0/fine_grained_resource_management.png"
style="display: block; margin-left: auto; margin-right: auto; width: 600px"/>
+</figure>
+
+Alibaba's internal SQL platform on top of Flink and has used thich mechanism
for some years now and
Review comment:
```suggestion
Alibaba's internal SQL platform on top of Flink and has used this mechanism
for some years now and
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]