[GitHub] [flink-web] infoverload commented on a change in pull request #468: Release blog post 1.14 draft iteration

GitBox Mon, 27 Sep 2021 09:21:26 -0700


infoverload commented on a change in pull request #468:
URL: https://github.com/apache/flink-web/pull/468#discussion_r716849772




##########
File path: _posts/2021-09-21-release-1.14.0.md
##########
@@ -0,0 +1,274 @@
+---
+layout: post 
+title:  "Apache Flink 1.14.0 Release Announcement"
+date: 2021-09-21T08:00:00.000Z 
+categories: news 
+authors:
+- joemoe:
+  name: "Johannes Moser"
+
+excerpt: The Apache Flink community is excited to announce the release of 
Flink 1.14.0! Around xxx contributors worked on over xxxx issues to TODO.
+---
+
+Just a couple of days ago the Apache Software Foundation announced its annual 
report and Apache
+Flink was again in the Top 5 of the most active projects in all relevant 
categories. This remarkable
+activity is also reflected in this new 1.14.0 release. Once again, more than 
200 contributors worked on
+over 1,000 issues. We are proud of how this community is consistently moving 
the project forward.
+
+The release brings many cool improvements, from SQL to connectors, 
checkpointing, and PyFlink.
+A big area of changes in this release is the integrated streaming & batch 
experience. We believe
+that unbounded stream processing goes hand-in-hand with bounded- and batch 
processing tasks in practice.
+Data exploration when developing new applications, bootstrapping state for new 
applications, training
+models to be applied in the streaming application, re-processing data after 
fixes/upgrades, and many
+other use cases require processing historic data from various sources next to 
the streaming data.
+
+In Flink 1.14, we finally made it possible to **mix bounded and unbounded 
streams in an application**;
+Flink can now take checkpoints of applications that are partially running and 
partially finished (some
+operators reached the end of the bounded inputs). Additionally, **bounded 
streams now take a final checkpoint**
+when reaching their end to ensure smooth committing of results in sinks.
+The **batch execution mode now works for programs that mix DataStream & 
Table/SQL** (previously only
+pure Table/SQL or DataStream programs). The unified Source and Sink APIs have 
made strides ahead,
+we started **consolidating the connector ecosystem around the unified APIs**, 
and added a **hybrid source**
+that can bridge between multiple storage systems, like start reading old data 
from S3 and switch over
+to Kafka later.
+
+Furthermore, this release takes another step in our initiative of making Flink 
more self-tuning and
+to require less Stream-Processor-specific knowledge to operate. We started 
that initiative in the previous
+release with [Reactive 
Scaling](/news/2021/05/03/release-1.13.0.html#reactive-scaling) and are now
+adding **automatic network memory tuning** (*a.k.a. Buffer Debloating*). This 
feature speeds up checkpoints
+under load without sacrificing performance or increasing checkpoint size, by 
continuously adjusting the
+network buffering to ensure best throughput while having minimal in-flight 
data. See the
+[Buffer Debloating section](#buffer-debloating) for details.
+
+There are many more improvements and new additions throughout various 
components, as we discuss below.
+We also had to say goodbye to some features that have been superseded by newer 
ones in recent releases,
+most prominently we are **removing the old SQL execution engine**.
+
+We hope you like the new release and we'd be eager to learn about your 
experience with it, which yet
+unsolved problems it solves, what new use-cases it unlocks for you.
+
+{% toc %}
+
+# Unified Batch and Stream Processing experience
+
+One of Flink's unique characteristics is how it integrates streaming and batch 
processing,
+using common unified APIs, and a runtime that supports multiple execution 
paradigms.
+
+As motivated in the introduction, we believe Streaming and Batch always go 
hand in hand. This quote from
+a [report on Facebook's streaming 
infrastructure](https://research.fb.com/wp-content/uploads/2016/11/realtime_data_processing_at_facebook.pdf)
+echoes this sentiment nicely.
+
+> Streaming versus batch processing is not an either/or decision. Originally, 
all data warehouse
+> processing at Facebook was batch processing. We began developing Puma and 
Swift about five years
+> ago. As we showed in Section [...], using a mix of streaming and batch 
processing can speed up
+> long pipelines by hours.
+
+Having both the real-time and the historic computations in the same engine 
also ensures consistency
+between semantics and makes results well comparable. Here is an [article by 
Alibaba](https://www.ververica.com/blog/apache-flinks-stream-batch-unification-powers-alibabas-11.11-in-2020)
+about unifying business reporting with Apache Flink and getting consistent 
reports that way.
+
+While unified streaming & batch are already possible in earlier versions, this 
release brings
+some features that unlock new use cases, as well as a series of 
quality-of-life improvements.
+
+## Checkpointing and Bounded Streams
+
+Flink's checkpointing mechanism could originally only draw checkpoints when 
all tasks in an application's
+DAG were running. That means that mixed bounded/unbounded applications were 
not really possible.
+In addition, applications on bounded inputs that were executed in a streaming 
way (not in a batch way)
+stopped checkpointing towards the end, when some tasks finished. Lingering 
data for exactly-once
+sinks was the result, because in the absence of checkpoints, the latest output 
data was not committed.
+
+With 
[FLIP-147](https://cwiki.apache.org/confluence/display/FLINK/FLIP-147%3A+Support+Checkpoints+After+Tasks+Finished)
+Flink now supports checkpoints after tasks are finished, and takes a final 
checkpoint at the end of a
+bounded stream, ensuring that all sink data is committed before the job ends 
(similar to how
+*stop-with-savepoint* behaves).
+
+To activate this feature, add 
`execution.checkpointing.checkpoints-after-tasks-finish.enabled: true`
+to your configuration. Keeping with the tradition of making big new features 
opt-in for the first release,
+the feature is not active by default in Flink 1.14. We expect it to become the 
default in the next release.
+
+As a side note: Why would we even want to execute applications over bounded 
streams in a streaming way,
+rather than in a batch-y way? There can be many reasons, like (a) the sink 
that you write to needs to be
+written to in a streaming way (like the Kafka Sink), or (b) that you do want 
to exploit the
+streaming-inherent quasi-ordering-by-time, such as in the [Kappa+ 
Architecture](https://youtu.be/4qSlsYogALo?t=666).

Review comment:
       ```suggestion
   Flink's checkpointing mechanism could previously only create checkpoints when all tasks in an application's
   directed acyclic graph (DAG) were running. This meant that applications mixing bounded and unbounded
   inputs were not really possible. In addition, applications with bounded inputs that were executed in
   streaming mode (rather than batch mode) stopped checkpointing towards the end of processing, once some
   tasks had finished. Without checkpoints, the latest output data was not committed, resulting in
   lingering data for exactly-once sinks.
   
   With [FLIP-147](https://cwiki.apache.org/confluence/display/FLINK/FLIP-147%3A+Support+Checkpoints+After+Tasks+Finished),
   Flink now supports checkpoints after tasks are finished and takes a final checkpoint at the end of a
   bounded stream, ensuring that all output data is committed before the job ends (similar to how
   *stop-with-savepoint* behaves).
   
   To activate this feature, add `execution.checkpointing.checkpoints-after-tasks-finish.enabled: true`
   to your configuration. In keeping with the tradition of making big new features opt-in for their first release,
   the feature is not active by default in Flink 1.14. We expect it to become the default in the next release.
   
   Note: You may want to execute applications over bounded streams in streaming mode rather than batch mode for several reasons. The sink may need to be written to in a streaming manner (e.g., the Kafka sink), or you may want to exploit the streaming-inherent quasi-ordering-by-time, such as in the [Kappa+ Architecture](https://youtu.be/4qSlsYogALo?t=666).
   ```
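
It might also help readers to see the opt-in setting in context. Below is a minimal sketch of a `flink-conf.yaml` fragment, assuming the option is set cluster-wide in that file. The checkpoint interval line and its `10s` value are illustrative assumptions that do not come from the post, added only because the final checkpoint is relevant only when checkpointing is enabled at all.

```yaml
# Assumption: periodic checkpointing is enabled; the 10s interval is illustrative only.
execution.checkpointing.interval: 10s

# Opt-in switch named in the post (off by default in Flink 1.14):
# allow checkpoints after some tasks have finished, including a final
# checkpoint at the end of a bounded stream.
execution.checkpointing.checkpoints-after-tasks-finish.enabled: true
```

If the post includes something like this, it should probably stay this short so that the focus remains on the single option readers need to add.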




