twalthr commented on a change in pull request #466:
URL: https://github.com/apache/flink-web/pull/466#discussion_r706001267
##########
File path: _posts/2021-09-21-release-1.14.0.md
##########
@@ -0,0 +1,182 @@
+---
+layout: post
+title: "Apache Flink 1.14.0 Release Announcement"
+date: 2021-09-21T08:00:00.000Z
+categories: news
+authors:
+- joemoe:
+ name: "Johannes Moser"
+
+excerpt: The Apache Flink community is excited to announce the release of
+Flink 1.14.0! More than 200 contributors worked on over 1,000 issues to TODO.
+---
+
+Just a couple of days ago, the Apache Software Foundation announced its annual
+report, and Apache Flink being in the top 5 of all relevant categories is an
+outcome of the work the community has, yet again, put into 1.14.0. The
+consistency with which this project moves forward is remarkable. Once again,
+more than 200 contributors worked on over 1,000 issues.
+
+Apache Flink not only supports batch and stream processing, but has always
+pursued the goal of making them a unified experience. With Apache Flink 1.14.0,
+batch and stream processing have moved closer together. The first sources and
+sinks now provide a unified API (following FLIP-27 and FLIP-143), and a hybrid
+source has been introduced. The DataStream batch mode has been pushed into the
+Table API. Under the hood, checkpoints are now allowed even after tasks have
+finished, truly enabling mixed or bounded jobs. Existing features have been
+harmonized throughout all available APIs, from the DataStream API to the Table
+API and SQL, and vice versa. The DataStream batch mode is maturing after its
+initial release in 1.13.0.
+
+Fault tolerance is part of Flink’s nature, but there is always room to improve
+it. With the new buffer debloating option, the amount of buffered in-flight
+data can decrease significantly, which shrinks checkpoint sizes and reduces
+checkpointing times to a minimum.
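+
+Buffer debloating is controlled via configuration. A minimal `flink-conf.yaml`
+sketch could look like the following (the target value below is only an
+illustrative assumption, not a recommendation):
+
+```yaml
+# Enable automatic adjustment of network buffer sizes
+taskmanager.network.memory.buffer-debloat.enabled: true
+# Illustrative assumption: aim for roughly 1s of in-flight data
+taskmanager.network.memory.buffer-debloat.target: 1s
+```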
+
+That’s not all: there is a huge list of improvements and new additions
+throughout all components. We also had to say goodbye to some features that
+have been superseded in recent releases. We hope you like the new release and
+are eager to learn about your experience with it: which previously unsolved
+problems it solves, and what new use cases it unlocks for you.
+
+{% toc %}
+
+# Notable improvements for a Unified Batch and Stream Processing experience
+
+Apache Flink unlocks both batch and stream processing use cases, and there is
+a lot of traction for both. Initially, the two were rather separate, which is
+why their APIs were not really aligned in the first place and drifted further
+apart over time. As both APIs became stable, users started to combine them in
+their solutions: running a batch-style workload to initially process historic
+data and then switching into streaming mode to deal with live data is a
+natural pattern. But keeping the two APIs separate not only led to the
+mentioned differences but also to large gaps in the feature matrix, and it
+became quite confusing what worked with which API and what did not. About a
+year ago, the community started to unify the experience by treating batch as a
+special case of streaming. The notion of bounded and unbounded streams was
+introduced and initially released in the most recent Apache Flink release,
+1.13. This effort has now been continued, as Apache Flink not only wants to
+unlock use cases but also to make them a good user experience. The list below
+demonstrates the impact of this change. It is not only about the now
+deprecated DataSet API and the DataStream API; it also affects sources and
+sinks, checkpointing, the Table API, Flink SQL, and more.
+
+## Unified Source and Sink APIs
+
+Sources and sinks play a big role in unlocking both streaming and batch use
+cases. Quite a number of sources and sinks are currently included in Apache
+Flink, and some more are available as external packages, but it might be hard
+to find two that support the same set of features. This certainly applies to
+supporting bounded and unbounded streams, but there are also differences in
+what is exposed in the Table API and SQL and in what kind of checkpointing is
+supported. That’s why the community came up with FLIP-27 and FLIP-143. With
+Apache Flink 1.14, these FLIPs have for the first time been fully implemented
+for the Kafka source and sink.
+
+The changes to the sink mostly circle around committing behaviour, enabling
+all delivery guarantees and providing solid fault tolerance.
+
+The Kafka source has already been in good shape and is now also exposed in the
+Table API and SQL.
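+
+As an illustration of the unified sink API, a minimal sketch of writing a
+string stream to Kafka with the new `KafkaSink` (broker address and topic name
+are assumptions for the example) might look like this:
+
+```java
+KafkaSink<String> sink = KafkaSink.<String>builder()
+    // Assumed broker address for the example
+    .setBootstrapServers("localhost:9092")
+    .setRecordSerializer(
+        KafkaRecordSerializationSchema.builder()
+            .setTopic("output-topic") // hypothetical topic
+            .setValueSerializationSchema(new SimpleStringSchema())
+            .build())
+    // The unified committing behaviour enables all delivery guarantees
+    .setDeliverGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
+    .build();
+
+stream.sinkTo(sink);
+```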
+
+## Hybrid Sources
+
+Users are facing the problem of having more and more sources for their data,
+which requires them to unify that data in the first place. Until now, the only
+way to cover some of these use cases was to run two parallel Flink jobs or to
+implement the switch in a hacky way. This is not the user experience Apache
+Flink wants to provide. With Apache Flink 1.14, the hybrid source was
+introduced to provide a coherent experience when unifying heterogeneous data
+feeds into one homogeneous data stream. For example, you might read historic
+data from a file source and then switch over to a Kafka source to cover the
+streaming data.
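+
+The file-then-Kafka scenario above can be sketched with the new `HybridSource`
+builder (the file path, broker address, topic, and offset strategy are
+assumptions for the example):
+
+```java
+// Historic data from files (path is a hypothetical example)
+FileSource<String> fileSource = FileSource
+    .forRecordStreamFormat(new TextLineFormat(), new Path("/data/history"))
+    .build();
+
+// Live data from Kafka (broker and topic are assumptions)
+KafkaSource<String> kafkaSource = KafkaSource.<String>builder()
+    .setBootstrapServers("localhost:9092")
+    .setTopics("events")
+    .setStartingOffsets(OffsetsInitializer.earliest())
+    .setValueOnlyDeserializer(new SimpleStringSchema())
+    .build();
+
+// Read the files first, then switch over to the Kafka source
+HybridSource<String> hybridSource = HybridSource.builder(fileSource)
+    .addSource(kafkaSource)
+    .build();
+
+env.fromSource(hybridSource, WatermarkStrategy.noWatermarks(), "hybrid-source");
+```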
+
+## Aligning DataStream API, Table API and Flink SQL
+
+With the DataSet API being deprecated, the future of Flink will circle around
+the DataStream API.
Review comment:
On the ML we actually decided to have a dedicated blog post again, but
this didn't happen yet.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]