tillrohrmann commented on a change in pull request #427:
URL: https://github.com/apache/flink-web/pull/427#discussion_r612258141
##########
File path: _posts/2021-04-05-reactive-mode.md
##########
@@ -0,0 +1,149 @@
+---
+layout: post
+title: "Scaling Flink automatically with Reactive Mode"
+date: 2021-04-05T00:00:00.000Z
+authors:
+- rob:
+ name: "Robert Metzger"
+ twitter: "rmetzger_"
+excerpt: Apache Flink 1.13 introduced Reactive Mode, a big step forward in
Flink's ability to dynamically adjust to changing workloads, reducing resource
utilization and overall costs. This blog post demonstrates Reactive Mode on
Kubernetes, including some lessons learned.
+
+---
+
+{% toc %}
+
+## Introduction
+
+Streaming jobs that run for several days or longer usually experience changes
in their workload during their lifetime. These changes can originate from
seasonal patterns, such as day vs. night, weekdays vs. weekends or holidays vs.
non-holidays, from sudden events, or simply from the growing popularity of your
product. Some of these changes are more predictable than others, but what they
all have in common is that they change the resource demand of your job if you
want to keep the same service quality for your customers.
+
+The figure below compares a scenario where the number of workers is static
while the workload changes over time against the same workload with the number
of workers adjusting to it. The area between the worker allocation and the
load is the loss, which is generally much higher with a static allocation.
+<div class="row front-graphic">
+ <img src="{{ site.baseurl }}/img/blog/2021-04-reactive-mode/intro.png"
width="640px" alt="Reactive Mode Intro"/>
+</div>
+
+
+Since Flink 1.2 introduced [rescalable
state](https://flink.apache.org/features/2017/07/04/flink-rescalable-state.html),
it has been possible to **manually rescale** a job: you can stop it and restore
it with a different parallelism. For example, if your job is running with a
parallelism of p=100 and your load increases, you can restart it with p=200 to
cope with the additional data.
+
+The problem with this approach is that you have to orchestrate the rescale
operation yourself with custom tools, including error handling.
+
+[Reactive
Mode](https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/elastic_scaling/)
introduces a new option in Flink 1.13: you monitor your Flink cluster and add
or remove resources depending on some metrics, and Flink does the rest. Reactive
Mode is a JobManager mode in which your job always uses all available
TaskManagers.
+
+The big benefit of Reactive Mode is that you don't need any specific knowledge
to scale Flink anymore. Flink basically behaves like a fleet of servers
(webservers, caches, batch processing) that you can grow and shrink as you
wish. Since this is such a common pattern, there is a lot of infrastructure
available for handling this case. All major cloud providers provide facilities
to monitor a metric and automatically scale a set of machines accordingly. For
example, this is called an [Auto Scaling
group](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html)
in AWS or a [Managed Instance
group](https://cloud.google.com/compute/docs/instance-groups) in Google Cloud.
+Similarly, Kubernetes has [Horizontal Pod
Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/)s.
+
+What's interesting as a side note is that unlike most autoscalable "fleets of
servers", Flink is a stateful system, often processing valuable data requiring
strong correctness guarantees, comparable to a database. But unlike a database,
Flink is resilient enough (through checkpointing and state backups) to adjust
to changing workloads by just adding or removing resources, with very few
requirements (a simple blob store for state backups).
+
+## Getting Started
+
+If you want to try out Reactive Mode yourself locally, just follow these steps
on a Flink 1.13.0 distribution:
+
+```bash
+# these instructions assume you are in the root directory of a Flink
distribution.
+# Put Job into lib/ directory
+cp ./examples/streaming/TopSpeedWindowing.jar lib/
Review comment:
You should still be able to create and use it.