[GitHub] [flink-web] tillrohrmann commented on a change in pull request #427: Add reactive mode blog post

GitBox Tue, 13 Apr 2021 01:52:35 -0700


tillrohrmann commented on a change in pull request #427:
URL: https://github.com/apache/flink-web/pull/427#discussion_r612258141




##########
File path: _posts/2021-04-05-reactive-mode.md
##########
@@ -0,0 +1,149 @@
+---
+layout: post
+title:  "Scaling Flink automatically with Reactive Mode"
+date: 2021-04-05T00:00:00.000Z
+authors:
+- rob:
+  name: "Robert Metzger"
+  twitter: "rmetzger_"
+excerpt: Apache Flink 1.13 introduced Reactive Mode, a big step forward in 
Flink's ability to dynamically adjust to changing workloads, reducing resource 
utilization and overall costs. This blog post demonstrates Reactive Mode on 
Kubernetes, including some lessons learned.
+
+---
+
+{% toc %}
+
+## Introduction
+
+Streaming jobs that run for several days or longer usually experience changes 
in their workload during their lifetime. These changes can originate from 
seasonal spikes, such as day vs. night, weekdays vs. weekends, or holidays vs. 
non-holidays, from sudden events, or simply from the growing popularity of your 
product. Some of these changes are more predictable than others, but what they 
all have in common is that they change the resource demand of your job if you 
want to keep the same service quality for your customers.
+
+Compare a scenario where the number of workers is static while the workload 
changes over time with the same workload where the number of workers adjusts 
to the load. The area between the worker allocation and the load is the loss, 
which is generally much higher with a static allocation. 
+<div class="row front-graphic">
+  <img src="{{ site.baseurl }}/img/blog/2021-04-reactive-mode/intro.png" 
width="640px" alt="Reactive Mode Intro"/>
+</div>
+
+
+Since Flink 1.2 introduced [rescalable 
state](https://flink.apache.org/features/2017/07/04/flink-rescalable-state.html),
 it has been possible to **manually rescale** a Flink job: you can stop a job 
and restore it with a different parallelism. For example, if your job is 
running with a parallelism of p=100 and your load increases, you can restart 
it with p=200 to cope with the additional data. 
+
+The problem with this approach is that you have to orchestrate the rescale 
operation yourself with custom tools, including the error handling.
+
+[Reactive 
Mode](https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/elastic_scaling/)
 introduces a new option in Flink 1.13: you monitor your Flink cluster and add 
or remove resources depending on some metrics, and Flink does the rest. 
Reactive Mode is a JobManager mode in which your job always uses all available 
TaskManagers.
+
+The big benefit of Reactive Mode is that you no longer need any specific 
knowledge to scale Flink. Flink basically behaves like a fleet of servers 
(webservers, caches, batch processing) that you can grow and shrink as you 
wish. Since this is such a common pattern, there is a lot of infrastructure 
available for handling this case. All major cloud providers provide facilities 
to monitor a metric and automatically scale a set of machines accordingly. For 
example, this is called an [Auto Scaling 
group](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html)
 in AWS and a [Managed Instance 
group](https://cloud.google.com/compute/docs/instance-groups) in Google Cloud.
+Similarly, Kubernetes has [Horizontal Pod 
Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/)s.
+
+As an interesting side note, unlike most autoscalable "fleets of servers", 
Flink is a stateful system, often processing valuable data requiring strong 
correctness guarantees, comparable to a database. But unlike a database, 
Flink is resilient enough (through checkpointing and state backups) to adjust 
to changing workloads by just adding or removing resources, with very few 
requirements (a simple blob store for state backups).
+
+## Getting Started
+
+If you want to try out Reactive Mode yourself locally, just follow these steps 
on a Flink 1.13.0 distribution:
+
+```bash
+# These instructions assume you are in the root directory of a Flink distribution.
+# Put the job into the lib/ directory
+cp ./examples/streaming/TopSpeedWindowing.jar lib/

Review comment:
       You should still be able to create and use it.




