zentol commented on code in PR #23570:
URL: https://github.com/apache/flink/pull/23570#discussion_r1368477395
##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -25,22 +25,100 @@ under the License.
# Elastic Scaling
-Apache Flink allows you to rescale your jobs. You can do this manually by
stopping the job and restarting from the savepoint created during shutdown with
a different parallelism.
+Apache Flink allows you to rescale your jobs when the job is either
not keeping up with its inputs or is over-provisioned. You can do this manually
by stopping the job and restarting from the savepoint created during shutdown
with a different parallelism.
-This page describes options where Flink automatically adjusts the parallelism
instead.
+This page describes options **where Flink automatically adjusts the
parallelism** instead, without relying on external orchestration.
-## Reactive Mode
+Elastic scaling is enabled by two new schedulers: [Adaptive
Scheduler](#adaptive-scheduler) (streaming) and [Adaptive Batch
Scheduler](#adaptive-batch-scheduler) (batch).
+
+## Adaptive Scheduler
+
+The Adaptive Scheduler can adjust the parallelism of a job based on available
slots. It will automatically reduce the parallelism if not enough slots are
available to run the job with the originally configured parallelism, be it due
to not enough resources being available at the time of submission or
TaskManager outages during the job execution. If new slots become available,
the job will be scaled up again, up to the configured parallelism.
+
+In Reactive Mode (see below) the configured parallelism is ignored and treated
as if it was set to infinity, letting the job always use as many resources as
possible.
+
+One benefit of the Adaptive Scheduler over the default scheduler is that it
can handle TaskManager losses gracefully, since it would just scale down in
these cases.
+
+{{< img src="/fig/adaptive_scheduler.png" >}}
+
+Adaptive Scheduler builds on top of a feature called [Declarative Resource
Management](https://cwiki.apache.org/confluence/display/FLINK/FLIP-138%3A+Declarative+Resource+management).
As the figure above shows, instead of asking for an exact number of slots, the
JobMaster declares its desired resources (for Reactive Mode the maximum is set
to infinity) to the ResourceManager, which then tries to fulfill them.
+
+{{< img src="/fig/adaptive_scheduler_rescale.png" >}}
+
+When the JobMaster gets more resources during runtime, it will automatically
rescale the job using the latest available savepoint, eliminating the need for
external orchestration.
+
+Resources are declared once, when the job starts, and cannot be changed while
the job is running, so by itself the Adaptive Scheduler cannot react to a
change in the input rate or in the performance of the workload. Starting from
**Flink 1.18.x**, you can also re-declare the resource requirements of a
running job via [Externalized Declarative Resource
Management](#externalized-declarative-resource-management), effectively
allowing you to rescale jobs at runtime.
+
+### Externalized Declarative Resource Management
+
+{{< hint warning >}}
+Externalized Declarative Resource Management is an MVP ("minimum viable
product") feature. The Flink community is actively looking for feedback from
users through our mailing lists. Please check the limitations listed on this
page.
+{{< /hint >}}
{{< hint info >}}
-Reactive mode is an MVP ("minimum viable product") feature. The Flink
community is actively looking for feedback by users through our mailing lists.
Please check the limitations listed on this page.
+You can use Externalized Declarative Resource Management with the [Apache
Flink Kubernetes
operator](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.6/docs/custom-resource/autoscaler/#flink-118-and-in-place-scaling-support)
for a fully-fledged auto-scaling experience.
{{< /hint >}}
-Reactive Mode configures a job so that it always uses all resources available
in the cluster. Adding a TaskManager will scale up your job, removing resources
will scale it down. Flink will manage the parallelism of the job, always
setting it to the highest possible values.
+Externalized Declarative Resource Management aims to address two deployment
scenarios:
+1. Adaptive Scheduler on Session Cluster, where multiple jobs can compete for
resources, and you need a finer-grained control over the distribution of
resources between jobs.
+2. Adaptive Scheduler on Application Cluster in combination with Active
Resource Manager (e.g. [Native Kubernetes]({{< ref
"docs/deployment/resource-providers/native_kubernetes" >}})), where you rely on
Flink to "greedily" spawn new TaskManagers, but you still want to leverage
rescaling capabilities as with [Reactive Mode](#reactive-mode).
-Reactive Mode restarts a job on a rescaling event, restoring it from the
latest completed checkpoint. This means that there is no overhead of creating a
savepoint (which is needed for manually rescaling a job). Also, the amount of
data that is reprocessed after rescaling depends on the checkpointing interval,
and the restore time depends on the state size.
+by introducing a new [REST API endpoint]({{< ref "docs/ops/rest_api"
>}}#jobs-jobid-resource-requirements-1) that allows you to re-declare the
resource requirements of a running job by setting per-vertex parallelism
boundaries.
-The Reactive Mode allows Flink users to implement a powerful autoscaling
mechanism, by having an external service monitor certain metrics, such as
consumer lag, aggregate CPU utilization, throughput or latency. As soon as
these metrics are above or below a certain threshold, additional TaskManagers
can be added or removed from the Flink cluster. This could be implemented
through changing the [replica
factor](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#replicas)
of a Kubernetes deployment, or an [autoscaling
group](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html)
on AWS. This external service only needs to handle the resource allocation and
deallocation. Flink will take care of keeping the job running with the
resources available.
+```
+PUT /jobs/<job-id>/resource-requirements
+REQUEST BODY:
+{
+ "<first-vertex-id>": {
+ "parallelism": {
+ "lowerBound": 3,
+ "upperBound": 5
+ }
+ },
+ "<second-vertex-id>": {
+ "parallelism": {
+ "lowerBound": 2,
+ "upperBound": 3
+ }
+ }
+}
+```
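
As a sketch only, the request above could also be issued programmatically; the helper below builds the body shown and PUTs it using Python's standard library. The REST address, job ID, and vertex IDs are placeholders, not values from this page:

```python
import json
from urllib import request


def build_requirements(bounds):
    """Map each vertex ID to its lower/upper parallelism bounds,
    matching the request body shown above."""
    return {
        vertex_id: {"parallelism": {"lowerBound": lo, "upperBound": hi}}
        for vertex_id, (lo, hi) in bounds.items()
    }


def put_requirements(rest_url, job_id, bounds):
    """PUT the re-declared requirements to a running job.
    `rest_url` (e.g. "http://localhost:8081") is a placeholder for
    your cluster's REST endpoint."""
    body = json.dumps(build_requirements(bounds)).encode("utf-8")
    req = request.Request(
        f"{rest_url}/jobs/{job_id}/resource-requirements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with request.urlopen(req) as resp:
        return resp.status
```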
+
+To a certain extent, the above endpoint could be thought of as a
"re-scaling endpoint", and it introduces an important building block for an
auto-scaling experience for Flink.
+
+You can try this feature out manually by navigating to the Job overview in the
Flink UI and using the up-scale/down-scale buttons in the task list.
+
+### Usage
+
+{{< hint info >}}
+If you are using Adaptive Scheduler on a [session cluster]({{< ref
"docs/deployment/overview" >}}/#session-mode), there are no guarantees
regarding the distribution of slots between multiple running jobs in the same
session, in case the cluster doesn't have enough resources. The [Externalized
Declarative Resource Management](#externalized-declarative-resource-management)
can partially mitigate this issue, but it is still recommended to use Adaptive
Scheduler on an [application cluster]({{< ref "docs/deployment/overview"
>}}/#application-mode).
+{{< /hint >}}
+
+`jobmanager.scheduler` needs to be set on the cluster level for the Adaptive
Scheduler to be used instead of the default scheduler.
+
+```yaml
+jobmanager.scheduler: adaptive
+```
+
+The behavior of the Adaptive Scheduler is configured by [all configuration
options with `jobmanager.adaptive-scheduler` in their name]({{< ref
"docs/deployment/config">}}#advanced-scheduling-options).
+
+### Limitations
+
+- **Streaming jobs only**: The Adaptive Scheduler runs with streaming jobs
only. When submitting a batch job, Flink will use the default scheduler for
batch jobs, i.e. the [Adaptive Batch Scheduler](#adaptive-batch-scheduler).
+- **No support for partial failover**: Partial failover means that the
scheduler is able to restart parts ("regions" in Flink's internals) of a failed
job, instead of the entire job. This limitation only impacts the recovery time
of embarrassingly parallel jobs: Flink's default scheduler can restart the
failed parts, while the Adaptive Scheduler will restart the entire job.
+- Scaling events trigger job and task restarts, which will increase the number
of task attempts.
+
+## Reactive Mode
+
+Reactive Mode is a special mode of the Adaptive Scheduler that assumes a
single job per cluster (enforced by the [Application Mode]({{< ref
"docs/deployment/overview" >}}#application-mode)). Reactive Mode configures a
job so that it always uses all resources available in the cluster. Adding a
TaskManager will scale up your job; removing resources will scale it down.
Flink will manage the parallelism of the job, always setting it to the highest
possible value.
+
+Reactive Mode restarts a job on a rescaling event, restoring it from the
latest completed checkpoint. This means that there is no overhead of creating a
savepoint (which is needed for manually rescaling a job). Also, the amount of
data that is reprocessed after rescaling depends on the checkpointing interval,
and the restore time depends on the state size.
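
The amount of reprocessed data can be bounded with a bit of arithmetic; the sketch below is illustrative, not a Flink API:

```python
def worst_case_reprocessed_records(checkpoint_interval_s, input_rate_per_s):
    """Rescaling restores from the latest completed checkpoint, so at
    most one checkpoint interval's worth of input is reprocessed."""
    return checkpoint_interval_s * input_rate_per_s


# A 60 s checkpoint interval at 10,000 records/s can replay up to
# 600,000 records after a rescaling restart.
```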
+
+Reactive Mode allows Flink users to implement a powerful autoscaling
mechanism, by having an external service monitor certain metrics, such as
consumer lag, aggregate CPU utilization, throughput, or latency. As soon as
these metrics are above or below a certain threshold, additional TaskManagers
can be added to or removed from the Flink cluster. This could be implemented
through changing the [replica
factor](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#replicas)
of a Kubernetes deployment, or an [autoscaling
group](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html)
on AWS. This external service only needs to handle resource allocation and
deallocation; Flink will take care of keeping the job running with the
resources available.
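
Such an external service can be as simple as a threshold rule. The sketch below is purely illustrative: the lag thresholds and bounds are made-up numbers, and the resulting count would be applied as, for example, a Kubernetes replica count:

```python
def desired_taskmanagers(current, consumer_lag,
                         scale_up_lag=100_000, scale_down_lag=1_000,
                         min_tm=1, max_tm=10):
    """Toy decision function for an external autoscaler. The thresholds
    are illustrative, not Flink defaults: scale out by one TaskManager
    when lag is high, scale in by one when lag is low, otherwise hold.
    Reactive Mode then handles the actual job rescaling."""
    if consumer_lag > scale_up_lag:
        return min(current + 1, max_tm)
    if consumer_lag < scale_down_lag:
        return max(current - 1, min_tm)
    return current
```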
+
+localhost:1313/flink/flink-docs-master
Review Comment:
🙈
##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -25,22 +25,100 @@ under the License.
+The obvious drawback here is, that the resources are declared once the job
starts and cannot be changed during the runtime. This means that the Adaptive
Scheduler cannot handle cases where the job needs to be rescaled due to a
change in the input rate, or a change in the performance of the workload.
Starting from **Flink 1.18.x**, this is addressed by [External Declarative
Resource Management](#externalized-declarative-resource-management).
Review Comment:
We can phrase this more positively. "Since 1.18.x you are also able to
change the target parallelism via [EDRM..., effectively allowing you to rescale
jobs at runtime."
##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -25,22 +25,100 @@ under the License.
+{{< hint info >}}
+If you are using Adaptive Scheduler on a [session cluster]({{< ref
"docs/deployment/overview" >}}/#session-mode), there are no guarantees
regarding the distribution of slots between multiple running jobs in the same
session, in case cluster doesn't have enough resources. The [External
Declarative Resource Management](#externalized-declarative-resource-management)
can partially mitigate this issue, but it is still recommended to use Adaptive
Scheduler on a [application cluster]({{< ref "docs/deployment/overview"
>}}/#application-mode).
Review Comment:
```suggestion
If you are using Adaptive Scheduler on a [session cluster]({{< ref
"docs/deployment/overview" >}}/#session-mode), there are no guarantees
regarding the distribution of slots between multiple running jobs in the same
session, in case the cluster doesn't have enough resources. The [External
Declarative Resource Management](#externalized-declarative-resource-management)
can partially mitigate this issue, but it is still recommended to use Adaptive
Scheduler on a [application cluster]({{< ref "docs/deployment/overview"
>}}/#application-mode).
```
##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -25,22 +25,100 @@ under the License.
+localhost:1313/flink/flink-docs-master
Review Comment:
🙈
##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -25,22 +25,100 @@ under the License.
+Apache Flink allows you to rescale your jobs when the job is either
not-catching up with inputs or over-provisioned. You can do this by manually
stopping the job and restarting from the savepoint created during shutdown with
a different parallelism.
-This page describes options where Flink automatically adjusts the parallelism
instead.
+This page describes options **where Flink automatically adjusts the
parallelism** instead, without relying on external orchestration.
Review Comment:
The previous versions wasn't great either, but combined with the above
statement this sounds very much like autoscaling which this obviously isn't.
(Although it maybe is for the adaptive batch scheduler).
It also conflicts a bit with EDRM which is all about external orchestration.
I think this whole section needs a rewrite.
```
The parallelism of a job has traditionally been a constant that is fixed
when the job was submitted.
Batch jobs couldn't be rescaled at all,
while Streaming jobs could be stopped with a savepoint and restarted with a
different parallelism.
This page describes features that allow the parallelism of a job to change
at runtime.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]