zentol commented on code in PR #23570:
URL: https://github.com/apache/flink/pull/23570#discussion_r1368477395
##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -25,22 +25,100 @@ under the License.
# Elastic Scaling
-Apache Flink allows you to rescale your jobs. You can do this manually by
stopping the job and restarting from the savepoint created during shutdown with
a different parallelism.
+Apache Flink allows you to rescale your jobs when the job is either
not keeping up with its inputs or is over-provisioned. You can do this manually
by stopping the job and restarting from the savepoint created during shutdown
with a different parallelism.
-This page describes options where Flink automatically adjusts the parallelism
instead.
+This page describes options **where Flink automatically adjusts the
parallelism** instead, without relying on external orchestration.
-## Reactive Mode
+Elastic scaling is enabled by two new schedulers: [Adaptive
Scheduler](#adaptive-scheduler) (streaming) and [Adaptive Batch
Scheduler](#adaptive-batch-scheduler) (batch).
+
+## Adaptive Scheduler
+
+The Adaptive Scheduler can adjust the parallelism of a job based on available
slots. It will automatically reduce the parallelism if not enough slots are
available to run the job with the originally configured parallelism, be it due
to not enough resources being available at the time of submission or
TaskManager outages during the job execution. If new slots become available,
the job will be scaled up again, up to the configured parallelism.
+
+In Reactive Mode (see below) the configured parallelism is ignored and treated
as if it was set to infinity, letting the job always use as many resources as
possible.
+
+One benefit of the Adaptive Scheduler over the default scheduler is that it
can handle TaskManager losses gracefully, since it would just scale down in
these cases.
+
+{{< img src="/fig/adaptive_scheduler.png" >}}
+
+Adaptive Scheduler builds on top of a feature called [Declarative Resource
Management](https://cwiki.apache.org/confluence/display/FLINK/FLIP-138%3A+Declarative+Resource+management).
As the figure above shows, instead of asking for an exact number of slots, the
JobMaster declares its desired resources (for Reactive Mode the maximum is set
to infinity) to the ResourceManager, which then tries to fulfill them.
+
+{{< img src="/fig/adaptive_scheduler_rescale.png" >}}
+
+When the JobMaster gets more resources during runtime, it will automatically
rescale the job using the latest available savepoint, eliminating the need for
external orchestration.
+
+Resources are declared once, when the job starts, and cannot be changed while
the job is running, so by itself the Adaptive Scheduler cannot react to a
change in the input rate or in the performance of the workload. Starting from
**Flink 1.18.x**, you can also re-declare the resource requirements of a
running job via [Externalized Declarative Resource
Management](#externalized-declarative-resource-management), effectively
allowing you to rescale jobs at runtime.
+
+### Externalized Declarative Resource Management
+
+{{< hint warning >}}
+Externalized Declarative Resource Management is an MVP ("minimum viable
product") feature. The Flink community is actively looking for feedback from
users through our mailing lists. Please check the limitations listed on this
page.
+{{< /hint >}}
{{< hint info >}}
-Reactive mode is an MVP ("minimum viable product") feature. The Flink
community is actively looking for feedback by users through our mailing lists.
Please check the limitations listed on this page.
+You can use Externalized Declarative Resource Management with the [Apache
Flink Kubernetes
operator](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.6/docs/custom-resource/autoscaler/#flink-118-and-in-place-scaling-support)
for a fully-fledged auto-scaling experience.
{{< /hint >}}
-Reactive Mode configures a job so that it always uses all resources available
in the cluster. Adding a TaskManager will scale up your job, removing resources
will scale it down. Flink will manage the parallelism of the job, always
setting it to the highest possible values.
+Externalized Declarative Resource Management aims to address two deployment
scenarios:
+1. Adaptive Scheduler on Session Cluster, where multiple jobs can compete for
resources, and you need a finer-grained control over the distribution of
resources between jobs.
+2. Adaptive Scheduler on Application Cluster in combination with Active
Resource Manager (e.g. [Native Kubernetes]({{< ref
"docs/deployment/resource-providers/native_kubernetes" >}})), where you rely on
Flink to "greedily" spawn new TaskManagers, but you still want to leverage
rescaling capabilities as with [Reactive Mode](#reactive-mode).
-Reactive Mode restarts a job on a rescaling event, restoring it from the
latest completed checkpoint. This means that there is no overhead of creating a
savepoint (which is needed for manually rescaling a job). Also, the amount of
data that is reprocessed after rescaling depends on the checkpointing interval,
and the restore time depends on the state size.
+by introducing a new [REST API endpoint]({{< ref "docs/ops/rest_api"
>}}#jobs-jobid-resource-requirements-1) that allows you to re-declare the
resource requirements of a running job by setting per-vertex parallelism
boundaries.
-The Reactive Mode allows Flink users to implement a powerful autoscaling
mechanism, by having an external service monitor certain metrics, such as
consumer lag, aggregate CPU utilization, throughput or latency. As soon as
these metrics are above or below a certain threshold, additional TaskManagers
can be added or removed from the Flink cluster. This could be implemented
through changing the [replica
factor](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#replicas)
of a Kubernetes deployment, or an [autoscaling
group](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html)
on AWS. This external service only needs to handle the resource allocation and
deallocation. Flink will take care of keeping the job running with the
resources available.
+```
+PUT /jobs/<job-id>/resource-requirements
+REQUEST BODY:
+{
+ "<first-vertex-id>": {
+ "parallelism": {
+ "lowerBound": 3,
+ "upperBound": 5
+ }
+ },
+ "<second-vertex-id>": {
+ "parallelism": {
+ "lowerBound": 2,
+ "upperBound": 3
+ }
+ }
+}
+```
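
As a sketch only, the request above could also be issued programmatically; the helper below builds the body shown and PUTs it using Python's standard library. The REST address, job ID, and vertex IDs are placeholders, not values from this page:

```python
import json
from urllib import request


def build_requirements(bounds):
    """Map each vertex ID to its lower/upper parallelism bounds,
    matching the request body shown above."""
    return {
        vertex_id: {"parallelism": {"lowerBound": lo, "upperBound": hi}}
        for vertex_id, (lo, hi) in bounds.items()
    }


def put_requirements(rest_url, job_id, bounds):
    """PUT the re-declared requirements to a running job.
    `rest_url` (e.g. "http://localhost:8081") is a placeholder for
    your cluster's REST endpoint."""
    body = json.dumps(build_requirements(bounds)).encode("utf-8")
    req = request.Request(
        f"{rest_url}/jobs/{job_id}/resource-requirements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with request.urlopen(req) as resp:
        return resp.status
```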
+
+To a certain extent, the above endpoint could be thought of as a
"re-scaling endpoint", and it introduces an important building block for an
auto-scaling experience for Flink.
+
+You can try this feature out manually by navigating to the Job overview in the
Flink UI and using the up-scale/down-scale buttons in the task list.
+
+### Usage
+
+{{< hint info >}}
+If you are using Adaptive Scheduler on a [session cluster]({{< ref
"docs/deployment/overview" >}}/#session-mode), there are no guarantees
regarding the distribution of slots between multiple running jobs in the same
session, in case the cluster doesn't have enough resources. The [Externalized
Declarative Resource Management](#externalized-declarative-resource-management)
can partially mitigate this issue, but it is still recommended to use Adaptive
Scheduler on an [application cluster]({{< ref "docs/deployment/overview"
>}}/#application-mode).
+{{< /hint >}}
+
+`jobmanager.scheduler` needs to be set on the cluster level for the Adaptive
Scheduler to be used instead of the default scheduler.
+
+```yaml
+jobmanager.scheduler: adaptive
+```
+
+The behavior of the Adaptive Scheduler is configured by [all configuration
options with `jobmanager.adaptive-scheduler` in their name]({{< ref
"docs/deployment/config">}}#advanced-scheduling-options).
+
+### Limitations
+
+- **Streaming jobs only**: The Adaptive Scheduler runs with streaming jobs
only. When submitting a batch job, Flink will use the default scheduler for
batch jobs, i.e. the [Adaptive Batch Scheduler](#adaptive-batch-scheduler).
+- **No support for partial failover**: Partial failover means that the
scheduler is able to restart parts ("regions" in Flink's internals) of a failed
job, instead of the entire job. This limitation only impacts the recovery time
of embarrassingly parallel jobs: Flink's default scheduler can restart the
failed parts, while the Adaptive Scheduler will restart the entire job.
+- Scaling events trigger job and task restarts, which will increase the number
of task attempts.
+
+## Reactive Mode
+
+Reactive Mode is a special mode of the Adaptive Scheduler that assumes a
single job per cluster (enforced by the [Application Mode]({{< ref
"docs/deployment/overview" >}}#application-mode)). Reactive Mode configures a
job so that it always uses all resources available in the cluster. Adding a
TaskManager will scale up your job; removing resources will scale it down.
Flink will manage the parallelism of the job, always setting it to the highest
possible value.
+
+Reactive Mode restarts a job on a rescaling event, restoring it from the
latest completed checkpoint. This means that there is no overhead of creating a
savepoint (which is needed for manually rescaling a job). Also, the amount of
data that is reprocessed after rescaling depends on the checkpointing interval,
and the restore time depends on the state size.
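
The amount of reprocessed data can be bounded with a bit of arithmetic; the sketch below is illustrative, not a Flink API:

```python
def worst_case_reprocessed_records(checkpoint_interval_s, input_rate_per_s):
    """Rescaling restores from the latest completed checkpoint, so at
    most one checkpoint interval's worth of input is reprocessed."""
    return checkpoint_interval_s * input_rate_per_s


# A 60 s checkpoint interval at 10,000 records/s can replay up to
# 600,000 records after a rescaling restart.
```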
+
+Reactive Mode allows Flink users to implement a powerful autoscaling
mechanism, by having an external service monitor certain metrics, such as
consumer lag, aggregate CPU utilization, throughput, or latency. As soon as
these metrics are above or below a certain threshold, additional TaskManagers
can be added to or removed from the Flink cluster. This could be implemented
through changing the [replica
factor](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#replicas)
of a Kubernetes deployment, or an [autoscaling
group](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html)
on AWS. This external service only needs to handle resource allocation and
deallocation; Flink will take care of keeping the job running with the
resources available.
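
Such an external service can be as simple as a threshold rule. The sketch below is purely illustrative: the lag thresholds and bounds are made-up numbers, and the resulting count would be applied as, for example, a Kubernetes replica count:

```python
def desired_taskmanagers(current, consumer_lag,
                         scale_up_lag=100_000, scale_down_lag=1_000,
                         min_tm=1, max_tm=10):
    """Toy decision function for an external autoscaler. The thresholds
    are illustrative, not Flink defaults: scale out by one TaskManager
    when lag is high, scale in by one when lag is low, otherwise hold.
    Reactive Mode then handles the actual job rescaling."""
    if consumer_lag > scale_up_lag:
        return min(current + 1, max_tm)
    if consumer_lag < scale_down_lag:
        return max(current - 1, min_tm)
    return current
```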
+
+localhost:1313/flink/flink-docs-master
Review Comment:
🙈
##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -25,22 +25,100 @@ under the License.
+The obvious drawback here is, that the resources are declared once the job
starts and cannot be changed during the runtime. This means that the Adaptive
Scheduler cannot handle cases where the job needs to be rescaled due to a
change in the input rate, or a change in the performance of the workload.
Starting from **Flink 1.18.x**, this is addressed by [External Declarative
Resource Management](#externalized-declarative-resource-management).
Review Comment:
We can phrase this more positively. "Since 1.18.x you are also able to
change the target parallelism via [EDRM..., effectively allowing you to rescale
jobs at runtime."
##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -25,22 +25,100 @@ under the License.
+{{< hint info >}}
+If you are using Adaptive Scheduler on a [session cluster]({{< ref
"docs/deployment/overview" >}}/#session-mode), there are no guarantees
regarding the distribution of slots between multiple running jobs in the same
session, in case cluster doesn't have enough resources. The [External
Declarative Resource Management](#externalized-declarative-resource-management)
can partially mitigate this issue, but it is still recommended to use Adaptive
Scheduler on a [application cluster]({{< ref "docs/deployment/overview"
>}}/#application-mode).
Review Comment:
```suggestion
If you are using Adaptive Scheduler on a [session cluster]({{< ref
"docs/deployment/overview" >}}/#session-mode), there are no guarantees
regarding the distribution of slots between multiple running jobs in the same
session, in case the cluster doesn't have enough resources. The [External
Declarative Resource Management](#externalized-declarative-resource-management)
can partially mitigate this issue, but it is still recommended to use Adaptive
Scheduler on a [application cluster]({{< ref "docs/deployment/overview"
>}}/#application-mode).
```
##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -25,22 +25,100 @@ under the License.
+localhost:1313/flink/flink-docs-master
Review Comment:
🙈
##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -25,22 +25,100 @@ under the License.
+Apache Flink allows you to rescale your jobs when the job is either
not-catching up with inputs or over-provisioned. You can do this by manually
stopping the job and restarting from the savepoint created during shutdown with
a different parallelism.
-This page describes options where Flink automatically adjusts the parallelism
instead.
+This page describes options **where Flink automatically adjusts the
parallelism** instead, without relying on external orchestration.
Review Comment:
The previous versions wasn't great either, but combined with the above
statement this sounds very much like autoscaling which this obviously isn't.
(Although it maybe is for the adaptive batch scheduler).
It also conflicts a bit with EDRM which is all about external orchestration.
I think this whole section needs a rewrite.
```
The parallelism of a job has traditionally been a constant that is fixed
when the job was submitted.
Batch jobs couldn't be rescaled at all,
while Streaming jobs could be stopped with a savepoint and restarted with a
different parallelism.
This page describes features that allow the parallelism of a job to change
at runtime.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]