rmetzger commented on a change in pull request #15355:
URL: https://github.com/apache/flink/pull/15355#discussion_r600803890
##########
File path: docs/content.zh/docs/deployment/elastic_scaling.md
##########
@@ -100,13 +110,40 @@ Since Reactive Mode is a new, experimental feature, not
all features supported b
- **Deployment is only supported as a standalone application deployment**.
Active resource providers (such as native Kubernetes, YARN or Mesos) are
explicitly not supported. Standalone session clusters are not supported either.
The application deployment is limited to single job applications.
The only supported deployment options are [Standalone in Application
Mode]({{< ref "docs/deployment/resource-providers/standalone/overview"
>}}#application-mode) ([described](#getting-started) on this page), [Docker in
Application Mode]({{< ref
"docs/deployment/resource-providers/standalone/docker"
>}}#application-mode-on-docker) and [Standalone Kubernetes Application
Cluster]({{< ref "docs/deployment/resource-providers/standalone/kubernetes"
>}}#deploy-application-cluster).
-- **Streaming jobs only**: The first version of Reactive Mode runs with streaming jobs only. When submitting a batch job, the default scheduler will be used.
-- **No support for [local recovery]({{< ref
"docs/ops/state/large_state_tuning">}}#task-local-recovery)**: Local recovery
is a feature that schedules tasks to machines so that the state on that machine
gets re-used if possible. The lack of this feature means that Reactive Mode
will always need to download the entire state from the checkpoint storage.
-- **No support for local failover**: Local failover means that the scheduler
is able to restart parts ("regions" in Flink's internals) of a failed job,
instead of the entire job. This limitation impacts only recovery time of
embarrassingly parallel jobs: Flink's default scheduler can restart failed
parts, while Reactive Mode will restart the entire job.
-- **Limited integration with Flink's Web UI**: Reactive Mode allows a job's parallelism to change over its lifetime. The web UI only shows the current parallelism of the job.
-- **Limited Job metrics**: With the exception of `numRestarts`, all [availability]({{< ref "docs/ops/metrics" >}}#availability) and [checkpointing]({{< ref "docs/ops/metrics" >}}#checkpointing) metrics with the `Job` scope do not work correctly.
+The [limitations of Adaptive Scheduler](#limitations-1) also apply to Reactive
Mode.
+
+
+## Adaptive Scheduler
+
+{{< hint danger >}}
+Using Adaptive Scheduler directly (not through Reactive Mode) is only advised
for advanced users.
+{{< /hint >}}
+
+Adaptive Scheduler is a scheduler that can adjust the parallelism of a job based on the available slots. On startup, it requests the number of slots needed based on the parallelisms configured by the user in the streaming job. If the number of slots offered is lower than requested, Adaptive Scheduler will reduce the parallelism so that it can start executing the job (or fail if insufficient slots are available). In Reactive Mode (see above) the requested parallelism is conceptually set to infinity, letting the job always use as many resources as possible. You can also use Adaptive Scheduler without Reactive Mode, but there are some practical limitations:
+- If you are using Adaptive Scheduler on a session cluster, there are no
guarantees regarding the distribution of slots between multiple running jobs in
the same session.
+- An active resource manager (native Kubernetes, YARN, Mesos) will request
TaskManagers until the parallelism requested by the job is fulfilled,
potentially allocating a lot of resources.
+One benefit of Adaptive Scheduler over the default scheduler is that it can handle TaskManager losses gracefully, since it will simply scale down in these cases.
+
+### Usage
+
+The following configuration parameters need to be set:
+
+- `jobmanager.scheduler: adaptive`: Change from the default scheduler to Adaptive Scheduler.
+- `cluster.declarative-resource-management.enabled`: Declarative resource management must be enabled (it is enabled by default).
+
+Depending on your usage scenario, we also recommend adjusting the parallelism of the job you are submitting to Adaptive Scheduler. The configured parallelism determines the number of slots Adaptive Scheduler will request.
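For illustration, a minimal sketch of how these options could look in `flink-conf.yaml`, assuming `parallelism.default` is used to set the job's parallelism (the value is only an example):

```yaml
# Switch from the default scheduler to Adaptive Scheduler
jobmanager.scheduler: adaptive

# Declarative resource management (enabled by default)
cluster.declarative-resource-management.enabled: true

# Example value only: the configured parallelism determines how many
# slots Adaptive Scheduler will request for the job
parallelism.default: 4
```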
Review comment:
from the introduction of adaptive scheduler, it is probably clear what
the parallelism does. I'll remove that sentence.