aljoscha commented on a change in pull request #10982: [FLINK-15683][docs] Restructure Configuration page
URL: https://github.com/apache/flink/pull/10982#discussion_r373412691
##########
File path: docs/ops/config.md
##########
@@ -23,213 +23,407 @@ specific language governing permissions and limitations under the License. -->
-**For single-node setups Flink is ready to go out of the box and you don't need to change the default configuration to get started.**
+All configuration is done in `conf/flink-conf.yaml`, which is expected to be a flat collection of [YAML key value pairs](http://www.yaml.org/spec/1.2/spec.html) with format `key: value`.
+
+The configuration is parsed and evaluated when the Flink processes are started. Changes to the configuration file require restarting the relevant processes.
 The out of the box configuration will use your default Java installation. You can manually set the environment variable `JAVA_HOME` or the configuration key `env.java.home` in `conf/flink-conf.yaml` if you want to manually override the Java runtime to use.
-This page lists the most common options that are typically needed to set up a well performing (distributed) installation. In addition a full list of all available configuration parameters is listed here.
+* This will be replaced by the TOC
+{:toc}
-All configuration is done in `conf/flink-conf.yaml`, which is expected to be a flat collection of [YAML key value pairs](http://www.yaml.org/spec/1.2/spec.html) with format `key: value`.
+# Basic Setup
-The system and run scripts parse the config at startup time. Changes to the configuration file require restarting the Flink JobManager and TaskManagers.
+The default configuration supports starting a single-node Flink session cluster without any changes.
+The options in this section are the ones most commonly needed for a basic distributed Flink setup.
-The configuration files for the TaskManagers can be different, Flink does not assume uniform machines in the cluster.
+**Hostnames / Ports**
-* This will be replaced by the TOC
-{:toc}
+These options are only necessary for a *standalone* application- or session deployments ([simple standalone]({{site.baseurl}}/ops/deployment/cluster_setup.html) or [Kubernetes]({{site.baseurl}}/ops/deployment/kubernetes.html)).
-## Common Options
+If you use Flink with [Yarn]({{site.baseurl}}/ops/deployment/yarn_setup.html), [Mesos]({{site.baseurl}}/ops/deployment/mesos.html), or the [*active* Kubernetes integration]({{site.baseurl}}/ops/deployment/native_kubernetes.html), the hostnames and ports get automatically configured are automatically discovered.
-{% include generated/common_section.html %}
+ - `rest.address`, `rest.port`: These are used by the client to connect to Flink. Set this to the hostname where the master (JobManager) runs, or to the hostname of the (Kubernetes) service in front of the Flink Master's REST interface.
-## Full Reference
+ - The `jobmanager.rpc.address` (defaults to *"localhost"*) and `jobmanager.rpc.port` (defaults to *6123*) config entries are used by the TaskManager to connect to the JobManager/ResourceManager. Set this to the hostname where the master (JobManager) runs, or to the hostname of the (Kubernetes internal) service for the Flink master (JobManager). This option is ignored on [setups with high-availability]({{site.baseurl}}/ops/jobmanager_high_availability.html) where the leader election mechanism is used to discover this automatically.
-### Core
+**Memory Sizes**
-{% include generated/core_configuration.html %}
+The default memory sizes support simple streaming/batch applications, but are too low to yield good performance on more complex applications.
-### Execution
+ - `jobmanager.heap.size`: Sets the size of the *Flink Master* (JobManager / ResourceManager / Dispatcher) JVM heap.
+ - `taskmanager.memory.process.size`: Total size of the TaskManager process, including everything.
Flink will subtract some memory for the JVM's own memory requirements (metaspace and others), and divide and configure the rest automatically between its components (network, managed memory, JVM Heap, etc.).
-{% include generated/deployment_configuration.html %}
-{% include generated/savepoint_config_configuration.html %}
-{% include generated/execution_configuration.html %}
+These value are configured as memory sizes, for example *1536m* or *2g*.
+
+**Parallelism**
+
+ - `taskmanager.numberOfTaskSlots`: The number of slots that a TaskManager offers *(default: 1)*. Each slot can take one task or pipeline.
+ Having multiple slots in a TaskManager can help amortize certain constant overheads (of the JVM, application libraries, or network connections) across parallel tasks or pipelines.
+
+ Running more smaller TaskManagers with one slot each is a good starting point and leads to the best isolation between tasks. Dedicating the same resources to fewer larger TaskManagers with more slots can help to increase resource utilization, at the cost of weaker isolation between the tasks (more tasks share the same JVM).
+
+ - `parallelism.default`: The default parallelism used when no parallelism is specified anywhere *(default: 1)*.
+
+**Checkpointing**
+
+You can configure checkpointing directly in code within your Flink job or application. Putting these values here in the configuration defines them as defaults in case the application does not configure anything.
+
+ - `state.backend`: The state backend to use. This defines the data structure mechanism for taking snapshots. Common values are `filesystem` or `rocksdb`.
+ - `state.checkpoints.dir`: The directory to write checkpoints to. This takes a path URI like *s3://mybucket/flink-app/checkpoints* or *hdfs://namenode:port/flink/checkpoints*.
+ - `state.savepoints.dir`: The default directory for savepoints. Takes a path URI, similar to `state.checkpoints.dir`.
-### JobManager
+**Web UI**
-{% include generated/job_manager_configuration.html %}
+ - `web.submit.enable`: Enables uploading and starting jobs through the Flink UI *(true by default)*. Please note that even when this is disabled, session clusters still accept jobs through REST requests (HTTP calls). This flag only guards the feature to upload jobs in the UI.
+ - `web.upload.dir`: The directory where to store uploaded jobs. Only used when `web.submit.enable` is true.
-### Restart Strategies
+**Other**
-Configuration options to control Flink's restart behaviour in case of job failures.
+ - `io.tmp.dirs`: The directories where Flink puts local data, defaults to the system temp directory (`java.io.tmpdir` property). If a list of directories is configured, Flink will rotate files across the directories.
+
+ The data put in these directories include by default the files created by RocksDB, spilled intermediate results (batch algorithms), and cached jar files.
+
+ This data is NOT relied upon for persistence/recovery, but if this data gets deleted, it typically causes a heavyweight recovery operation. It is hence recommended to set this to a directory that is not automatically periodically purged.
+
+ Yarn, Mesos, and Kubernets setups automatically configure this value to the local working directories by default.

Review comment:
```suggestion
Yarn, Mesos, and Kubernetes setups automatically configure this value to the local working directories by default.
```