[GitHub] [flink] aljoscha commented on a change in pull request #10982: [FLINK-15683][docs] Restructure Configuration page

GitBox Fri, 31 Jan 2020 02:52:37 -0800

aljoscha commented on a change in pull request #10982: [FLINK-15683][docs] Restructure Configuration page
URL: https://github.com/apache/flink/pull/10982#discussion_r373412691


 ##########
 File path: docs/ops/config.md
 ##########
 @@ -23,213 +23,407 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-**For single-node setups Flink is ready to go out of the box and you don't 
need to change the default configuration to get started.**
+All configuration is done in `conf/flink-conf.yaml`, which is expected to be a 
flat collection of [YAML key value 
pairs](http://www.yaml.org/spec/1.2/spec.html) with format `key: value`.
+
+The configuration is parsed and evaluated when the Flink processes are 
started. Changes to the configuration file require restarting the relevant 
processes.
 
 The out of the box configuration will use your default Java installation. You 
can manually set the environment variable `JAVA_HOME` or the configuration key 
`env.java.home` in `conf/flink-conf.yaml` if you want to manually override the 
Java runtime to use.
 
-This page lists the most common options that are typically needed to set up a 
well performing (distributed) installation. In addition a full list of all 
available configuration parameters is listed here.
+* This will be replaced by the TOC
+{:toc}
 
-All configuration is done in `conf/flink-conf.yaml`, which is expected to be a 
flat collection of [YAML key value 
pairs](http://www.yaml.org/spec/1.2/spec.html) with format `key: value`.
+# Basic Setup
 
-The system and run scripts parse the config at startup time. Changes to the 
configuration file require restarting the Flink JobManager and TaskManagers.
+The default configuration supports starting a single-node Flink session 
cluster without any changes.
+The options in this section are the ones most commonly needed for a basic 
distributed Flink setup.
 
-The configuration files for the TaskManagers can be different, Flink does not 
assume uniform machines in the cluster.
+**Hostnames / Ports**
 
-* This will be replaced by the TOC
-{:toc}
+These options are only necessary for a *standalone* application- or session 
deployments ([simple 
standalone]({{site.baseurl}}/ops/deployment/cluster_setup.html) or 
[Kubernetes]({{site.baseurl}}/ops/deployment/kubernetes.html)).
 
-## Common Options
+If you use Flink with [Yarn]({{site.baseurl}}/ops/deployment/yarn_setup.html), 
[Mesos]({{site.baseurl}}/ops/deployment/mesos.html), or the [*active* 
Kubernetes 
integration]({{site.baseurl}}/ops/deployment/native_kubernetes.html), the 
hostnames and ports get automatically configured are automatically discovered.
 
-{% include generated/common_section.html %}
+  - `rest.address`, `rest.port`: These are used by the client to connect to 
Flink. Set this to the hostname where the master (JobManager) runs, or to the 
hostname of the (Kubernetes) service in front of the Flink Master's REST 
interface.
 
-## Full Reference
+  - The `jobmanager.rpc.address` (defaults to *"localhost"*) and 
`jobmanager.rpc.port` (defaults to *6123*) config entries are used by the 
TaskManager to connect to the JobManager/ResourceManager. Set this to the 
hostname where the master (JobManager) runs, or to the hostname of the 
(Kubernetes internal) service for the Flink master (JobManager). This option is 
ignored on [setups with 
high-availability]({{site.baseurl}}/ops/jobmanager_high_availability.html) 
where the leader election mechanism is used to discover this automatically.
 
-### Core
+**Memory Sizes** 
 
-{% include generated/core_configuration.html %}
+The default memory sizes support simple streaming/batch applications, but are 
too low to yield good performance on more complex applications.
 
-### Execution
+  - `jobmanager.heap.size`: Sets the size of the *Flink Master* (JobManager / 
ResourceManager / Dispatcher) JVM heap.
+  - `taskmanager.memory.process.size`: Total size of the TaskManager process, 
including everything. Flink will subtract some memory for the JVM's own memory 
requirements (metaspace and others), and divide and configure the rest 
automatically between its components (network, managed memory, JVM Heap, etc.).
 
-{% include generated/deployment_configuration.html %}
-{% include generated/savepoint_config_configuration.html %}
-{% include generated/execution_configuration.html %}
+These value are configured as memory sizes, for example *1536m* or *2g*.
+
+**Parallelism**
+
+  - `taskmanager.numberOfTaskSlots`: The number of slots that a TaskManager 
offers *(default: 1)*. Each slot can take one task or pipeline.
+    Having multiple slots in a TaskManager can help amortize certain constant 
overheads (of the JVM, application libraries, or network connections) across 
parallel tasks or pipelines.
+
+     Running more smaller TaskManagers with one slot each is a good starting 
point and leads to the best isolation between tasks. Dedicating the same 
resources to fewer larger TaskManagers with more slots can help to increase 
resource utilization, at the cost of weaker isolation between the tasks (more 
tasks share the same JVM).
+
+  - `parallelism.default`: The default parallelism used when no parallelism is 
specified anywhere *(default: 1)*.
+
+**Checkpointing**
+
+You can configure checkpointing directly in code within your Flink job or 
application. Putting these values here in the configuration defines them as 
defaults in case the application does not configure anything.
+
+  - `state.backend`: The state backend to use. This defines the data structure 
mechanism for taking snapshots. Common values are `filesystem` or `rocksdb`.
+  - `state.checkpoints.dir`: The directory to write checkpoints to. This takes 
a path URI like *s3://mybucket/flink-app/checkpoints* or 
*hdfs://namenode:port/flink/checkpoints*.
+  - `state.savepoints.dir`: The default directory for savepoints. Takes a path 
URI, similar to `state.checkpoints.dir`.
 
-### JobManager
+**Web UI**
 
-{% include generated/job_manager_configuration.html %}
+  - `web.submit.enable`: Enables uploading and starting jobs through the Flink 
UI *(true by default)*. Please note that even when this is disabled, session 
clusters still accept jobs through REST requests (HTTP calls). This flag only 
guards the feature to upload jobs in the UI.
+  - `web.upload.dir`: The directory where to store uploaded jobs. Only used 
when `web.submit.enable` is true.
 
-### Restart Strategies
+**Other**
 
-Configuration options to control Flink's restart behaviour in case of job 
failures.
+  - `io.tmp.dirs`: The directories where Flink puts local data, defaults to 
the system temp directory (`java.io.tmpdir` property). If a list of directories 
is configured, Flink will rotate files across the directories.
+    
+    The data put in these directories include by default the files created by 
RocksDB, spilled intermediate results (batch algorithms), and cached jar files.
+    
+    This data is NOT relied upon for persistence/recovery, but if this data 
gets deleted, it typically causes a heavyweight recovery operation. It is hence 
recommended to set this to a directory that is not automatically periodically 
purged.
+    
+    Yarn, Mesos, and Kubernets setups automatically configure this value to 
the local working directories by default.
 
 Review comment:
   ```suggestion
       Yarn, Mesos, and Kubernetes setups automatically configure this value to the local working directories by default.
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
