gaborgsomogyi commented on a change in pull request #18746:
URL: https://github.com/apache/flink/pull/18746#discussion_r807711902



##########
File path: docs/content/docs/deployment/security/ssl.md
##########
@@ -0,0 +1,243 @@
+---
+title: "Encryption and Authentication using SSL"
+weight: 3
+type: docs
+aliases:
+  - /deployment/security/ssl.html
+  - /ops/security-ssl.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Encryption and Authentication using SSL
+
+Flink supports mutual authentication (where the two parties of a connection authenticate each other) and 
+SSL encryption of network traffic, for both internal and external communication. 
+
+**By default, SSL/TLS authentication and encryption are not enabled** (so that the defaults work out-of-the-box).
+
+This guide will explain internal vs external connectivity, and provide 
instructions on how to enable 
+SSL/TLS authentication and encryption for network communication with and 
between Flink processes. We 
+will go through steps such as generating certificates, setting up TrustStores 
and KeyStores, and 
+configuring cipher suites.
+
+For how-tos and tips for different deployment environments (e.g. standalone clusters, Kubernetes, YARN),
+check out the section on [Incorporating Security Features in a Running 
Cluster](#).
+
+## Internal and External Communication 
+
+There are two types of network connections to authenticate and encrypt: 
internal and external.
+
+{{< img src="/fig/ssl_internal_external.svg" alt="Internal and External 
Connectivity" width=75% >}}
+
+For more flexibility, security for internal and external connectivity can be 
enabled and configured
+separately.
+
+### Internal Connectivity
+
+Flink internal communication refers to all connections made between Flink 
processes. These include:
+
+- Control messages: RPC between JobManager / TaskManager / Dispatcher / 
ResourceManager
+- Transfers on the data plane: connections between TaskManagers to exchange 
data during shuffles, 
+  broadcasts, redistribution, etc.
+- Blob service communication: distribution of libraries and other artifacts
+
+All internal connections are SSL authenticated and encrypted. The connections 
use **mutual authentication**,
+meaning both the server and the client side of each connection need to present their certificate to each other. 
+The certificate acts as a shared secret and can be embedded into container 
images or attached to your 
+deployment setup. These connections run Flink's custom protocols. Users never connect directly to internal 
+connectivity endpoints.
+
+### External Connectivity
+
+Flink external communication refers to all connections made from the outside 
to Flink processes. 
+This includes: 
+- communication with the Dispatcher to submit Flink jobs (session clusters)
+- communication of the Flink CLI with the JobManager to inspect and modify a 
running Flink job/application
+
+Most of these connections are exposed via REST/HTTP endpoints (and used by the 
web UI). Some external 
+services used as sources or sinks may use some other network protocol.
+
+The server will, by default, accept connections from any client, meaning that 
the REST endpoint does 
+not authenticate the client. These REST endpoints, however, can be configured 
to require SSL encryption 
+and mutual authentication. 
+
+However, the recommended approach is setting up and configuring a dedicated 
proxy service (a "sidecar 
+proxy") that controls access to the REST endpoint. This involves binding the 
REST endpoint to the 
+loopback interface (or the pod-local interface in Kubernetes) and starting a 
REST proxy that authenticates 
+and forwards the requests to Flink. Examples for proxies that Flink users have 
deployed are [Envoy Proxy](https://www.envoyproxy.io/) 
+or [NGINX with 
MOD_AUTH](http://nginx.org/en/docs/http/ngx_http_auth_request_module.html).
+
+The rationale behind delegating authentication to a proxy is that such proxies 
offer a wide variety
+of authentication options and thus better integration into existing 
infrastructures.
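+
+The loopback-binding step described above can be sketched in the Flink configuration (using
+the standard `rest.bind-address` option; the proxy itself is deployed separately):
+
+```yaml
+# Bind the REST endpoint to the loopback interface so that only a co-located
+# proxy (e.g. Envoy, NGINX) can reach it and handle authentication.
+rest.bind-address: 127.0.0.1
+```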
+
+## Queryable State
+
+Connections to the [queryable state]({{< ref "docs/dev/datastream/fault-tolerance/queryable_state" >}}) 
+endpoints are currently neither authenticated nor encrypted.
+
+## SSL Setups
+
+{{< img src="/fig/ssl_mutual_auth.svg" alt="SSL Mutual Authentication" 
width=75% >}}
+
+Each participant holds two files: a keystore and a truststore. 
+
+A keystore contains a certificate (which contains a public key) and a private 
key. A truststore 
+contains trusted certificates and certificate chains/authorities. 
+
+Establishing encrypted, authenticated communication is a multi-step process, 
shown in the figure. 
+Certificates are exchanged and validated against the truststore, after which 
the two parties can 
+safely communicate.
+
+### Typical SSL Setup in Flink
+
+For mutually authenticated internal connections, note that:
+
+- a keystore and a truststore can contain the same dedicated certificate 
+- the same file can be used for both keystore and truststore
+- wildcard hostnames or addresses can be used 
+
+For internal communication between servers in a Flink cluster, a secure setup 
can be established with 
+a single, self-signed certificate that all parties use as both their keystore 
and truststore. You can 
+also use this approach for external communication when establishing mutual 
authentication for communication 
+between clients and the Flink Master.
+
+### Configuring Keystores and Truststores
+
+The SSL configuration requires configuring a keystore and a truststore such 
that the truststore trusts
+the keystore's certificate.
+
+You can use the [keytool 
utility](https://docs.oracle.com/javase/8/docs/technotes/tools/unix/keytool.html)
 
+to generate keys, certificates, keystores, and truststores:
+
+```bash
+keytool -genkeypair -alias flink.internal -keystore internal.keystore \
+  -dname "CN=flink.internal" -storepass internal_store_password \
+  -keyalg RSA -keysize 4096 -storetype PKCS12
+```
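+
+Once generated, the same PKCS12 file can serve as both keystore and truststore for internal
+connectivity. A minimal sketch of the corresponding configuration (paths are placeholders;
+passwords match the keytool example above):
+
+```yaml
+security.ssl.internal.enabled: true
+# The same self-signed certificate is used as both keystore and truststore.
+security.ssl.internal.keystore: /path/to/internal.keystore
+security.ssl.internal.truststore: /path/to/internal.keystore
+security.ssl.internal.keystore-password: internal_store_password
+security.ssl.internal.key-password: internal_store_password
+security.ssl.internal.truststore-password: internal_store_password
+```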
+
+| Deployment mode        | How to add the files                                                    |
+|------------------------|-------------------------------------------------------------------------|
+| Standalone clusters    | copy the files to each node, or add them to a shared mounted filesystem |
+| Containerized clusters | add the files to the container images                                   |
+| YARN                   | the cluster deployment phase can distribute these files                 |
+
+### Using Cipher Suites
+
+A cipher suite is a named combination of algorithms used to secure a TLS connection, covering 
+key exchange, authentication, encryption, and message integrity. There are numerous cipher 
+suites, each with different security properties and availability.
+
+{{< hint warning >}}
+The [IETF RFC 7525](https://tools.ietf.org/html/rfc7525) recommends using a 
specific set of cipher
+suites for strong security. Since these cipher suites are not available on 
many setups out-of-the-box,
+Flink defaults to TLS_RSA_WITH_AES_128_CBC_SHA (a slightly weaker but more 
widely available cipher suite). 

Review comment:
       Well, this is compatible but weak in many ways, including but not limited to:
   
   * In 2013, researchers demonstrated a timing attack against several TLS implementations using the CBC encryption mode (see [isg.rhul.ac.uk](http://www.isg.rhul.ac.uk/tls/Lucky13.html)). Additionally, the CBC mode is vulnerable to plaintext attacks in TLS 1.0, SSL 3.0 and lower. A fix was introduced with TLS 1.2 in the form of the GCM mode, which is not vulnerable to the BEAST attack. GCM should be preferred over CBC.
   * The Secure Hash Algorithm 1 has been proven to be insecure as of 2017 (see 
[shattered.io](https://shattered.io/)).
   
   Defaulting to a weak suite makes users think that they're safe (which is not true), since we broadcast it as the out-of-the-box configuration. Choosing a better suite requires all users to read this and act accordingly. If we can't come up with a better suite, then maybe a warning can be printed that this default is weak...
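   
   For reference, a stronger suite can already be selected via the `security.ssl.algorithms` option. A sketch following the RFC 7525 recommendations (whether these suites are available depends on the JVM's crypto provider):
   
   ```yaml
   # Restrict TLS to GCM-based suites recommended by RFC 7525;
   # requires the suites to be supported by the JVM in use.
   security.ssl.algorithms: TLS_DHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
   ```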

##########
File path: docs/content/docs/deployment/security/kerberos.md
##########
@@ -0,0 +1,116 @@
+---
+title: Authentication with Kerberos
+weight: 2
+type: docs
+aliases:
+  - /deployment/security/kerberos.html
+  - /ops/security-kerberos.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Enabling and Configuring Authentication with Kerberos
+
+## What is Kerberos?
+
+[Kerberos](https://web.mit.edu/kerberos/) is a network authentication protocol 
that provides a secure, 
+single-sign-on, trusted, third-party mutual authentication service. It is 
designed to provide strong 
+authentication for client/server applications by using secret-key cryptography.
+
+## How the Flink Security Infrastructure works with Kerberos
+
+A Flink program may use first- or third-party connectors, necessitating 
arbitrary authentication methods 
+(Kerberos, SSL/TLS, username/password, etc.). While satisfying the security 
requirements for all connectors 
+is an ongoing effort, Flink provides first-class support for Kerberos 
authentication only.
+
+Kerberos can be used to authenticate connections to:
+- Hadoop and its components (YARN, HDFS, HBase)
+- ZooKeeper
+- Kafka (0.9+)
+
+The current implementation supports running Flink clusters (JobManager / 
TaskManager / Jobs) with two
+authentication modes: 
+- a configured [Kerberos 
keytab](https://web.mit.edu/kerberos/krb5-devel/doc/basic/keytab_def.html) 
credential
+- [Hadoop delegation 
tokens](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/delegation_tokens.html)
+
+In production deployments, streaming jobs usually run for long periods of 
time. It is important to be 
+able to authenticate to secured data sources throughout the lifetime of the 
job. Kerberos keytabs are 
+the preferred authentication approach because they won't expire during the 
lifetime of long-running 
+stream processing applications, unlike a Hadoop delegation token or ticket 
cache entry.
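+
+A minimal sketch of the keytab-based setup, using the standard `security.kerberos.login.*`
+options (keytab path and principal are placeholders):
+
+```yaml
+# Authenticate with a keytab instead of the (expiring) ticket cache.
+security.kerberos.login.use-ticket-cache: false
+security.kerberos.login.keytab: /path/to/flink.keytab
+security.kerberos.login.principal: flink-user
+# JAAS login contexts to which the credential is made available.
+security.kerberos.login.contexts: Client,KafkaClient
+```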
+
+Note that the credentials are tied to a Flink cluster and not to a running 
job. Thus, all applications 
+that run on the same cluster use the same authentication token and all jobs 
within a cluster will share 
+the credentials configured for that cluster. If you need to work with 
different credentials, you should 
+start a new cluster. For example, to use a different keytab for a certain job, 
simply launch a separate 
+Flink cluster with a different configuration. Numerous Flink clusters may run 
side-by-side in a Kubernetes 
+or YARN environment.
+
+Note that it is possible to enable and configure the use of Kerberos 
independently for each service 
+or connector that is capable of being used with Kerberos. For example, you may 
enable Hadoop security 
+without enabling the use of Kerberos for ZooKeeper, or vice versa. 
+
+All services using Kerberos will use the same credentials. If you need to run 
some jobs with different 

Review comment:
       I don't want to be pedantic, so we can keep it like that, but there are edge cases where this is not true.
   For example, a Kafka delegation token can be provided from another subject, and in that case the configured cluster credentials are not used. Anyway, this is good as-is because that's more of a hack.

##########
File path: docs/content/docs/deployment/security/kerberos.md
##########
@@ -0,0 +1,116 @@
+---
+title: Authentication with Kerberos
+weight: 2
+type: docs
+aliases:
+  - /deployment/security/kerberos.html
+  - /ops/security-kerberos.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Enabling and Configuring Authentication with Kerberos
+
+## What is Kerberos?
+
+[Kerberos](https://web.mit.edu/kerberos/) is a network authentication protocol 
that provides a secure, 
+single-sign-on, trusted, third-party mutual authentication service. It is 
designed to provide strong 
+authentication for client/server applications by using secret-key cryptography.
+
+## How the Flink Security Infrastructure works with Kerberos
+
+A Flink program may use first- or third-party connectors, necessitating 
arbitrary authentication methods 
+(Kerberos, SSL/TLS, username/password, etc.). While satisfying the security 
requirements for all connectors 
+is an ongoing effort, Flink provides first-class support for Kerberos 
authentication only.
+
+Kerberos can be used to authenticate connections to:
+- Hadoop and its components (YARN, HDFS, HBase)
+- ZooKeeper
+- Kafka (0.9+)
+
+The current implementation supports running Flink clusters (JobManager / 
TaskManager / Jobs) with two
+authentication modes: 
+- a configured [Kerberos 
keytab](https://web.mit.edu/kerberos/krb5-devel/doc/basic/keytab_def.html) 
credential
+- [Hadoop delegation 
tokens](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/delegation_tokens.html)
+
+In production deployments, streaming jobs usually run for long periods of 
time. It is important to be 
+able to authenticate to secured data sources throughout the lifetime of the 
job. Kerberos keytabs are 
+the preferred authentication approach because they won't expire during the 
lifetime of long-running 

Review comment:
       This is partially true. Keytabs are subject to any password expiration 
policies that may be imposed on a principal. Thus, if a principal's password 
expires (or the password is changed), a keytab generated using that password 
will be rendered invalid.
   

##########
File path: docs/content/docs/deployment/security/kerberos.md
##########
@@ -0,0 +1,116 @@
+---
+title: Authentication with Kerberos
+weight: 2
+type: docs
+aliases:
+  - /deployment/security/kerberos.html
+  - /ops/security-kerberos.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Enabling and Configuring Authentication with Kerberos
+
+## What is Kerberos?
+
+[Kerberos](https://web.mit.edu/kerberos/) is a network authentication protocol 
that provides a secure, 
+single-sign-on, trusted, third-party mutual authentication service. It is 
designed to provide strong 
+authentication for client/server applications by using secret-key cryptography.
+
+## How the Flink Security Infrastructure works with Kerberos
+
+A Flink program may use first- or third-party connectors, necessitating 
arbitrary authentication methods 
+(Kerberos, SSL/TLS, username/password, etc.). While satisfying the security 
requirements for all connectors 
+is an ongoing effort, Flink provides first-class support for Kerberos 
authentication only.
+
+Kerberos can be used to authenticate connections to:
+- Hadoop and its components (YARN, HDFS, HBase)
+- ZooKeeper
+- Kafka (0.9+)
+
+The current implementation supports running Flink clusters (JobManager / 
TaskManager / Jobs) with two
+authentication modes: 
+- a configured [Kerberos 
keytab](https://web.mit.edu/kerberos/krb5-devel/doc/basic/keytab_def.html) 
credential
+- [Hadoop delegation 
tokens](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/delegation_tokens.html)
+
+In production deployments, streaming jobs usually run for long periods of 
time. It is important to be 
+able to authenticate to secured data sources throughout the lifetime of the 
job. Kerberos keytabs are 
+the preferred authentication approach because they won't expire during the 
lifetime of long-running 
+stream processing applications, unlike a Hadoop delegation token or ticket 
cache entry.
+
+Note that the credentials are tied to a Flink cluster and not to a running 
job. Thus, all applications 
+that run on the same cluster use the same authentication token and all jobs 
within a cluster will share 
+the credentials configured for that cluster. If you need to work with 
different credentials, you should 
+start a new cluster. For example, to use a different keytab for a certain job, 
simply launch a separate 
+Flink cluster with a different configuration. Numerous Flink clusters may run 
side-by-side in a Kubernetes 
+or YARN environment.
+
+Note that it is possible to enable and configure the use of Kerberos 
independently for each service 
+or connector that is capable of being used with Kerberos. For example, you may 
enable Hadoop security 
+without enabling the use of Kerberos for ZooKeeper, or vice versa. 
+
+All services using Kerberos will use the same credentials. If you need to run 
some jobs with different 
+Kerberos credentials, those jobs will have to run in a different cluster that 
is configured to use 
+those other credentials. For example, you can decide to use Kerberos for 
Hadoop security, but not for ZooKeeper.
+
+## Using Kerberos with Flink Security Modules
+
+The internal architecture of Flink security is based on _security modules_ 
(which implement 
[`org.apache.flink.runtime.security.modules.SecurityModule`](https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/runtime/security/modules/SecurityModule.html)).
 
+These modules are installed at startup. 
+
+### Hadoop Security Module
+
+This module uses the Hadoop `UserGroupInformation` (UGI) class to establish a 
process-wide *login user* 
+context. The login user is then used for all interactions with Hadoop, 
including HDFS, HBase, and YARN.
+
+If Hadoop security is enabled (in `core-site.xml`), the login user will have 
whatever Kerberos credential 
+is configured. Otherwise, the login user conveys only the user identity of the 
OS account that launched 
+the cluster.
+
+### JAAS Security Module
+
+This module provides a dynamic 
[JAAS](https://docs.oracle.com/javase/8/docs/technotes/guides/security/jaas/JAASRefGuide.html)
 

Review comment:
       Nit: Is there a specific reason why we sometimes use Java 8 links and sometimes Java 7 links (like a couple of lines further down)?

##########
File path: docs/content/docs/deployment/security/kerberos.md
##########
@@ -0,0 +1,116 @@
+---
+title: Authentication with Kerberos
+weight: 2
+type: docs
+aliases:
+  - /deployment/security/kerberos.html
+  - /ops/security-kerberos.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Enabling and Configuring Authentication with Kerberos
+
+## What is Kerberos?
+
+[Kerberos](https://web.mit.edu/kerberos/) is a network authentication protocol 
that provides a secure, 
+single-sign-on, trusted, third-party mutual authentication service. It is 
designed to provide strong 
+authentication for client/server applications by using secret-key cryptography.
+
+## How the Flink Security Infrastructure works with Kerberos
+
+A Flink program may use first- or third-party connectors, necessitating 
arbitrary authentication methods 
+(Kerberos, SSL/TLS, username/password, etc.). While satisfying the security 
requirements for all connectors 
+is an ongoing effort, Flink provides first-class support for Kerberos 
authentication only.
+
+Kerberos can be used to authenticate connections to:
+- Hadoop and its components (YARN, HDFS, HBase)
+- ZooKeeper
+- Kafka (0.9+)
+
+The current implementation supports running Flink clusters (JobManager / 
TaskManager / Jobs) with two
+authentication modes: 
+- a configured [Kerberos 
keytab](https://web.mit.edu/kerberos/krb5-devel/doc/basic/keytab_def.html) 
credential
+- [Hadoop delegation 
tokens](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/delegation_tokens.html)
+
+In production deployments, streaming jobs usually run for long periods of 
time. It is important to be 
+able to authenticate to secured data sources throughout the lifetime of the 
job. Kerberos keytabs are 
+the preferred authentication approach because they won't expire during the 
lifetime of long-running 
+stream processing applications, unlike a Hadoop delegation token or ticket 
cache entry.
+
+Note that the credentials are tied to a Flink cluster and not to a running 
job. Thus, all applications 
+that run on the same cluster use the same authentication token and all jobs 
within a cluster will share 
+the credentials configured for that cluster. If you need to work with 
different credentials, you should 
+start a new cluster. For example, to use a different keytab for a certain job, 
simply launch a separate 
+Flink cluster with a different configuration. Numerous Flink clusters may run 
side-by-side in a Kubernetes 
+or YARN environment.
+
+Note that it is possible to enable and configure the use of Kerberos 
independently for each service 
+or connector that is capable of being used with Kerberos. For example, you may 
enable Hadoop security 
+without enabling the use of Kerberos for ZooKeeper, or vice versa. 
+
+All services using Kerberos will use the same credentials. If you need to run 
some jobs with different 
+Kerberos credentials, those jobs will have to run in a different cluster that 
is configured to use 
+those other credentials. For example, you can decide to use Kerberos for 
Hadoop security, but not for ZooKeeper.
+
+## Using Kerberos with Flink Security Modules
+
+The internal architecture of Flink security is based on _security modules_ 
(which implement 
[`org.apache.flink.runtime.security.modules.SecurityModule`](https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/runtime/security/modules/SecurityModule.html)).
 
+These modules are installed at startup. 
+
+### Hadoop Security Module
+
+This module uses the Hadoop `UserGroupInformation` (UGI) class to establish a 
process-wide *login user* 
+context. The login user is then used for all interactions with Hadoop, 
including HDFS, HBase, and YARN.
+
+If Hadoop security is enabled (in `core-site.xml`), the login user will have 
whatever Kerberos credential 
+is configured. Otherwise, the login user conveys only the user identity of the 
OS account that launched 
+the cluster.
+
+### JAAS Security Module
+
+This module provides a dynamic 
[JAAS](https://docs.oracle.com/javase/8/docs/technotes/guides/security/jaas/JAASRefGuide.html)
 
+configuration to the cluster, making available the configured Kerberos 
credential to ZooKeeper, Kafka, 
+and other such components that rely on JAAS.
+
+Note that the user may also provide a static JAAS configuration file using the 
mechanisms described 
+in the [Java SE 
Documentation](http://docs.oracle.com/javase/7/docs/technotes/guides/security/jgss/tutorials/LoginConfigFile.html).
   
+Static entries override any dynamic entries provided by this module.
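+
+As an illustration (keytab path and principal are placeholders), a static JAAS file providing a
+ZooKeeper client entry might look like this, passed to the JVM via
+`-Djava.security.auth.login.config=/path/to/jaas.conf`:
+
+```
+Client {
+  com.sun.security.auth.module.Krb5LoginModule required
+  useKeyTab=true
+  storeKey=true
+  keyTab="/path/to/flink.keytab"
+  principal="flink-user";
+};
+```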
+
+### ZooKeeper Security Module
+
+This module configures certain process-wide ZooKeeper security-related 
settings, namely the ZooKeeper 
+service name (default: `zookeeper`) and the JAAS login context name (default: 
`Client`).
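+
+These settings correspond to the following configuration options (a sketch, shown with their
+default values):
+
+```yaml
+# Kerberos service name the ZooKeeper ensemble was started with.
+zookeeper.sasl.service-name: zookeeper
+# JAAS login context name used by the ZooKeeper client.
+zookeeper.sasl.login-context-name: Client
+```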
+
+## Ticket Renewal
+
+A Ticket Granting Ticket (TGT) is a small, encrypted identification file with 
a limited validity period.

Review comment:
       A TGT is simply not a file. A ticket cache may be a file, but it can also be a keyring, an API, etc. Of course, Flink supports only the file format of the ticket cache.

##########
File path: docs/content/docs/deployment/security/kerberos.md
##########
@@ -0,0 +1,116 @@
+---
+title: Authentication with Kerberos
+weight: 2
+type: docs
+aliases:
+  - /deployment/security/kerberos.html
+  - /ops/security-kerberos.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Enabling and Configuring Authentication with Kerberos
+
+## What is Kerberos?
+
+[Kerberos](https://web.mit.edu/kerberos/) is a network authentication protocol 
that provides a secure, 
+single-sign-on, trusted, third-party mutual authentication service. It is 
designed to provide strong 
+authentication for client/server applications by using secret-key cryptography.
+
+## How the Flink Security Infrastructure works with Kerberos
+
+A Flink program may use first- or third-party connectors, necessitating 
arbitrary authentication methods 
+(Kerberos, SSL/TLS, username/password, etc.). While satisfying the security 
requirements for all connectors 
+is an ongoing effort, Flink provides first-class support for Kerberos 
authentication only.
+
+Kerberos can be used to authenticate connections to:
+- Hadoop and its components (YARN, HDFS, HBase)
+- ZooKeeper
+- Kafka (0.9+)
+
+The current implementation supports running Flink clusters (JobManager / 
TaskManager / Jobs) with two
+authentication modes: 
+- a configured [Kerberos 
keytab](https://web.mit.edu/kerberos/krb5-devel/doc/basic/keytab_def.html) 
credential
+- [Hadoop delegation 
tokens](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/delegation_tokens.html)
+
+In production deployments, streaming jobs usually run for long periods of 
time. It is important to be 
+able to authenticate to secured data sources throughout the lifetime of the 
job. Kerberos keytabs are 
+the preferred authentication approach because they won't expire during the 
lifetime of long-running 
+stream processing applications, unlike a Hadoop delegation token or ticket 
cache entry.
+
+Note that the credentials are tied to a Flink cluster and not to a running 
job. Thus, all applications 
+that run on the same cluster use the same authentication token and all jobs 
within a cluster will share 
+the credentials configured for that cluster. If you need to work with 
different credentials, you should 
+start a new cluster. For example, to use a different keytab for a certain job, 
simply launch a separate 
+Flink cluster with a different configuration. Numerous Flink clusters may run 
side-by-side in a Kubernetes 
+or YARN environment.
+
+Note that it is possible to enable and configure the use of Kerberos 
independently for each service 
+or connector that is capable of being used with Kerberos. For example, you may 
enable Hadoop security 
+without enabling the use of Kerberos for ZooKeeper, or vice versa. 
+
+All services using Kerberos will use the same credentials. If you need to run 
some jobs with different 
+Kerberos credentials, those jobs will have to run in a different cluster that 
is configured to use 
+those other credentials. For example, you can decide to use Kerberos for 
Hadoop security, but not for ZooKeeper.
+
+## Using Kerberos with Flink Security Modules
+
+The internal architecture of Flink security is based on _security modules_ 
(which implement 
[`org.apache.flink.runtime.security.modules.SecurityModule`](https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/runtime/security/modules/SecurityModule.html)).
 
+These modules are installed at startup. 
+
+### Hadoop Security Module
+
+This module uses the Hadoop `UserGroupInformation` (UGI) class to establish a 
process-wide *login user* 
+context. The login user is then used for all interactions with Hadoop, 
including HDFS, HBase, and YARN.
+
+If Hadoop security is enabled (in `core-site.xml`), the login user will have 
whatever Kerberos credential 
+is configured. Otherwise, the login user conveys only the user identity of the 
OS account that launched 
+the cluster.
+
+### JAAS Security Module
+
+This module provides a dynamic 
[JAAS](https://docs.oracle.com/javase/8/docs/technotes/guides/security/jaas/JAASRefGuide.html)
 
+configuration to the cluster, making available the configured Kerberos 
credential to ZooKeeper, Kafka, 
+and other such components that rely on JAAS.
+
+Note that the user may also provide a static JAAS configuration file using the 
mechanisms described 
+in the [Java SE 
Documentation](http://docs.oracle.com/javase/7/docs/technotes/guides/security/jgss/tutorials/LoginConfigFile.html).
   
+Static entries override any dynamic entries provided by this module.
+
+### ZooKeeper Security Module
+
+This module configures certain process-wide ZooKeeper security-related 
settings, namely the ZooKeeper 
+service name (default: `zookeeper`) and the JAAS login context name (default: 
`Client`).
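+These defaults can be overridden in the Flink configuration, e.g. (the values shown are simply
+the defaults spelled out):
+
+```yaml
+zookeeper.sasl.service-name: zookeeper
+zookeeper.sasl.login-context-name: Client
+```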
+
+## Ticket Renewal
+
+A Ticket Granting Ticket (TGT) is a small, encrypted identification file with 
a limited validity period.
+The TGT file contains the session key, its expiration date, and the user's IP 
address.

Review comment:
       Just for my own understanding what do we mean under: `user's IP address`?
   I would understand it better what we mean here if you can point to the field 
in the fileformat here: 
https://web.mit.edu/kerberos/krb5-devel/doc/formats/ccache_file_format.html

##########
File path: docs/content/docs/deployment/security/running-cluster.md
##########
@@ -0,0 +1,231 @@
+---
+title: Incorporating Security Features in a Running Cluster
+weight: 4
+type: docs
+aliases:
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Incorporating Security Features in a Running Cluster
+
+This document briefly describes how Flink security works in the context of 
various deployment
+mechanisms (Standalone, native Kubernetes, YARN), filesystems, connectors, and 
state backends.
+
+## Deployment Modes
+
+Here is some information specific to each deployment mode.
+
+### Standalone Mode
+
+Steps to run a secure Flink cluster in standalone/cluster mode:
+
+1. Add security-related configuration options to the Flink configuration file 
(on all cluster nodes)
+   (see [here]({{< ref "docs/deployment/config" 
>}}#auth-with-external-systems)).
+2. Ensure that the keytab file exists at the path indicated by 
`security.kerberos.login.keytab` on
+   all cluster nodes.
+3. Deploy Flink cluster as normal.
+
+### Native Kubernetes and YARN Mode
+
+Steps to run a secure Flink cluster in native Kubernetes and YARN mode:
+
+1. Add security-related configuration options to the Flink configuration file 
on the client
+   (see [here]({{< ref "docs/deployment/config" 
>}}#auth-with-external-systems)).
+2. Ensure that the keytab file exists at the path indicated by `security.kerberos.login.keytab` on
+   the client node.
+3. Deploy Flink cluster as normal.
+
+In YARN and native Kubernetes mode, the keytab is automatically copied from 
the client to the Flink
+containers.
+
+To enable Kerberos authentication, the Kerberos configuration file is also 
required. This file can be
+either fetched from the cluster environment or uploaded by Flink. In the 
latter case, you need to

Review comment:
       How can I imagine this in real life: `fetched from the cluster 
environment`?
   Some users are let's say hard-coded `krb5.conf` files on cluster nodes. Is 
that what we mean here?

##########
File path: docs/content/docs/deployment/security/running-cluster.md
##########
@@ -0,0 +1,231 @@
+---
+title: Incorporating Security Features in a Running Cluster
+weight: 4
+type: docs
+aliases:
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Incorporating Security Features in a Running Cluster
+
+This document briefly describes how Flink security works in the context of 
various deployment
+mechanisms (Standalone, native Kubernetes, YARN), filesystems, connectors, and 
state backends.
+
+## Deployment Modes
+
+Here is some information specific to each deployment mode.
+
+### Standalone Mode
+
+Steps to run a secure Flink cluster in standalone/cluster mode:
+
+1. Add security-related configuration options to the Flink configuration file 
(on all cluster nodes)
+   (see [here]({{< ref "docs/deployment/config" 
>}}#auth-with-external-systems)).
+2. Ensure that the keytab file exists at the path indicated by 
`security.kerberos.login.keytab` on
+   all cluster nodes.
+3. Deploy Flink cluster as normal.
+
+### Native Kubernetes and YARN Mode
+
+Steps to run a secure Flink cluster in native Kubernetes and YARN mode:
+
+1. Add security-related configuration options to the Flink configuration file 
on the client
+   (see [here]({{< ref "docs/deployment/config" 
>}}#auth-with-external-systems)).
+2. Ensure that the keytab file exists at the path indicated by `security.kerberos.login.keytab` on
+   the client node.
+3. Deploy Flink cluster as normal.
+
+In YARN and native Kubernetes mode, the keytab is automatically copied from 
the client to the Flink
+containers.
+
+To enable Kerberos authentication, the Kerberos configuration file is also 
required. This file can be
+either fetched from the cluster environment or uploaded by Flink. In the 
latter case, you need to
+configure the `security.kerberos.krb5-conf.path` to indicate the path of the 
Kerberos configuration
+file and Flink will copy this file to its containers/pods.
+
+For more information, see the [documentation on YARN 
security](https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md).
+
+#### Using `kinit` (YARN only)
+
+In YARN mode, it is possible to deploy a secure Flink cluster without a 
keytab, using only the ticket
+cache (as managed by `kinit`). This avoids the complexity of generating a 
keytab and avoids entrusting
+the cluster manager with it. In this scenario, the Flink CLI acquires Hadoop 
delegation tokens (for
+HDFS and for HBase). The main drawback is that the cluster is necessarily 
short-lived since the generated
+delegation tokens will expire (typically within a week).
+
+Steps to run a secure Flink cluster using `kinit`:
+
+1. Add security-related configuration options to the Flink configuration file 
on the client
+   (see [here]({{< ref "docs/deployment/config" 
>}}#auth-with-external-systems)).
+2. Log in using the `kinit` command.
+3. Deploy Flink cluster as normal.
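The steps above can be sketched as a small pre-flight check (the helper function and the default cache path are assumptions for illustration; `klist -s` is the usual command to test the cache):

```bash
# Refuse to submit a job when no Kerberos ticket cache is present.
check_cache() {
  # $1: path to the ticket cache file (as kinit would create it)
  if [ -e "$1" ]; then echo "ok"; else echo "run kinit first"; fi
}

# Usage before 'flink run -m yarn-cluster flinkapp.jar':
check_cache "${KRB5CCNAME:-/tmp/krb5cc_$(id -u)}"
```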
+
+
+## SSL - Tips for YARN Deployment
+
+For YARN, you can use YARN's own tooling to help:
+
+- Configuring security for internal communication is exactly the same as in 
the example above.
+
+- To secure the REST endpoint, you need to issue the REST endpoint's certificate such that it is
+  valid for all hosts that the JobManager may get deployed to. This can be done with a wildcard
+  DNS name, or by adding multiple DNS names.
+
+- The easiest way to deploy keystores and truststores is via the YARN client's *ship files* option (`-yt`).
+  Copy the keystore and truststore files into a local directory (say `deploy-keys/`) and start the
+  YARN session as follows: `flink run -m yarn-cluster -yt deploy-keys/ flinkapp.jar`
+
+- When deployed using YARN, Flink's web dashboard is accessible through the YARN proxy's Tracking URL.
+  To ensure that the YARN proxy is able to access Flink's HTTPS URL, you need to configure the YARN
+  proxy to accept Flink's SSL certificates.
+  To do so, add the custom CA certificate to Java's default truststore on the YARN proxy node.
+
+
+## Creating and Deploying Keystores and Truststores
+
+Keys, certificates, and the keystores and truststores can be generated using the [keytool utility](https://docs.oracle.com/javase/8/docs/technotes/tools/unix/keytool.html).
+You need to have an appropriate Java keystore and truststore accessible from each node in the Flink cluster.
+
+- For standalone setups, this means copying the files to each node, or adding them to a shared mounted directory.
+- For container-based setups, add the keystore and truststore files to the container images.
+- For YARN setups, the cluster deployment phase can automatically distribute the keystore and truststore files.
+
+For the externally facing REST endpoint, the common name or subject 
alternative names in the certificate
+should match the node's hostname and IP address.
+
+## Example SSL Setup Standalone and Kubernetes
+
+**Internal Connectivity**
+
+Execute the following keytool commands to create a key pair in a keystore:
+
+```bash
+$ keytool -genkeypair \
+  -alias flink.internal \
+  -keystore internal.keystore \
+  -dname "CN=flink.internal" \
+  -storepass internal_store_password \
+  -keyalg RSA \
+  -keysize 4096 \
+  -storetype PKCS12
+```
+
+The single key/certificate in the keystore is used the same way by the server and client endpoints
+(mutual authentication). The key pair acts as the shared secret for internal security, and we can
+directly use it as both the keystore and the truststore.
+
+```yaml
+security.ssl.internal.enabled: true
+security.ssl.internal.keystore: /path/to/flink/conf/internal.keystore
+security.ssl.internal.truststore: /path/to/flink/conf/internal.keystore
+security.ssl.internal.keystore-password: internal_store_password

Review comment:
       Nit: In order to increase the quality of the doc maybe we can mention in 
a central place that storing plaintext passwords in config files is weak. 
Better way is either to use K8S secrets or environment variables.

##########
File path: docs/content/docs/deployment/security/running-cluster.md
##########
@@ -0,0 +1,231 @@
+---
+title: Incorporating Security Features in a Running Cluster
+weight: 4
+type: docs
+aliases:
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Incorporating Security Features in a Running Cluster
+
+This document briefly describes how Flink security works in the context of 
various deployment
+mechanisms (Standalone, native Kubernetes, YARN), filesystems, connectors, and 
state backends.
+
+## Deployment Modes
+
+Here is some information specific to each deployment mode.
+
+### Standalone Mode
+
+Steps to run a secure Flink cluster in standalone/cluster mode:
+
+1. Add security-related configuration options to the Flink configuration file 
(on all cluster nodes)
+   (see [here]({{< ref "docs/deployment/config" 
>}}#auth-with-external-systems)).
+2. Ensure that the keytab file exists at the path indicated by 
`security.kerberos.login.keytab` on
+   all cluster nodes.
+3. Deploy Flink cluster as normal.
+
+### Native Kubernetes and YARN Mode
+
+Steps to run a secure Flink cluster in native Kubernetes and YARN mode:
+
+1. Add security-related configuration options to the Flink configuration file 
on the client
+   (see [here]({{< ref "docs/deployment/config" 
>}}#auth-with-external-systems)).
+2. Ensure that the keytab file exists at the path indicated by `security.kerberos.login.keytab` on
+   the client node.
+3. Deploy Flink cluster as normal.
+
+In YARN and native Kubernetes mode, the keytab is automatically copied from 
the client to the Flink
+containers.
+
+To enable Kerberos authentication, the Kerberos configuration file is also 
required. This file can be
+either fetched from the cluster environment or uploaded by Flink. In the 
latter case, you need to
+configure the `security.kerberos.krb5-conf.path` to indicate the path of the 
Kerberos configuration
+file and Flink will copy this file to its containers/pods.
+
+For more information, see the [documentation on YARN 
security](https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md).
+
+#### Using `kinit` (YARN only)
+
+In YARN mode, it is possible to deploy a secure Flink cluster without a 
keytab, using only the ticket
+cache (as managed by `kinit`). This avoids the complexity of generating a 
keytab and avoids entrusting
+the cluster manager with it. In this scenario, the Flink CLI acquires Hadoop 
delegation tokens (for
+HDFS and for HBase). The main drawback is that the cluster is necessarily 
short-lived since the generated
+delegation tokens will expire (typically within a week).
+
+Steps to run a secure Flink cluster using `kinit`:
+
+1. Add security-related configuration options to the Flink configuration file 
on the client
+   (see [here]({{< ref "docs/deployment/config" 
>}}#auth-with-external-systems)).
+2. Log in using the `kinit` command.
+3. Deploy Flink cluster as normal.
+
+
+## SSL - Tips for YARN Deployment
+
+For YARN, you can use YARN's own tooling to help:
+
+- Configuring security for internal communication is exactly the same as in 
the example above.
+
+- To secure the REST endpoint, you need to issue the REST endpoint's certificate such that it is
+  valid for all hosts that the JobManager may get deployed to. This can be done with a wildcard
+  DNS name, or by adding multiple DNS names.
+
+- The easiest way to deploy keystores and truststores is via the YARN client's *ship files* option (`-yt`).
+  Copy the keystore and truststore files into a local directory (say `deploy-keys/`) and start the
+  YARN session as follows: `flink run -m yarn-cluster -yt deploy-keys/ flinkapp.jar`
+
+- When deployed using YARN, Flink's web dashboard is accessible through the YARN proxy's Tracking URL.
+  To ensure that the YARN proxy is able to access Flink's HTTPS URL, you need to configure the YARN
+  proxy to accept Flink's SSL certificates.
+  To do so, add the custom CA certificate to Java's default truststore on the YARN proxy node.
+
+
+## Creating and Deploying Keystores and Truststores
+
+Keys, certificates, and the keystores and truststores can be generated using the [keytool utility](https://docs.oracle.com/javase/8/docs/technotes/tools/unix/keytool.html).
+You need to have an appropriate Java keystore and truststore accessible from each node in the Flink cluster.
+
+- For standalone setups, this means copying the files to each node, or adding them to a shared mounted directory.
+- For container-based setups, add the keystore and truststore files to the container images.
+- For YARN setups, the cluster deployment phase can automatically distribute the keystore and truststore files.
+
+For the externally facing REST endpoint, the common name or subject 
alternative names in the certificate
+should match the node's hostname and IP address.
+
+## Example SSL Setup Standalone and Kubernetes
+
+**Internal Connectivity**
+
+Execute the following keytool commands to create a key pair in a keystore:
+
+```bash
+$ keytool -genkeypair \
+  -alias flink.internal \
+  -keystore internal.keystore \
+  -dname "CN=flink.internal" \
+  -storepass internal_store_password \
+  -keyalg RSA \
+  -keysize 4096 \
+  -storetype PKCS12
+```
+
+The single key/certificate in the keystore is used the same way by the server and client endpoints
+(mutual authentication). The key pair acts as the shared secret for internal security, and we can
+directly use it as both the keystore and the truststore.
+
+```yaml
+security.ssl.internal.enabled: true
+security.ssl.internal.keystore: /path/to/flink/conf/internal.keystore
+security.ssl.internal.truststore: /path/to/flink/conf/internal.keystore
+security.ssl.internal.keystore-password: internal_store_password
+security.ssl.internal.truststore-password: internal_store_password
+security.ssl.internal.key-password: internal_store_password
+```
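Note that this places a plaintext password in the configuration file, readable by anyone with access to the file. A common mitigation (sketched below; the variable name and the use of a dynamic property are illustrative assumptions, not prescribed by Flink) is to keep the password in an environment variable or a Kubernetes secret and inject it at submission time:

```bash
# Read the keystore password from the environment (e.g. injected from a
# Kubernetes secret); the fallback placeholder exists only for this sketch.
KEYSTORE_PASS="${KEYSTORE_PASS:-internal_store_password}"
# Pass it as a dynamic property instead of writing it into the config file.
ARGS="-Dsecurity.ssl.internal.keystore-password=${KEYSTORE_PASS}"
# flink run ${ARGS} ... (actual submission command elided)
echo "${ARGS}"
```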
+
+**REST Endpoint**
+
+The REST endpoint may receive connections from external processes, including tools that are not part
+of Flink (for example, a curl request to the REST API). Setting up a proper certificate that is signed
+through a CA hierarchy may make sense for the REST endpoint.
+
+However, as mentioned above, the REST endpoint does not authenticate clients and thus typically needs
+to be secured via a proxy anyway.
+
+**REST Endpoint (simple self signed certificate)**
+
+This example shows how to create a simple keystore / truststore pair. The truststore does not contain
+the private key and can be shared with other applications. In this example, *myhost.company.org / ip:10.0.2.15*
+is the node (or service) for the JobManager.
+
+```bash
+$ keytool -genkeypair -alias flink.rest -keystore rest.keystore -dname 
"CN=myhost.company.org" -ext "SAN=dns:myhost.company.org,ip:10.0.2.15" 
-storepass rest_keystore_password -keyalg RSA -keysize 4096 -storetype PKCS12

Review comment:
       Some of the keytool commands are broken with `\` some of not. It would 
be good to make it consistent because it's hard to read it.

##########
File path: docs/content/docs/deployment/security/running-cluster.md
##########
@@ -0,0 +1,231 @@
+---
+title: Incorporating Security Features in a Running Cluster
+weight: 4
+type: docs
+aliases:
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Incorporating Security Features in a Running Cluster
+
+This document briefly describes how Flink security works in the context of 
various deployment
+mechanisms (Standalone, native Kubernetes, YARN), filesystems, connectors, and 
state backends.
+
+## Deployment Modes
+
+Here is some information specific to each deployment mode.
+
+### Standalone Mode
+
+Steps to run a secure Flink cluster in standalone/cluster mode:
+
+1. Add security-related configuration options to the Flink configuration file 
(on all cluster nodes)
+   (see [here]({{< ref "docs/deployment/config" 
>}}#auth-with-external-systems)).
+2. Ensure that the keytab file exists at the path indicated by 
`security.kerberos.login.keytab` on
+   all cluster nodes.
+3. Deploy Flink cluster as normal.
+
+### Native Kubernetes and YARN Mode

Review comment:
       Not sure where it belongs but there is a env var which has major effect 
on tokens and I don't see it anywhere. It would be good to mention it: 
`HADOOP_TOKEN_FILE_LOCATION`.

##########
File path: docs/content/docs/deployment/security/updates.md
##########
@@ -0,0 +1,73 @@
+---

Review comment:
       Here I have an offtopic question just for my own understanding. @zentol, 
how do we handle security fixes in general? A public list of fixed security 
issues help users to double check what is fixed or not but this is true for 
hackers as well. This is the reason why many projects are not creating security 
jiras + fixes are added in a hidden way sometimes in totally unrelated PRs.

##########
File path: docs/content/docs/deployment/security/ssl.md
##########
@@ -0,0 +1,243 @@
+---
+title: "Encryption and Authentication using SSL"
+weight: 3
+type: docs
+aliases:
+  - /deployment/security/ssl.html
+  - /ops/security-ssl.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Encryption and Authentication using SSL
+
+Flink supports mutual authentication (when two parties authenticate each other at the same time)
+and encryption of network communication with SSL, for both internal and external connectivity.
+
+**By default, SSL/TLS authentication and encryption are not enabled** (so that the defaults work
+out of the box).
+
+This guide will explain internal vs external connectivity, and provide 
instructions on how to enable 
+SSL/TLS authentication and encryption for network communication with and 
between Flink processes. We 
+will go through steps such as generating certificates, setting up TrustStores 
and KeyStores, and 
+configuring cipher suites.
+
+For how-tos and tips for different deployment environments (e.g. standalone clusters, Kubernetes, YARN),
+check out the section on [Incorporating Security Features in a Running Cluster](#).
+
+## Internal and External Communication 
+
+There are two types of network connections to authenticate and encrypt: 
internal and external.
+
+{{< img src="/fig/ssl_internal_external.svg" alt="Internal and External 
Connectivity" width=75% >}}
+
+For more flexibility, security for internal and external connectivity can be 
enabled and configured
+separately.
+
+### Internal Connectivity
+
+Flink internal communication refers to all connections made between Flink 
processes. These include:
+
+- Control messages: RPC between JobManager / TaskManager / Dispatcher / 
ResourceManager
+- Transfers on the data plane: connections between TaskManagers to exchange data during shuffles,
+  broadcasts, redistribution, etc.
+- Blob service communication: distribution of libraries and other artifacts
+
+All internal connections are SSL authenticated and encrypted. The connections use **mutual authentication**,
+meaning both the server and the client side of each connection need to present their certificate to the
+other. The certificate acts as a shared secret and can be embedded into container images or attached to
+your deployment setup. These connections run Flink's custom protocols; users never connect directly to
+internal connectivity endpoints.
+
+### External Connectivity
+
+Flink external communication refers to all connections made from the outside 
to Flink processes. 
+This includes: 
+- communication with the Dispatcher to submit Flink jobs (session clusters)
+- communication of the Flink CLI with the JobManager to inspect and modify a 
running Flink job/application
+
+Most of these connections are exposed via REST/HTTP endpoints (and used by the 
web UI). Some external 
+services used as sources or sinks may use some other network protocol.
+
+The server will, by default, accept connections from any client, meaning that 
the REST endpoint does 
+not authenticate the client. These REST endpoints, however, can be configured 
to require SSL encryption 
+and mutual authentication. 
+
+However, the recommended approach is setting up and configuring a dedicated 
proxy service (a "sidecar 
+proxy") that controls access to the REST endpoint. This involves binding the 
REST endpoint to the 
+loopback interface (or the pod-local interface in Kubernetes) and starting a 
REST proxy that authenticates 
+and forwards the requests to Flink. Examples for proxies that Flink users have 
deployed are [Envoy Proxy](https://www.envoyproxy.io/) 
+or [NGINX with 
MOD_AUTH](http://nginx.org/en/docs/http/ngx_http_auth_request_module.html).
+
+The rationale behind delegating authentication to a proxy is that such proxies 
offer a wide variety
+of authentication options and thus better integration into existing 
infrastructures.
+
+## Queryable State
+
+Connections to the [queryable state]({{< ref "docs/dev/datastream/fault-tolerance/queryable_state" >}})
+endpoints are currently neither authenticated nor encrypted.
+
+## SSL Setups
+
+{{< img src="/fig/ssl_mutual_auth.svg" alt="SSL Mutual Authentication" 
width=75% >}}

Review comment:
       Personally I don't see the image, maybe local issue?!
   <img width="427" alt="Screenshot 2022-02-16 at 12 51 40" 
src="https://user-images.githubusercontent.com/18561820/154259238-2e7e4f58-5a1a-4083-9e7e-a3590ce8a088.png";>
   

##########
File path: docs/content/docs/deployment/security/overview.md
##########
@@ -0,0 +1,68 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Security Overview
+
+Frameworks that process data are sensitive components; you must use 
authentication and encryption to 
+secure your data and data sources. Apache Flink supports authentication with 
[Kerberos](https://web.mit.edu/kerberos/) 
+and can be configured to encrypt all network communication with 
[SSL](https://www.ssl.com/faqs/faq-what-is-ssl/).
+
+When we talk about security for Flink, we generally make a distinction between 
securing the internal 
+communication within the Flink cluster (i.e. between the Task Managers, 
between the Task Managers and 
+the Flink Master) and securing the external communication between the cluster 
and the outside world.
+
+Internally, Netty is used for the TCP connections used for data exchange among the task managers,
+and Akka is used for RPC between the Flink master and the task managers.
+
+Externally, HTTP is used for almost all communication, except that some external services used as
+sources or sinks may use other network protocols.
+
+## What is supported?
+
+Security features provided by the Flink community make it easy to access secured data, protect
+associated credentials, and increase the overall security of a Flink cluster. The following security
+measures are currently supported:
+
+- Authentication of connections between Flink processes 
+- Encryption of data transferred between Flink processes using SSL (Note that 
there is a performance 
+  degradation when SSL is enabled, the magnitude of which depends on the CPU 
type and the JVM implementation.)

Review comment:
       And heavily influenced by the key size.

##########
File path: docs/content/docs/deployment/security/overview.md
##########
@@ -0,0 +1,68 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Security Overview
+
+Frameworks that process data are sensitive components; you must use 
authentication and encryption to 
+secure your data and data sources. Apache Flink supports authentication with 
[Kerberos](https://web.mit.edu/kerberos/) 
+and can be configured to encrypt all network communication with 
[SSL](https://www.ssl.com/faqs/faq-what-is-ssl/).

Review comment:
       I know a normal user doesn't understand the difference between SSL and 
TLS. We use SSL in docs and configs for historical reasons (which is fine) but 
in the background do we enforce TLS? I'm asking it because SSL v3.0 deemed 
insecure in 2004 due to the POODLE attack.

##########
File path: docs/content/docs/deployment/security/running-cluster.md
##########
@@ -0,0 +1,231 @@
+---
+title: Incorporating Security Features in a Running Cluster
+weight: 4
+type: docs
+aliases:
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Incorporating Security Features in a Running Cluster
+
+This document briefly describes how Flink security works in the context of 
various deployment
+mechanisms (Standalone, native Kubernetes, YARN), filesystems, connectors, and 
state backends.
+
+## Deployment Modes
+
+Here is some information specific to each deployment mode.
+
+### Standalone Mode
+
+Steps to run a secure Flink cluster in standalone/cluster mode:
+
+1. Add security-related configuration options to the Flink configuration file 
(on all cluster nodes)
+   (see [here]({{< ref "docs/deployment/config" 
>}}#auth-with-external-systems)).
+2. Ensure that the keytab file exists at the path indicated by 
`security.kerberos.login.keytab` on
+   all cluster nodes.
+3. Deploy Flink cluster as normal.
+
+### Native Kubernetes and YARN Mode
+
+Steps to run a secure Flink cluster in native Kubernetes and YARN mode:
+
+1. Add security-related configuration options to the Flink configuration file on the client
+   (see [here]({{< ref "docs/deployment/config" >}}#auth-with-external-systems)).
+2. Ensure that the keytab file exists at the path indicated by `security.kerberos.login.keytab` on
+   the client node.
+3. Deploy Flink cluster as normal.
+
+In YARN and native Kubernetes mode, the keytab is automatically copied from the client to the
+Flink containers.
+
+To enable Kerberos authentication, a Kerberos configuration file is also required. This file can
+either be fetched from the cluster environment or uploaded by Flink. In the latter case, set
+`security.kerberos.krb5-conf.path` to the path of the Kerberos configuration file, and Flink will
+copy this file to its containers/pods.
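+
+For example, uploading a client-side Kerberos configuration file could be configured as follows
+(the paths and the principal are placeholders):
+
+```yaml
+security.kerberos.login.keytab: /path/to/flink.keytab
+security.kerberos.login.principal: flink-user@EXAMPLE.COM
+security.kerberos.krb5-conf.path: /etc/krb5.conf
+```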
+
+For more information, see the [documentation on YARN security](https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md).
+
+#### Using `kinit` (YARN only)
+
+In YARN mode, it is possible to deploy a secure Flink cluster without a keytab, using only the
+ticket cache (as managed by `kinit`). This avoids the complexity of generating a keytab and avoids
+entrusting the cluster manager with it. In this scenario, the Flink CLI acquires Hadoop delegation
+tokens (for HDFS and for HBase). The main drawback is that the cluster is necessarily short-lived,
+since the generated delegation tokens will expire (typically within a week).
+
+Steps to run a secure Flink cluster using `kinit`:
+
+1. Add security-related configuration options to the Flink configuration file on the client
+   (see [here]({{< ref "docs/deployment/config" >}}#auth-with-external-systems)).
+2. Login using the `kinit` command.
+3. Deploy Flink cluster as normal.
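+
+The steps above can be sketched as follows (the principal and the application jar name are
+placeholders, not values prescribed by this guide):
+
+```bash
+# Obtain a Kerberos ticket for the submitting user.
+$ kinit alice@EXAMPLE.COM
+
+# Optionally verify the ticket cache before submitting.
+$ klist
+
+# Deploy the Flink application to YARN as usual.
+$ flink run -m yarn-cluster flinkapp.jar
+```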
+
+
+## SSL - Tips for YARN Deployment
+
+For YARN, you can use YARN's own tooling to help with a secure setup:
+
+- Configuring security for internal communication is exactly the same as in the standalone
+  example below.
+
+- To secure the REST endpoint, you need to issue the REST endpoint's certificate such that it is
+  valid for all hosts that the JobManager may get deployed to. This can be done with a wildcard
+  DNS name, or by adding multiple DNS names.
+
+- The easiest way to deploy keystores and truststores is via the YARN client's *ship files* option
+  (`-yt`). Copy the keystore and truststore files into a local directory (say `deploy-keys/`) and
+  start the YARN session as follows: `flink run -m yarn-cluster -yt deploy-keys/ flinkapp.jar`
+
+- When deployed using YARN, Flink's web dashboard is accessible through the YARN proxy's Tracking
+  URL. To ensure that the YARN proxy is able to access Flink's HTTPS URL, you need to configure
+  the YARN proxy to accept Flink's SSL certificates. For that, add the custom CA certificate to
+  Java's default truststore on the YARN proxy node.
+
+
+## Creating and Deploying Keystores and Truststores
+
+Keys, certificates, and the keystores and truststores can be generated using the
+[keytool utility](https://docs.oracle.com/javase/8/docs/technotes/tools/unix/keytool.html).
+You need to have an appropriate Java keystore and truststore accessible from each node in the
+Flink cluster.
+
+- For standalone setups, this means copying the files to each node, or adding them to a shared
+  mounted directory.
+- For container-based setups, add the keystore and truststore files to the container images.
+- For YARN setups, the cluster deployment phase can automatically distribute the keystore and
+  truststore files.
+
+For the externally facing REST endpoint, the common name or subject alternative names in the
+certificate should match the node's hostname and IP address.
+
+## Example SSL Setup Standalone and Kubernetes
+
+**Internal Connectivity**
+
+Execute the following keytool commands to create a key pair in a keystore:
+
+```bash
+$ keytool -genkeypair \
+  -alias flink.internal \
+  -keystore internal.keystore \
+  -dname "CN=flink.internal" \
+  -storepass internal_store_password \
+  -keyalg RSA \
+  -keysize 4096 \
+  -storetype PKCS12
+```
+
+The single key/certificate in the keystore is used the same way by the server and client endpoints
+(mutual authentication). The key pair acts as the shared secret for internal security, and we can
+directly use it as keystore and truststore.
+
+```yaml
+security.ssl.internal.enabled: true
+security.ssl.internal.keystore: /path/to/flink/conf/internal.keystore
+security.ssl.internal.truststore: /path/to/flink/conf/internal.keystore
+security.ssl.internal.keystore-password: internal_store_password
+security.ssl.internal.truststore-password: internal_store_password
+security.ssl.internal.key-password: internal_store_password
+```
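+
+To sanity-check the generated keystore (assuming the `internal.keystore` file and the password
+from the `keytool` example above), you can list its contents:
+
+```bash
+$ keytool -list \
+  -keystore internal.keystore \
+  -storepass internal_store_password \
+  -storetype PKCS12
+```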
+
+**REST Endpoint**
+
+The REST endpoint may receive connections from external processes, including tools that are not
+part of Flink (for example, a curl request to the REST API). Setting up a proper certificate that
+is signed through a CA hierarchy may make sense for the REST endpoint.
+
+However, as mentioned above, the REST endpoint does not authenticate clients and thus typically
+needs to be secured via a proxy anyway.
+
+**REST Endpoint (simple self signed certificate)**
+
+This example shows how to create a simple keystore / truststore pair. The truststore does not
+contain the primary key and can be shared with other applications. In this example,
+*myhost.company.org / ip:10.0.2.15*

Review comment:
       You mean private key instead of primary, right?



