[ https://issues.apache.org/jira/browse/FLINK-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805601#comment-15805601 ]
ASF GitHub Bot commented on FLINK-5364:
---------------------------------------

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3057#discussion_r94997957

--- Diff: docs/internals/flink_security.md ---
@@ -24,64 +24,109 @@ specific language governing permissions and
 limitations under the License.
 -->

-This document briefly describes how Flink security works in the context of various deployment mechanism (Standalone/Cluster vs YARN)
-and the connectors that participates in Flink Job execution stage. This documentation can be helpful for both administrators and developers
-who plans to run Flink on a secure environment.
+This document briefly describes how Flink security works in the context of various deployment mechanisms (Standalone, YARN, or Mesos),
+filesystems, connectors, and state backends.

 ## Objective

+The primary goals of the Flink Kerberos security infrastructure are:
+1. to enable secure data access for jobs within a cluster via connectors (e.g. Kafka)
+2. to authenticate to ZooKeeper (if configured to use SASL)
+3. to authenticate to Hadoop components (e.g. HDFS, HBase)

-The primary goal of Flink security model is to enable secure data access for jobs within a cluster via connectors. In a production deployment scenario,
-streaming jobs are understood to run for longer period of time (days/weeks/months) and the system must be able to authenticate against secure
-data sources throughout the life of the job. The current implementation supports running Flink clusters (Job Manager/Task Manager/Jobs) under the
-context of a Kerberos identity based on Keytab credential supplied during deployment time. Any jobs submitted will continue to run in the identity of the cluster.
+In a production deployment scenario, streaming jobs are understood to run for long periods of time (days/weeks/months) and be able to authenticate to secure
+data sources throughout the life of the job. Kerberos keytabs do not expire in that timeframe, unlike a Hadoop delegation token
+or ticket cache entry.
+
+The current implementation supports running Flink clusters (Job Manager/Task Manager/jobs) with either a configured keytab credential
+or with Hadoop delegation tokens. Keep in mind that all jobs share the credential configured for a given cluster.

 ## How Flink Security works

-Flink deployment includes running Job Manager/ZooKeeper, Task Manager(s), Web UI and Job(s). Jobs (user code) can be submitted through web UI and/or CLI.
-A Job program may use one or more connectors (Kafka, HDFS, Cassandra, Flume, Kinesis etc.,) and each connector may have a specific security
-requirements (Kerberos, database based, SSL/TLS, custom etc.,). While satisfying the security requirements for all the connectors evolves over a period
-of time, at this time of writing, the following connectors/services are tested for Kerberos/Keytab based security.
+In concept, a Flink program may use first- or third-party connectors (Kafka, HDFS, Cassandra, Flume, Kinesis etc.) necessitating arbitrary authentication methods (Kerberos, SSL/TLS, username/password, etc.). While satisfying the security requirements for all connectors is an ongoing effort,
+Flink provides first-class support for Kerberos authentication only. The following services and connectors are tested for Kerberos authentication:

-- Kafka (0.9)
+- Kafka (0.9+)
 - HDFS
+- HBase
 - ZooKeeper

-Hadoop uses the UserGroupInformation (UGI) class to manage security. UGI is a static implementation that takes care of handling Kerberos authentication. The Flink bootstrap implementation
-(JM/TM/CLI) takes care of instantiating UGI with the appropriate security credentials to establish the necessary security context.
+Note that it is possible to enable the use of Kerberos independently for each service or connector. For example, the user may enable
+Hadoop security without necessitating the use of Kerberos for ZooKeeper, or vice versa. The shared element is the configuration of
+Kerberos credentials, which is then explicitly used by each component.
+
+The internal architecture is based on security modules (implementing `org.apache.flink.runtime.security.modules.SecurityModule`) which
+are installed at startup. The following sections describe each security module.
+
+### Hadoop Security Module
+This module uses the Hadoop `UserGroupInformation` (UGI) class to establish a process-wide *login user* context. The login user is
+then used for all interactions with Hadoop, including HDFS, HBase, and YARN.
+
+If Hadoop security is enabled (in `core-site.xml`), the login user will have whatever Kerberos credential is configured. Otherwise,
+the login user conveys only the user identity of the OS account that launched the cluster.
+
+### JAAS Security Module
+This module provides a dynamic JAAS configuration to the cluster, making available the configured Kerberos credential to ZooKeeper,
+Kafka, and other such components that rely on JAAS.
+
+Note that the user may also provide a static JAAS configuration file using the mechanisms described in the [Java SE Documentation](http://docs.oracle.com/javase/7/docs/technotes/guides/security/jgss/tutorials/LoginConfigFile.html). Static entries override any
+dynamic entries provided by this module.
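For illustration, a static JAAS configuration file of the kind referenced above might look like the following sketch; the `Client` context name matches the ZooKeeper default discussed below, while the keytab path and principal are placeholders:

```
Client {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/path/to/flink.keytab"
    principal="flink-user@EXAMPLE.COM";
};
```

Such a file is typically activated by passing `-Djava.security.auth.login.config=/path/to/jaas.conf` to the JVM, as described in the linked Java SE documentation.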
+
+### ZooKeeper Security Module
+This module configures certain process-wide ZooKeeper security-related settings, namely the ZooKeeper service name (default: `zookeeper`)
+and the JAAS login context name (default: `Client`).
+
+## Security Configuration
+
+### Flink Configuration
+The user's Kerberos ticket cache (managed with `kinit`) is used automatically, based on the following configuration option:
+
+- `security.kerberos.login.use-ticket-cache`: Indicates whether to read from the user's Kerberos ticket cache (default: `true`).
+
+A Kerberos keytab can be supplied by adding the following configuration options to the Flink configuration file:
+
+- `security.kerberos.login.keytab`: Absolute path to a Kerberos keytab file that contains the user credentials.
+
+- `security.kerberos.login.principal`: Kerberos principal name associated with the keytab.
+
+These configuration options establish a cluster-wide credential to be used in a Hadoop and/or JAAS context. Whether the credential is used in a Hadoop context is based on the Hadoop configuration (see next section). To be used in a JAAS context, the configuration specifies which JAAS *login contexts* (or *applications*) are enabled with the following configuration option:
+
+- `security.kerberos.login.contexts`: A comma-separated list of login contexts to provide the Kerberos credentials to (for example, `Client` to use the credentials for ZooKeeper authentication).

-Services like Kafka and ZooKeeper use SASL/JAAS based authentication mechanism to authenticate against a Kerberos server. It expects JAAS configuration with a platform-specific login
-module *name* to be provided. Managing per-connector configuration files will be an overhead and to overcome this requirement, a process-wide JAAS configuration object is
-instantiated which serves standard ApplicationConfigurationEntry for the connectors that authenticates using SASL/JAAS mechanism.
+ZooKeeper-related configuration overrides:

-It is important to understand that the Flink processes (JM/TM/UI/Jobs) itself uses UGI's doAS() implementation to run under a specific user context, i.e. if Hadoop security is enabled
-then the Flink processes will be running under a secure user account or else it will run as the OS login user account who starts the Flink cluster.
+- `zookeeper.sasl.service-name`: The Kerberos service name that the ZooKeeper cluster is configured to use (default: `zookeeper`). Facilitates mutual authentication between the client (Flink) and server.

-## Security Configurations
+- `zookeeper.sasl.login-context-name`: The JAAS login context name that the ZooKeeper client uses to request the login context (default: `Client`). Should match
+one of the values specified in `security.kerberos.login.contexts`.
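For illustration, the options above might be combined in the Flink configuration file (`flink-conf.yaml`) as in the following sketch; the keytab path, the principal, and the `KafkaClient` context name are placeholder assumptions:

```yaml
# Keytab-based credential; disable ticket-cache lookup (illustrative values).
security.kerberos.login.use-ticket-cache: false
security.kerberos.login.keytab: /path/to/flink.keytab
security.kerberos.login.principal: flink-user@EXAMPLE.COM

# Hand the credential to the ZooKeeper and Kafka JAAS login contexts.
security.kerberos.login.contexts: Client,KafkaClient

# ZooKeeper overrides (shown here with their defaults).
zookeeper.sasl.service-name: zookeeper
zookeeper.sasl.login-context-name: Client
```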
-Secure credentials can be supplied by adding below configuration elements to Flink configuration file:
+### Hadoop Configuration

-- `security.keytab`: Absolute path to Kerberos keytab file that contains the user credentials/secret.
+The Hadoop configuration is located via the `HADOOP_CONF_DIR` environment variable and by other means (see `org.apache.flink.api.java.hadoop.mapred.utils.HadoopUtils`). The Kerberos credential (configured above) is used automatically if Hadoop security is enabled.

-- `security.principal`: User principal name that the Flink cluster should run as.
+Note that Kerberos credentials found in the ticket cache aren't transferable to other hosts. In this scenario, the Flink CLI acquires Hadoop
+delegation tokens (for HDFS and for HBase).

-The delegation token mechanism (*kinit cache*) is still supported for backward compatibility but enabling security using *keytab* configuration is the preferred and recommended approach.
+## Deployment Modes
+Here is some information specific to each deployment mode.

-## Standalone Mode:
+### Standalone Mode

 Steps to run a secure Flink cluster in standalone/cluster mode:
-- Add security configurations to Flink configuration file (on all cluster nodes)
-- Make sure the Keytab file exist in the path as indicated in *security.keytab* configuration on all cluster nodes
-- Deploy Flink cluster using cluster start/stop scripts or CLI
+1. Add security-related configuration options to the Flink configuration file (on all cluster nodes).
+2. Ensure that the keytab file exists at the path indicated by `security.kerberos.login.keytab` on all cluster nodes.
+3. Deploy Flink cluster as normal.

-## Yarn Mode:
+### YARN/Mesos Mode

-Steps to run secure Flink cluster in Yarn mode:
-- Add security configurations to Flink configuration file (on the node from where cluster will be provisioned using Flink/Yarn CLI)
-- Make sure the Keytab file exist in the path as indicated in *security.keytab* configuration
-- Deploy Flink cluster using CLI
+Steps to run a secure Flink cluster in YARN/Mesos mode:
+1. Add security-related configuration options to the Flink configuration file on the client.
+2. Ensure that the keytab file exists at the path indicated by `security.kerberos.login.keytab` on the client node.
+3. Deploy Flink cluster as normal.
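For illustration, the startup commands themselves are unchanged by the security setup; a sketch, assuming the configuration and keytab are already in place (the `-n 2` container count is an arbitrary placeholder, and `yarn-session.sh` flags vary by Flink version):

```sh
# Standalone: the configuration file and keytab must be present on every
# node, then the cluster starts as usual.
bin/start-cluster.sh

# YARN: the keytab configured on the client is shipped to the containers
# automatically when the session is launched.
bin/yarn-session.sh -n 2
```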
-In Yarn mode, the user supplied keytab will be copied over to the Yarn containers (App Master/JM and TM) as the Yarn local resource file.
-Security implementation details are based on <a href="https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md">Yarn security</a>
+In YARN/Mesos mode, the keytab is automatically copied from the client to the Flink containers.

-## Token Renewal
+For more information, see <a href="https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md">YARN security</a> documentation.

-UGI and Kafka/ZK login module implementations takes care of auto-renewing the tickets upon reaching expiry and no further action is needed on the part of Flink.
\ No newline at end of file
+## Further Details
+### Ticket Renewal
+Each component that uses Kerberos is independently responsible for renewing the Kerberos TGT. Hadoop, ZooKeeper, and Kafka all do so,
--- End diff --

I would add a sentence here that this requires specifying a keytab to work. Hadoop credentials will expire if only ticket cache / delegation tokens are used.


> Rework JAAS configuration to support user-supplied entries
> ----------------------------------------------------------
>
>                 Key: FLINK-5364
>                 URL: https://issues.apache.org/jira/browse/FLINK-5364
>             Project: Flink
>          Issue Type: Bug
>          Components: Cluster Management
>            Reporter: Eron Wright
>            Assignee: Eron Wright
>            Priority: Critical
>              Labels: kerberos, security
>
> Recent issues (see linked) have brought to light a critical deficiency in the handling of JAAS configuration.
> 1. the MapR distribution relies on an explicit JAAS conf, rather than in-memory conf used by stock Hadoop.
> 2. the ZK/Kafka/Hadoop security configuration is supposed to be independent (one can enable each element separately) but isn't.
> Perhaps we should rework the JAAS conf code to merge any user-supplied configuration with our defaults, rather than using an all-or-nothing approach.
> We should also address some recent regressions:
> 1. The HadoopSecurityContext should be installed regardless of auth mode, to login with UserGroupInformation, which:
>    - handles the HADOOP_USER_NAME variable.
>    - installs an OS-specific user principal (from UnixLoginModule etc.) unrelated to Kerberos.
>    - picks up the HDFS/HBASE delegation tokens.
> 2. Fix the use of alternative authentication methods - delegation tokens and Kerberos ticket cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
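For illustration of the merging approach proposed in the issue description above, a JAAS `Configuration` can consult user-supplied entries first and fall back to dynamically generated defaults. This is only a sketch under those assumptions; the class name and wiring are hypothetical, not the actual Flink implementation:

```java
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

// Hypothetical sketch: merge user-supplied JAAS entries with dynamic
// defaults, rather than the all-or-nothing approach criticized above.
public class MergingJaasConfiguration extends Configuration {

    private final Configuration userEntries;    // e.g. from a static JAAS file
    private final Configuration dynamicEntries; // generated Kerberos defaults

    public MergingJaasConfiguration(Configuration userEntries, Configuration dynamicEntries) {
        this.userEntries = userEntries;
        this.dynamicEntries = dynamicEntries;
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        // Static (user-supplied) entries take precedence; fall back to the
        // dynamically generated defaults only when no entry exists.
        AppConfigurationEntry[] entries = userEntries.getAppConfigurationEntry(name);
        return entries != null ? entries : dynamicEntries.getAppConfigurationEntry(name);
    }
}
```

Installed process-wide via `javax.security.auth.login.Configuration.setConfiguration(...)`, such a configuration would let a MapR-style static JAAS file coexist with dynamically provided Kerberos entries.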