[ https://issues.apache.org/jira/browse/FLINK-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805601#comment-15805601 ]
ASF GitHub Bot commented on FLINK-5364:
---------------------------------------

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3057#discussion_r94997957

--- Diff: docs/internals/flink_security.md ---
@@ -24,64 +24,109 @@ specific language governing permissions and
 limitations under the License.
 -->

-This document briefly describes how Flink security works in the context of various deployment mechanism (Standalone/Cluster vs YARN)
-and the connectors that participates in Flink Job execution stage. This documentation can be helpful for both administrators and developers
-who plans to run Flink on a secure environment.
+This document briefly describes how Flink security works in the context of various deployment mechanisms (Standalone, YARN, or Mesos),
+filesystems, connectors, and state backends.

 ## Objective

+The primary goals of the Flink Kerberos security infrastructure are:
+1. to enable secure data access for jobs within a cluster via connectors (e.g. Kafka)
+2. to authenticate to ZooKeeper (if configured to use SASL)
+3. to authenticate to Hadoop components (e.g. HDFS, HBase)

-The primary goal of Flink security model is to enable secure data access for jobs within a cluster via connectors. In a production deployment scenario,
-streaming jobs are understood to run for longer period of time (days/weeks/months) and the system must be able to authenticate against secure
-data sources throughout the life of the job. The current implementation supports running Flink clusters (Job Manager/Task Manager/Jobs) under the
-context of a Kerberos identity based on Keytab credential supplied during deployment time. Any jobs submitted will continue to run in the identity of the cluster.
+In a production deployment scenario, streaming jobs are understood to run for long periods of time (days/weeks/months) and be able to authenticate to secure
+data sources throughout the life of the job. Kerberos keytabs do not expire in that timeframe, unlike a Hadoop delegation token
+or ticket cache entry.
+
+The current implementation supports running Flink clusters (Job Manager/Task Manager/jobs) with either a configured keytab credential
+or with Hadoop delegation tokens. Keep in mind that all jobs share the credential configured for a given cluster.

 ## How Flink Security works

-Flink deployment includes running Job Manager/ZooKeeper, Task Manager(s), Web UI and Job(s). Jobs (user code) can be submitted through web UI and/or CLI.
-A Job program may use one or more connectors (Kafka, HDFS, Cassandra, Flume, Kinesis etc.,) and each connector may have a specific security
-requirements (Kerberos, database based, SSL/TLS, custom etc.,). While satisfying the security requirements for all the connectors evolves over a period
-of time, at this time of writing, the following connectors/services are tested for Kerberos/Keytab based security.
+In concept, a Flink program may use first- or third-party connectors (Kafka, HDFS, Cassandra, Flume, Kinesis etc.) necessitating arbitrary authentication methods (Kerberos, SSL/TLS, username/password, etc.). While satisfying the security requirements for all connectors is an ongoing effort,
+Flink provides first-class support for Kerberos authentication only. The following services and connectors are tested for Kerberos authentication:

-- Kafka (0.9)
+- Kafka (0.9+)
 - HDFS
+- HBase
 - ZooKeeper

-Hadoop uses the UserGroupInformation (UGI) class to manage security. UGI is a static implementation that takes care of handling Kerberos authentication. The Flink bootstrap implementation
-(JM/TM/CLI) takes care of instantiating UGI with the appropriate security credentials to establish the necessary security context.
+Note that it is possible to enable the use of Kerberos independently for each service or connector. For example, the user may enable
+Hadoop security without necessitating the use of Kerberos for ZooKeeper, or vice versa. The shared element is the configuration of
+Kerberos credentials, which is then explicitly used by each component.
+
+The internal architecture is based on security modules (implementing `org.apache.flink.runtime.security.modules.SecurityModule`) which
+are installed at startup. The following sections describe each security module.
+
+### Hadoop Security Module
+This module uses the Hadoop `UserGroupInformation` (UGI) class to establish a process-wide *login user* context. The login user is
+then used for all interactions with Hadoop, including HDFS, HBase, and YARN.
+
+If Hadoop security is enabled (in `core-site.xml`), the login user will have whatever Kerberos credential is configured. Otherwise,
+the login user conveys only the user identity of the OS account that launched the cluster.
+
+### JAAS Security Module
+This module provides a dynamic JAAS configuration to the cluster, making available the configured Kerberos credential to ZooKeeper,
+Kafka, and other such components that rely on JAAS.
+
+Note that the user may also provide a static JAAS configuration file using the mechanisms described in the [Java SE Documentation](http://docs.oracle.com/javase/7/docs/technotes/guides/security/jgss/tutorials/LoginConfigFile.html). Static entries override any
+dynamic entries provided by this module.
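For illustration, a static JAAS configuration file of the kind referenced above might look like the following sketch; the `Client` context name matches the ZooKeeper default discussed below, while the keytab path and principal are placeholders:

```
Client {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/path/to/flink.keytab"
    principal="flink-user@EXAMPLE.COM";
};
```

Such a file is typically activated by passing `-Djava.security.auth.login.config=/path/to/jaas.conf` to the JVM, as described in the linked Java SE documentation.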
+
+### ZooKeeper Security Module
+This module configures certain process-wide ZooKeeper security-related settings, namely the ZooKeeper service name (default: `zookeeper`)
+and the JAAS login context name (default: `Client`).
+
+## Security Configuration
+
+### Flink Configuration
+The user's Kerberos ticket cache (managed with `kinit`) is used automatically, based on the following configuration option:
+
+- `security.kerberos.login.use-ticket-cache`: Indicates whether to read from the user's Kerberos ticket cache (default: `true`).
+
+A Kerberos keytab can be supplied by adding the following configuration options to the Flink configuration file:
+
+- `security.kerberos.login.keytab`: Absolute path to a Kerberos keytab file that contains the user credentials.
+
+- `security.kerberos.login.principal`: Kerberos principal name associated with the keytab.
+
+These configuration options establish a cluster-wide credential to be used in a Hadoop and/or JAAS context. Whether the credential is used in a Hadoop context is based on the Hadoop configuration (see next section). To be used in a JAAS context, the configuration specifies which JAAS *login contexts* (or *applications*) are enabled with the following configuration option:
+
+- `security.kerberos.login.contexts`: A comma-separated list of login contexts to provide the Kerberos credentials to (for example, `Client` to use the credentials for ZooKeeper authentication).

-Services like Kafka and ZooKeeper use SASL/JAAS based authentication mechanism to authenticate against a Kerberos server. It expects JAAS configuration with a platform-specific login
-module *name* to be provided. Managing per-connector configuration files will be an overhead and to overcome this requirement, a process-wide JAAS configuration object is
-instantiated which serves standard ApplicationConfigurationEntry for the connectors that authenticates using SASL/JAAS mechanism.
+ZooKeeper-related configuration overrides:

-It is important to understand that the Flink processes (JM/TM/UI/Jobs) itself uses UGI's doAS() implementation to run under a specific user context, i.e. if Hadoop security is enabled
-then the Flink processes will be running under a secure user account or else it will run as the OS login user account who starts the Flink cluster.
+- `zookeeper.sasl.service-name`: The Kerberos service name that the ZooKeeper cluster is configured to use (default: `zookeeper`). Facilitates mutual authentication between the client (Flink) and server.

-## Security Configurations
+- `zookeeper.sasl.login-context-name`: The JAAS login context name that the ZooKeeper client uses to request the login context (default: `Client`). Should match
+one of the values specified in `security.kerberos.login.contexts`.
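For illustration, the options above might be combined in the Flink configuration file (`flink-conf.yaml`) as in the following sketch; the keytab path, the principal, and the `KafkaClient` context name are placeholder assumptions:

```yaml
# Keytab-based credential; disable ticket-cache lookup (illustrative values).
security.kerberos.login.use-ticket-cache: false
security.kerberos.login.keytab: /path/to/flink.keytab
security.kerberos.login.principal: flink-user@EXAMPLE.COM

# Hand the credential to the ZooKeeper and Kafka JAAS login contexts.
security.kerberos.login.contexts: Client,KafkaClient

# ZooKeeper overrides (shown here with their defaults).
zookeeper.sasl.service-name: zookeeper
zookeeper.sasl.login-context-name: Client
```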
-Secure credentials can be supplied by adding below configuration elements to Flink configuration file:
+### Hadoop Configuration

-- `security.keytab`: Absolute path to Kerberos keytab file that contains the user credentials/secret.
+The Hadoop configuration is located via the `HADOOP_CONF_DIR` environment variable and by other means (see `org.apache.flink.api.java.hadoop.mapred.utils.HadoopUtils`). The Kerberos credential (configured above) is used automatically if Hadoop security is enabled.

-- `security.principal`: User principal name that the Flink cluster should run as.
+Note that Kerberos credentials found in the ticket cache aren't transferable to other hosts. In this scenario, the Flink CLI acquires Hadoop
+delegation tokens (for HDFS and for HBase).

-The delegation token mechanism (*kinit cache*) is still supported for backward compatibility but enabling security using *keytab* configuration is the preferred and recommended approach.
+## Deployment Modes
+Here is some information specific to each deployment mode.

-## Standalone Mode:
+### Standalone Mode

 Steps to run a secure Flink cluster in standalone/cluster mode:
-- Add security configurations to Flink configuration file (on all cluster nodes)
-- Make sure the Keytab file exist in the path as indicated in *security.keytab* configuration on all cluster nodes
-- Deploy Flink cluster using cluster start/stop scripts or CLI
+1. Add security-related configuration options to the Flink configuration file (on all cluster nodes).
+2. Ensure that the keytab file exists at the path indicated by `security.kerberos.login.keytab` on all cluster nodes.
+3. Deploy Flink cluster as normal.

-## Yarn Mode:
+### YARN/Mesos Mode

-Steps to run secure Flink cluster in Yarn mode:
-- Add security configurations to Flink configuration file (on the node from where cluster will be provisioned using Flink/Yarn CLI)
-- Make sure the Keytab file exist in the path as indicated in *security.keytab* configuration
-- Deploy Flink cluster using CLI
+Steps to run a secure Flink cluster in YARN/Mesos mode:
+1. Add security-related configuration options to the Flink configuration file on the client.
+2. Ensure that the keytab file exists at the path indicated by `security.kerberos.login.keytab` on the client node.
+3. Deploy Flink cluster as normal.
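For illustration, the startup commands themselves are unchanged by the security setup; a sketch, assuming the configuration and keytab are already in place (the `-n 2` container count is an arbitrary placeholder, and `yarn-session.sh` flags vary by Flink version):

```sh
# Standalone: the configuration file and keytab must be present on every
# node, then the cluster starts as usual.
bin/start-cluster.sh

# YARN: the keytab configured on the client is shipped to the containers
# automatically when the session is launched.
bin/yarn-session.sh -n 2
```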
-In Yarn mode, the user supplied keytab will be copied over to the Yarn containers (App Master/JM and TM) as the Yarn local resource file.
-Security implementation details are based on <a href="https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md">Yarn security</a>
+In YARN/Mesos mode, the keytab is automatically copied from the client to the Flink containers.

-## Token Renewal
+For more information, see <a href="https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md">YARN security</a> documentation.

-UGI and Kafka/ZK login module implementations takes care of auto-renewing the tickets upon reaching expiry and no further action is needed on the part of Flink.
\ No newline at end of file
+## Further Details
+### Ticket Renewal
+Each component that uses Kerberos is independently responsible for renewing the Kerberos TGT. Hadoop, ZooKeeper, and Kafka all do so,
--- End diff --

I would add a sentence here that this requires specifying a keytab to work. Hadoop credentials will expire if only ticket cache / delegation tokens are used.


> Rework JAAS configuration to support user-supplied entries
> ----------------------------------------------------------
>
>                 Key: FLINK-5364
>                 URL: https://issues.apache.org/jira/browse/FLINK-5364
>             Project: Flink
>          Issue Type: Bug
>          Components: Cluster Management
>            Reporter: Eron Wright
>            Assignee: Eron Wright
>            Priority: Critical
>              Labels: kerberos, security
>
> Recent issues (see linked) have brought to light a critical deficiency in the handling of JAAS configuration.
> 1. the MapR distribution relies on an explicit JAAS conf, rather than in-memory conf used by stock Hadoop.
> 2. the ZK/Kafka/Hadoop security configuration is supposed to be independent (one can enable each element separately) but isn't.
> Perhaps we should rework the JAAS conf code to merge any user-supplied configuration with our defaults, rather than using an all-or-nothing approach.
> We should also address some recent regressions:
> 1. The HadoopSecurityContext should be installed regardless of auth mode, to login with UserGroupInformation, which:
>    - handles the HADOOP_USER_NAME variable.
>    - installs an OS-specific user principal (from UnixLoginModule etc.) unrelated to Kerberos.
>    - picks up the HDFS/HBASE delegation tokens.
> 2. Fix the use of alternative authentication methods - delegation tokens and Kerberos ticket cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
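For illustration of the merging approach proposed in the issue description above, a JAAS `Configuration` can consult user-supplied entries first and fall back to dynamically generated defaults. This is only a sketch under those assumptions; the class name and wiring are hypothetical, not the actual Flink implementation:

```java
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

// Hypothetical sketch: merge user-supplied JAAS entries with dynamic
// defaults, rather than the all-or-nothing approach criticized above.
public class MergingJaasConfiguration extends Configuration {

    private final Configuration userEntries;    // e.g. from a static JAAS file
    private final Configuration dynamicEntries; // generated Kerberos defaults

    public MergingJaasConfiguration(Configuration userEntries, Configuration dynamicEntries) {
        this.userEntries = userEntries;
        this.dynamicEntries = dynamicEntries;
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        // Static (user-supplied) entries take precedence; fall back to the
        // dynamically generated defaults only when no entry exists.
        AppConfigurationEntry[] entries = userEntries.getAppConfigurationEntry(name);
        return entries != null ? entries : dynamicEntries.getAppConfigurationEntry(name);
    }
}
```

Installed process-wide via `javax.security.auth.login.Configuration.setConfiguration(...)`, such a configuration would let a MapR-style static JAAS file coexist with dynamically provided Kerberos entries.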