[
https://issues.apache.org/jira/browse/FLINK-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404005#comment-15404005
]
ASF GitHub Bot commented on FLINK-3929:
---------------------------------------
Github user mxm commented on a diff in the pull request:
https://github.com/apache/flink/pull/2275#discussion_r73159333
--- Diff: docs/internals/flink_security.md ---
@@ -0,0 +1,87 @@
+---
+title: "Flink Security"
+# Top navigation
+top-nav-group: internals
+top-nav-pos: 10
+top-nav-title: Flink Security
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This document briefly describes how Flink security works in the context of various deployment mechanisms (standalone cluster vs. YARN)
+and the connectors that participate in the Flink job execution stage. This documentation can be helpful for both administrators and developers
+who plan to run Flink in a secure environment.
+
+## Objective
+
+The primary goal of the Flink security model is to enable secure data access for jobs within a cluster via connectors. In a production deployment scenario,
+streaming jobs are understood to run for long periods of time (days/weeks/months), and the system must be able to authenticate against secure
+data sources throughout the life of the job. The current implementation supports running the Flink cluster (Job Manager/Task Manager/Jobs) under the
+context of a Kerberos identity based on a keytab credential supplied at deployment time. Any jobs submitted will continue to run under the identity of the cluster.
+
+## How Flink Security works
+A Flink deployment includes running the Job Manager/ZooKeeper, Task Manager(s), Web UI and Job(s). Jobs (user code) can be submitted through the Web UI and/or CLI.
+A job program may use one or more connectors (Kafka, HDFS, Cassandra, Flume, Kinesis etc.), and each connector may have specific security
+requirements (Kerberos, database based, SSL/TLS, custom etc.). While satisfying the security requirements of all connectors will evolve over a period
+of time, at the time of writing the following connectors/services have been tested for Kerberos/keytab based security.
+
+- Kafka (0.9)
+- HDFS
+- ZooKeeper
+
+Hadoop uses the UserGroupInformation (UGI) class to manage security. UGI is a static implementation that takes care of handling Kerberos authentication. The Flink bootstrap implementation
+(JM/TM/CLI) takes care of instantiating UGI with the appropriate security credentials to establish the necessary security context.
+
+Services like Kafka and ZooKeeper use a SASL/JAAS based authentication mechanism to authenticate against a Kerberos server. They expect a JAAS configuration with a platform-specific login
+module *name* to be provided. Managing per-connector configuration files would be an overhead; to overcome this, a process-wide JAAS configuration object is
+instantiated which serves a standard AppConfigurationEntry for the connectors that authenticate using the SASL/JAAS mechanism.
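The process-wide JAAS configuration described above can be sketched with plain JDK classes. This is a minimal sketch, not Flink's actual bootstrap code; the keytab path, principal, and application names below are illustrative assumptions.

```java
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;
import java.util.HashMap;
import java.util.Map;

public class JaasSketch {
    public static void main(String[] args) {
        // Hypothetical keytab path and principal; a real deployment would take
        // these from the security settings in the Flink configuration file.
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");
        options.put("storeKey", "true");
        options.put("keyTab", "/path/to/flink.keytab");
        options.put("principal", "flink-user@EXAMPLE.COM");

        final AppConfigurationEntry entry = new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
                options);

        // Process-wide JAAS configuration: serve the same entry for whatever
        // application name a connector asks for (e.g. "KafkaClient", "Client"),
        // instead of managing a per-connector JAAS file.
        Configuration.setConfiguration(new Configuration() {
            @Override
            public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
                return new AppConfigurationEntry[] { entry };
            }
        });

        AppConfigurationEntry[] served =
                Configuration.getConfiguration().getAppConfigurationEntry("KafkaClient");
        System.out.println(served.length);                  // 1
        System.out.println(served[0].getLoginModuleName()); // com.sun.security.auth.module.Krb5LoginModule
    }
}
```

Because the installed `Configuration` ignores the requested name, every SASL/JAAS-based connector in the process resolves to the same Kerberos login module.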
+
+It is important to understand that the Flink processes (JM/TM/UI/Jobs) themselves use UGI's doAs() implementation to run under a specific user context,
+i.e., if Hadoop security is enabled then the Flink processes will run under a secure user account, or else they will run as the OS login user account that starts the Flink cluster.
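UGI's doAs() follows the standard JAAS `Subject.doAs()` pattern. A JDK-only sketch of that pattern is shown below; the empty `Subject` stands in for the Kerberos-authenticated identity that a real deployment would obtain from the keytab, and the returned string is purely illustrative.

```java
import javax.security.auth.Subject;
import java.security.PrivilegedAction;

public class DoAsSketch {
    public static void main(String[] args) {
        // Placeholder for the authenticated subject UGI would build from the keytab.
        Subject subject = new Subject();

        // Everything inside the action runs "as" the given subject; in Flink the
        // JM/TM/job code is executed this way so data access carries that identity.
        String result = Subject.doAs(subject,
                (PrivilegedAction<String>) () -> "ran-as-subject");

        System.out.println(result); // ran-as-subject
    }
}
```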
+
+## Security Configurations
+
+Secure credentials can be supplied by adding the following configuration elements to the Flink configuration file:
+
+- `security.keytab`: Absolute path to the Kerberos keytab file that contains the user credentials/secret.
+
+- `security.principal`: User principal name that the Flink cluster should run as.
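As a minimal sketch, the two entries above would appear in the Flink configuration file like this (the path and principal are illustrative values only):

```yaml
# Illustrative values; substitute your own keytab path and principal.
security.keytab: /path/to/flink.keytab
security.principal: flink-user@EXAMPLE.COM
```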
--- End diff --
Is the keytab file enough then? Do we need an additional principal?
> Support for Kerberos Authentication with Keytab Credential
> ----------------------------------------------------------
>
> Key: FLINK-3929
> URL: https://issues.apache.org/jira/browse/FLINK-3929
> Project: Flink
> Issue Type: New Feature
> Reporter: Eron Wright
> Assignee: Vijay Srinivasaraghavan
> Labels: kerberos, security
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> _This issue is part of a series of improvements detailed in the [Secure Data
> Access|https://docs.google.com/document/d/1-GQB6uVOyoaXGwtqwqLV8BHDxWiMO2WnVzBoJ8oPaAs/edit?usp=sharing]
> design doc._
> Add support for a keytab credential to be associated with the Flink cluster,
> to facilitate:
> - Kerberos-authenticated data access for connectors
> - Kerberos-authenticated ZooKeeper access
> Support both the standalone and YARN deployment modes.
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)