This is an automated email from the ASF dual-hosted git repository.
alexey pushed a commit to branch branch-1.12.x
in repository https://gitbox.apache.org/repos/asf/kudu.git
The following commit(s) were added to refs/heads/branch-1.12.x by this push:
new d3aec31 docs: add Ranger integration
d3aec31 is described below
commit d3aec314ccc82a23087d2f9510285ec494825638
Author: hahao <[email protected]>
AuthorDate: Sun May 10 22:19:59 2020 -0700
docs: add Ranger integration
Staged version here:
https://github.com/haohaoc/kudu/blob/ranger-docs/docs/security.adoc
Change-Id: Iad9476f18267c1e14a73f893fd812674c955eee2
Reviewed-on: http://gerrit.cloudera.org:8080/15897
Tested-by: Kudu Jenkins
Reviewed-by: Grant Henke <[email protected]>
(cherry picked from commit a961fbc7cac61737d03a0c9cf8898199a101f67e)
Reviewed-on: http://gerrit.cloudera.org:8080/15910
---
docs/kudu_impala_integration.adoc | 1 +
docs/security.adoc | 234 +++++++++++++++++++++++++++++++++++---
2 files changed, 218 insertions(+), 17 deletions(-)
diff --git a/docs/kudu_impala_integration.adoc b/docs/kudu_impala_integration.adoc
index 1c7d923..2a0e3ab 100755
--- a/docs/kudu_impala_integration.adoc
+++ b/docs/kudu_impala_integration.adoc
@@ -249,6 +249,7 @@ STORED AS KUDU;
If you have multiple primary key columns, you can specify partition bounds
using tuple syntax: `('va',1), ('ab',2)`. The expression must be valid JSON.
+[[managed_tables]]
==== Impala Databases and Kudu
Every Impala table is contained within a namespace called a _database_. The
default
diff --git a/docs/security.adoc b/docs/security.adoc
index 27842e9..53c58d7 100644
--- a/docs/security.adoc
+++ b/docs/security.adoc
@@ -156,8 +156,14 @@ to access the cluster.
As of Kudu 1.10.0, Kudu can be configured to enforce fine-grained authorization
across servers. This ensures that users can see only the data they are
-explicitly authorized to see. Kudu currently supports this by leveraging
-policies defined in Apache Sentry 2.2 and later.
+explicitly authorized to see. Kudu supports this by leveraging policies
+defined in Apache Sentry 2.2 and later. In addition, starting from Kudu
+1.12.0, Kudu can support fine-grained authorization by leveraging policies
+defined in Apache Ranger 2.1 and later.
+
+WARNING: Support for Apache Sentry authorization has been deprecated since
+Kudu 1.12.0 and may be completely removed, so fine-grained authorization via
+Apache Ranger is preferred going forward.
WARNING: Fine-grained authorization policies are not enforced when accessing
the web UI. User data may appear on various pages of the web UI (e.g. in logs,
@@ -165,6 +171,66 @@ metrics, scans, etc.). As such, it is recommended to either limit access to the
web UI ports, or redact or disable the web UI entirely, as desired. See the
<<web-ui,instructions for securing the web UI>> for more details.
+=== Apache Ranger
+
+Apache Ranger models tabular objects stored in a Kudu cluster in the following
+hierarchy:
+
+NOTE: Ranger allows you to add separate service repositories to manage privileges
+for different Kudu clusters. The value of the `ranger.plugin.kudu.service.name`
+property in the Ranger client configuration determines which service repository
+Kudu connects to. For more details about Ranger service repositories, see the Apache Ranger
+link:https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=57901344[documentation].
+
+* *Database* - Kudu does not have the concept of a database. Therefore, a database
+is indicated as a prefix of table names with the format `<database>.<table>`.
+Since Kudu's only restriction on table names is that they be valid UTF-8 encoded
+strings, Kudu considers special characters to be valid parts of database or table
+names. For example, if a managed Kudu table created from Impala (see Kudu Impala
+integration <<kudu_impala_integration.adoc#managed_tables,documentation>>) is named
+`impala::bar.foo`, its database will be `impala::bar`.
+
+* *Table* - a single Kudu table.
+
+* *Column* - a column within a Kudu table.
+
+In Ranger, privileges are also associated with specific actions. Access to Kudu
+tables may rely on privileges on the following actions:
+
+* `ALTER`
+* `CREATE`
+* `DELETE`
+* `DROP`
+* `INSERT`
+* `UPDATE`
+* `SELECT`
+* `ALL`
+* `METADATA`
+
+Specifically, if a user has the `ALL` privilege on a given table, that user has
+all of the above privileges on the table. The `METADATA` privilege is modeled as
+any privilege: if a user has a privilege on any action on a given table,
+that user implicitly also has the `METADATA` privilege on
+that table.
+
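As an illustration only, and not Kudu's or Ranger's actual code, the implication rules above can be sketched in Python. The function name `effective_privileges` and the `ACTIONS` constant are hypothetical.

```python
# Hypothetical sketch of the ALL / METADATA implication rules described above.
ACTIONS = {"ALTER", "CREATE", "DELETE", "DROP", "INSERT", "UPDATE", "SELECT"}


def effective_privileges(granted):
    """Expand the actions granted on one table into the implied set."""
    effective = {action.upper() for action in granted}
    if "ALL" in effective:
        # ALL implies every other action on the table.
        effective |= ACTIONS
    if effective:
        # Any privilege at all implies METADATA.
        effective.add("METADATA")
    return effective
```

Under this model a user granted only `SELECT` on a table ends up with `SELECT` and `METADATA`, while a user with no grants has neither.
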
+In terms of privilege evaluation, Ranger doesn't have the concept of hierarchical
+implication. To be more specific, if a user has the `SELECT` privilege on a database,
+it does not imply that the user has the `SELECT` privilege on every table belonging to
+that database. On the other hand, Ranger supports privilege wildcard matching.
+For example, `db=a->table=\*` matches all the tables that belong to database `a`.
+Therefore, in Ranger users actually need the `SELECT` privilege on
+`db=a->table=*->column=*` to match the semantics of the `SELECT` privilege on
+`db=a` in Sentry.
+
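The difference between hierarchical implication and wildcard matching can be shown with a toy model. This is a simplified sketch, not Ranger's matching engine: the `policy_matches` helper and the dictionary layout are assumptions made for illustration, Python's `fnmatch` stands in for Ranger's wildcard handling, and names are compared case-insensitively in line with the note on case-insensitive policy authorization.

```python
from fnmatch import fnmatchcase

LEVELS = ("db", "table", "column")


def policy_matches(policy, resource):
    """Return True if a policy's resource patterns cover the requested resource.

    Both arguments are dicts keyed by 'db', 'table' and 'column'.
    A level missing from the policy never covers a level present in the
    request, which models the absence of hierarchical implication.
    """
    for level in LEVELS:
        value = resource.get(level)
        if value is None:
            continue  # the request does not reach this level
        pattern = policy.get(level)
        if pattern is None:
            return False  # a db-only policy does not imply table access
        if not fnmatchcase(value.lower(), pattern.lower()):
            return False
    return True


request = {"db": "a", "table": "t", "column": "c"}
db_only = {"db": "a"}
wildcard = {"db": "a", "table": "*", "column": "*"}
```

In this model `policy_matches(db_only, request)` is false, while `policy_matches(wildcard, request)` is true, which mirrors why `db=a->table=*->column=*` is needed to get the effect of a Sentry privilege on `db=a`.
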
+With Ranger integration, when a Kudu master receives a request, it consults
+Ranger to determine what privileges the user has. The required policies
+documented in the <<security.adoc#policy-for-kudu-masters, policy section>>
+are then enforced to determine whether the user is authorized to perform the
+requested action.
+
+NOTE: Even though Kudu table names remain case-sensitive with Ranger integration,
+policy authorization is case-insensitive.
+
=== Apache Sentry
Apache Sentry models tabular objects in the following hierarchy:
@@ -220,19 +286,19 @@ user to decide whether to perform or reject a request.
=== Authorization Tokens
-Rather than having every tablet server communicate directly with Sentry,
-privileges are propagated and checked via *authorization tokens*. These tokens
-encapsulate what privileges a user has on a given table. Tokens are generated
-by the master and returned to Kudu clients upon opening a Kudu table. Kudu
-clients automatically attach authorization tokens when sending requests to
-tablet servers.
+Rather than having every tablet server communicate directly with the underlying
+authorization service (e.g. Sentry or Ranger), privileges are propagated and checked
+via *authorization tokens*. These tokens encapsulate what privileges a user has
+on a given table. Tokens are generated by the master and returned to Kudu clients
+upon opening a Kudu table. Kudu clients automatically attach authorization tokens
+when sending requests to tablet servers.
NOTE: Authorization tokens are a means to limiting the number of nodes directly
-accessing Sentry to retrieve privileges. As such, since the expected number of
-tablet servers in a cluster is much higher than the number of Kudu masters,
-they are only used to authorize requests sent to tablet servers. Kudu masters
-fetch privileges directly from Sentry or cache. See <<privilege-caching>> for
-more details of Kudu's privilege cache.
+accessing the authorization service to retrieve privileges. As such, since the
+expected number of tablet servers in a cluster is much higher than the number of
+Kudu masters, they are only used to authorize requests sent to tablet servers.
+Kudu masters fetch privileges directly from the authorization service or cache.
+See <<privilege-caching>> for more details of Kudu's privilege cache.
Similar to the validity interval for authentication tokens, to limit the
window of potential unwanted access if a token becomes compromised,
@@ -251,14 +317,14 @@ operation, or if it is invalid (e.g. expired).
It may be desirable to allow certain users to view and modify any data stored
in Kudu. Such users can be specified via the `--trusted_user_acl` master
configuration. Trusted users can perform any operation that would otherwise
-require fine-grained privileges, without Kudu consulting Sentry.
+require fine-grained privileges, without Kudu consulting the authorization service.
Additionally, some services that interact with Kudu may authorize requests on
behalf of their end users. For example, Apache Impala authorizes queries on
behalf of its users, and sends requests to Kudu as the Impala service user,
commonly "impala". Since Impala authorizes requests on its own, to avoid
-extraneous communication between Sentry and Kudu, the Impala service user
-should be listed as a trusted user.
+extraneous communication between the authorization service and Kudu, the
+Impala service user should be listed as a trusted user.
NOTE: When accessing Kudu through Impala, Impala enforces its own fine-grained
authorization policy. This policy is similar to Kudu's and can be found in
@@ -266,6 +332,128 @@ Impala's
link:https://impala.apache.org/docs/build/html/topics/impala_authorization.html#authorization[authorization
documentation].
+=== Configuring the Integration with Apache Ranger
+
+NOTE: Ranger is often configured with Kerberos authentication. See
+<<configuration>> for how to configure Kudu to authenticate via Kerberos.
+
+NOTE: Sentry integration cannot be enabled at the same time as Ranger
+integration.
+
+* After building Kudu from source, find the `kudu-subprocess.jar` under the build
+directory (e.g. `build/release/bin`). Note its path: this JAR file contains the
+Ranger subprocess, which houses the Ranger client that Kudu will use to
+communicate with the Ranger server.
+
+* Use the `kudu table list` tool to find any table names in the cluster that are
+not Ranger-compatible, i.e. names that begin or end with a period. Also check
+that no two table names differ only by case, since authorization
+is case-insensitive. Use the `kudu table rename_table` tool to rename any tables
+that don't comply with these requirements.
+
+* Create the Ranger client configuration file `ranger-kudu-security.xml`, and note
+the directory containing this file.
+
+```xml
+<property>
+ <name>ranger.plugin.kudu.policy.cache.dir</name>
+ <value>policycache</value>
+ <description>Directory where Ranger policies are cached after successful retrieval from the Ranger service</description>
+</property>
+<property>
+ <name>ranger.plugin.kudu.service.name</name>
+ <value>kudu</value>
+ <description>Name of the Ranger service repository storing policies for this Kudu cluster</description>
+</property>
+<property>
+ <name>ranger.plugin.kudu.policy.rest.url</name>
+ <value>http://host:port</value>
+ <description>Ranger Admin URL</description>
+</property>
+<property>
+ <name>ranger.plugin.kudu.policy.source.impl</name>
+ <value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
+ <description>Ranger client implementation to retrieve policies from the Ranger service</description>
+</property>
+<property>
+ <name>ranger.plugin.kudu.policy.rest.ssl.config.file</name>
+ <value>ranger-kudu-policymgr-ssl.xml</value>
+ <description>Path to the file containing SSL details to connect to Ranger Admin</description>
+</property>
+<property>
+ <name>ranger.plugin.kudu.policy.pollIntervalMs</name>
+ <value>30000</value>
+ <description>Ranger client policy polling interval</description>
+</property>
+```
+
+* When Secure Socket Layer (SSL) is enabled for Ranger Admin, add a `ranger-kudu-policymgr-ssl.xml`
+file to the Ranger client configuration directory with the following configurations:
+
+```xml
+<property>
+ <name>xasecure.policymgr.clientssl.keystore</name>
+ <value>[/path/to/keystore].jks</value>
+ <description>Java keystore file</description>
+</property>
+<property>
+ <name>xasecure.policymgr.clientssl.keystore.credential.file</name>
+ <value>jceks://file/[path/to/credentials].jceks</value>
+ <description>Java keystore credential file</description>
+</property>
+<property>
+ <name>xasecure.policymgr.clientssl.truststore</name>
+ <value>[/path/to/truststore].jks</value>
+ <description>Java truststore file</description>
+</property>
+<property>
+ <name>xasecure.policymgr.clientssl.truststore.credential.file</name>
+ <value>jceks://file/[path/to/credentials].jceks</value>
+ <description>Java truststore credential file</description>
+</property>
+```
+
+* Set the following configurations on the Kudu master:
+
+```
+# The path to the directory containing the Ranger client configuration. This example
+# assumes the path is '/kudu/ranger-config'.
+--ranger_config_path=/kudu/ranger-config
+
+# The path where the Java binary was installed. This example assumes
+# '$JAVA_HOME=/usr/local'
+--ranger_java_path=/usr/local/bin/java
+
+# The path to the JAR file containing the Ranger subprocess. This example
+# assumes '$KUDU_HOME=/kudu'
+--ranger_jar_path=/kudu/build/release/bin/kudu-subprocess.jar
+
+# This example ACL setup allows the 'impala' user to access all data stored in
+# Kudu, assuming Impala will authorize requests on its own. The 'kudu' user is
+# also granted access to all Kudu data, which may facilitate testing and
+# debugging (such as running the 'kudu cluster ksck' tool).
+--trusted_user_acl=impala,kudu
+```
+
+* Set the following configurations on the tablet servers:
+
+```
+--tserver_enforce_access_control=true
+```
+
+* Add a Kudu service repository with the following configurations via the Ranger
+Admin web UI:
+
+```xml
+# This example setup configures the Kudu service user as a privileged user to be
+# able to retrieve authorization policies stored in Ranger.
+
+<property>
+ <name>policy.download.auth.users</name>
+ <value>kudu</value>
+</property>
+```
+
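The table-name check described in the steps above can be sketched as a small Python helper. It is only an illustration: `find_ranger_incompatible` is a hypothetical name, and the list of names is assumed to have been collected separately, for example from the output of `kudu table list`.

```python
from collections import defaultdict


def find_ranger_incompatible(table_names):
    """Return (names with a leading/trailing period, groups differing only by case)."""
    bad_period = [
        name for name in table_names
        if name.startswith(".") or name.endswith(".")
    ]

    # Group names by their lowercased form to spot case-only differences.
    by_lowercase = defaultdict(list)
    for name in table_names:
        by_lowercase[name.lower()].append(name)
    case_collisions = [
        sorted(group) for group in by_lowercase.values() if len(set(group)) > 1
    ]
    return bad_period, case_collisions


# Made-up table names, used only to exercise the helper.
names = ["db.foo", "db.Foo", ".lead", "trail.", "ok.table"]
bad_period, case_collisions = find_ranger_incompatible(names)
```

Any name reported by either list would then be a candidate for `kudu table rename_table` before enabling the integration.
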
[[sentry-configuration]]
=== Configuring the Integration with Apache Sentry
@@ -314,8 +502,9 @@ The following configurations must be set in
`sentry-site.xml` on the Sentry serv
<value>kudu</value>
</property>
```
+
[[privilege-caching]]
-=== Caching
+=== Kudu Master Caching for Sentry
To avoid overwhelming Sentry with requests to fetch user privileges, the Kudu
master can be configured to cache user privileges. A by-product of this caching
@@ -334,6 +523,17 @@ cached privileges to force Kudu to fetch new ones from Sentry:
kudu master authz_cache reset <master-addresses>
----
+=== Ranger Client Caching
+On the other hand, the privilege cache in the Kudu master is disabled with Ranger integration,
+since Ranger provides a client-side cache of user privileges and can periodically poll
+the privilege store for any changes. When a change is detected, the cache is
+automatically updated.
+
+NOTE: Update the `ranger.plugin.kudu.policy.pollIntervalMs` property specified in
+`ranger-kudu-security.xml` to set how often the Ranger client cache refreshes
+the privileges from the Ranger service.
+
+[[policy-for-kudu-masters]]
=== Policy for Kudu Masters
The following authorization policy is enforced by Kudu masters.