http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_authorization.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_authorization.xml b/docs/topics/impala_authorization.xml new file mode 100644 index 0000000..5163b2f --- /dev/null +++ b/docs/topics/impala_authorization.xml @@ -0,0 +1,1621 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="1.1" id="authorization"> + + <title>Enabling Sentry Authorization for Impala</title> + <prolog> + <metadata> + <data name="Category" value="Security"/> + <data name="Category" value="Sentry"/> + <data name="Category" value="Impala"/> + <data name="Category" value="Configuring"/> + <data name="Category" value="Starting and Stopping"/> + <data name="Category" value="Users"/> + <data name="Category" value="Groups"/> + <data name="Category" value="Administrators"/> + </metadata> + </prolog> + + <conbody id="sentry"> + + <p> + Authorization determines which users are allowed to access which resources, and what operations they are + allowed to perform. In Impala 1.1 and higher, you use the Sentry open source project for authorization. + Sentry adds a fine-grained authorization framework for Hadoop. By default (when authorization is not + enabled), Impala does all read and write operations with the privileges of the <codeph>impala</codeph> user, + which is suitable for a development/test environment but not for a secure production environment. When + authorization is enabled, Impala uses the OS user ID of the user who runs <cmdname>impala-shell</cmdname> or + other client program, and associates various privileges with each user. + </p> + + <note> + Sentry is typically used in conjunction with Kerberos authentication, which defines which hosts are allowed + to connect to each server. Using the combination of Sentry and Kerberos prevents malicious users from being + able to connect by creating a named account on an untrusted machine. See + <xref href="impala_kerberos.xml#kerberos"/> for details about Kerberos authentication. + </note> + + <p audience="PDF" outputclass="toc inpage"> + See the following sections for details about using the Impala authorization features: + </p> + </conbody> + + <concept id="sentry_priv_model"> + + <title>The Sentry Privilege Model</title> + + <conbody> + + <p> + Privileges can be granted on different objects in the schema. Any privilege that can be granted is + associated with a level in the object hierarchy. If a privilege is granted on a container object in the + hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other + database systems such as MySQL. + </p> + + <p rev="2.3.0 collevelauth"> + The object hierarchy for Impala covers Server, URI, Database, Table, and Column. (The Table privileges apply to views as well; + anywhere you specify a table name, you can specify a view name instead.) + Column-level authorization is available in <keyword keyref="impala23_full"/> and higher, as described in + <xref audience="integrated" href="sg_hive_sql.xml#concept_c2q_4qx_p4/col_level_auth_sentry"/><xref audience="standalone" href="https://www.cloudera.com/documentation/enterprise/latest/topics/sg_hive_sql.html" format="html" scope="external"/>. + Previously, you constructed views to query specific columns and assigned privileges + based on the views rather than the base tables. + </p> + + <p> + A restricted set of privileges determines what you can do with each object: + </p> + + <dl> + <dlentry id="select_priv"> + + <dt> + SELECT privilege + </dt> + + <dd> + Lets you read data from a table or view, for example with the <codeph>SELECT</codeph> statement, the + <codeph>INSERT...SELECT</codeph> syntax, or <codeph>CREATE TABLE...LIKE</codeph>. Also required to + issue the <codeph>DESCRIBE</codeph> statement or the <codeph>EXPLAIN</codeph> statement for a query + against a particular table. Only objects for which a user has this privilege are shown in the output + for <codeph>SHOW DATABASES</codeph> and <codeph>SHOW TABLES</codeph> statements. The + <codeph>REFRESH</codeph> statement and <codeph>INVALIDATE METADATA</codeph> statements only access + metadata for tables for which the user has this privilege. + </dd> + + </dlentry> + + <dlentry id="insert_priv"> + + <dt> + INSERT privilege + </dt> + + <dd> + Lets you write data to a table. Applies to the <codeph>INSERT</codeph> and <codeph>LOAD DATA</codeph> + statements. + </dd> + + </dlentry> + + <dlentry id="all_priv"> + + <dt> + ALL privilege + </dt> + + <dd> + Lets you create or modify the object. Required to run DDL statements such as <codeph>CREATE + TABLE</codeph>, <codeph>ALTER TABLE</codeph>, or <codeph>DROP TABLE</codeph> for a table, + <codeph>CREATE DATABASE</codeph> or <codeph>DROP DATABASE</codeph> for a database, or <codeph>CREATE + VIEW</codeph>, <codeph>ALTER VIEW</codeph>, or <codeph>DROP VIEW</codeph> for a view. Also required for + the URI of the <q>location</q> parameter for the <codeph>CREATE EXTERNAL TABLE</codeph> and + <codeph>LOAD DATA</codeph> statements. +<!-- Have to think about the best wording, how often to repeat, how best to conref this caveat. + You do not actually code the keyword <codeph>ALL</codeph> in the policy file; instead you use + <codeph>action=*</codeph> or shorten the right-hand portion of the rule. + --> + </dd> + + </dlentry> + </dl> + + <p> + Privileges can be specified for a table or view before that object actually exists. If you do not have + sufficient privilege to perform an operation, the error message does not disclose if the object exists or + not. + </p> + + <p> + Originally, privileges were encoded in a policy file, stored in HDFS. This mode of operation is still an + option, but the emphasis of privilege management is moving towards being SQL-based. Although currently + Impala does not have <codeph>GRANT</codeph> or <codeph>REVOKE</codeph> statements, Impala can make use of + privileges assigned through <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements done through + Hive. The mode of operation with <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements instead of + the policy file requires that a special Sentry service be enabled; this service stores, retrieves, and + manipulates privilege information stored inside the metastore database. + </p> + </conbody> + </concept> + + <concept id="secure_startup"> + + <title>Starting the impalad Daemon with Sentry Authorization Enabled</title> + <prolog> + <metadata> + <data name="Category" value="Starting and Stopping"/> + </metadata> + </prolog> + + <conbody> + + <p> + To run the <cmdname>impalad</cmdname> daemon with authorization enabled, you add one or more options to the + <codeph>IMPALA_SERVER_ARGS</codeph> declaration in the <filepath>/etc/default/impala</filepath> + configuration file: + </p> + + <ul> + <li> + The <codeph>-server_name</codeph> option turns on Sentry authorization for Impala. The authorization + rules refer to a symbolic server name, and you specify the name to use as the argument to the + <codeph>-server_name</codeph> option. + </li> + + <li rev="1.4.0"> + If you specify just <codeph>-server_name</codeph>, Impala uses the Sentry service for authorization, + relying on the results of <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements issued through + Hive. (This mode of operation is available in Impala 1.4.0 and higher.) Prior to Impala 1.4.0, or if you + want to continue storing privilege rules in the policy file, also specify the + <codeph>-authorization_policy_file</codeph> option as in the following item. + </li> + + <li> + Specifying the <codeph>-authorization_policy_file</codeph> option in addition to + <codeph>-server_name</codeph> makes Impala read privilege information from a policy file, rather than + from the metastore database. The argument to the <codeph>-authorization_policy_file</codeph> option + specifies the HDFS path to the policy file that defines the privileges on different schema objects. + </li> + </ul> + + <p rev="1.4.0"> + For example, you might adapt your <filepath>/etc/default/impala</filepath> configuration to contain lines + like the following. To use the Sentry service rather than the policy file: + </p> + +<codeblock rev="1.4.0">IMPALA_SERVER_ARGS=" \ +-server_name=server1 \ +... +</codeblock> + + <p> + Or to use the policy file, as in releases prior to Impala 1.4: + </p> + +<codeblock>IMPALA_SERVER_ARGS=" \ +-authorization_policy_file=/user/hive/warehouse/auth-policy.ini \ +-server_name=server1 \ +... +</codeblock> + + <p> + The preceding examples set up a symbolic name of <codeph>server1</codeph> to refer to the current instance + of Impala. This symbolic name is used in the following ways: + </p> + + <ul> + <li> + <p> + In an environment managed by Cloudera Manager, the server name is specified through + <menucascade><uicontrol>Impala (Service-Wide)</uicontrol><uicontrol>Category</uicontrol><uicontrol>Advanced</uicontrol><uicontrol>Sentry Service</uicontrol></menucascade> and + <menucascade><uicontrol>Hive</uicontrol><uicontrol>Service-Wide</uicontrol><uicontrol>Advanced</uicontrol><uicontrol>Sentry Service</uicontrol></menucascade>. + The values must be the same for both, so that Impala and Hive can share the privilege rules. Restart + the Impala and Hive services after setting or changing this value. + </p> + </li> + + <li> + <p> + In an environment not managed by Cloudera Manager, you specify this value for the + <codeph>sentry.hive.server</codeph> property in the <filepath>sentry-site.xml</filepath> configuration + file for Hive, as well as in the <codeph>-server_name</codeph> option for <cmdname>impalad</cmdname>. + </p> + <p> + If the <cmdname>impalad</cmdname> daemon is not already running, start it as described in + <xref href="impala_processes.xml#processes"/>. If it is already running, restart it with the command + <codeph>sudo /etc/init.d/impala-server restart</codeph>. Run the appropriate commands on all the nodes + where <cmdname>impalad</cmdname> normally runs. + </p> + </li> + + <li> + <p> + If you use the mode of operation using the policy file, the rules in the <codeph>[roles]</codeph> + section of the policy file refer to this same <codeph>server1</codeph> name. For example, the following + rule sets up a role <codeph>report_generator</codeph> that lets users with that role query any table in + a database named <codeph>reporting_db</codeph> on a node where the <cmdname>impalad</cmdname> daemon + was started up with the <codeph>-server_name=server1</codeph> option: + </p> +<codeblock>[roles] +report_generator = server=server1->db=reporting_db->table=*->action=SELECT +</codeblock> + </li> + </ul> + + <p> + When <cmdname>impalad</cmdname> is started with one or both of the <codeph>-server_name=server1</codeph> + and <codeph>-authorization_policy_file</codeph> options, Impala authorization is enabled. If Impala detects + any errors or inconsistencies in the authorization settings or the policy file, the daemon refuses to + start. + </p> + </conbody> + </concept> + + <concept id="sentry_service"> + + <title>Using Impala with the Sentry Service (<keyword keyref="impala14"/> or higher only)</title> + + <conbody> + + <p> + When you use the Sentry service rather than the policy file, you set up privileges through + <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statement in either Impala or Hive, then both components + use those same privileges automatically. (Impala added the <codeph>GRANT</codeph> and + <codeph>REVOKE</codeph> statements in <keyword keyref="impala20_full"/>.) + </p> + + <p> + Hive already had <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements prior to CDH 5.1, but those + statements were not production-ready. CDH 5.1 is the first release where those statements use the Sentry + framework and are considered GA level. If you used the Hive <codeph>GRANT</codeph> and + <codeph>REVOKE</codeph> statements prior to CDH 5.1, you must set up these privileges with the CDH 5.1 + versions of <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> to take advantage of Sentry authorization. + </p> + + <p> + For information about using the updated Hive <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements, + see +<!-- Original URL: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Security-Guide/cdh_sg_sentry_service.html --> + <xref href="http://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_sentry_service.html" scope="external" format="html">Sentry + service</xref> topic in the <cite>CDH 5 Security Guide</cite>. + </p> + </conbody> + </concept> + + <concept id="security_policy_file"> + + <title>Using Impala with the Sentry Policy File</title> + + <conbody> + + <p> + The policy file is a file that you put in a designated location in HDFS, and is read during the startup of + the <cmdname>impalad</cmdname> daemon when you specify both the <codeph>-server_name</codeph> and + <codeph>-authorization_policy_file</codeph> startup options. It controls which objects (databases, tables, + and HDFS directory paths) can be accessed by the user who connects to <cmdname>impalad</cmdname>, and what + operations that user can perform on the objects. + </p> + + <note rev="1.4.0"> + <p rev="1.4.0"> + In <ph rev="upstream">CDH 5</ph> and higher, <ph rev="upstream">Cloudera</ph> recommends + managing privileges through SQL statements, as described in + <xref href="impala_authorization.xml#sentry_service"/>. If you are still using policy files, plan to + migrate to the new approach some time in the future. + </p> + </note> + + <p> + The location of the policy file is listed in the <filepath>auth-site.xml</filepath> configuration file. To + minimize overhead, the security information from this file is cached by each <cmdname>impalad</cmdname> + daemon and refreshed automatically, with a default interval of 5 minutes. After making a substantial change + to security policies, restart all Impala daemons to pick up the changes immediately. + </p> + + <p outputclass="toc inpage"/> + </conbody> + + <concept id="security_policy_file_details"> + + <title>Policy File Location and Format</title> + + <conbody> + + <p> + The policy file uses the familiar <codeph>.ini</codeph> format, divided into the major sections + <codeph>[groups]</codeph> and <codeph>[roles]</codeph>. There is also an optional + <codeph>[databases]</codeph> section, which allows you to specify a specific policy file for a particular + database, as explained in <xref href="#security_multiple_policy_files"/>. Another optional section, + <codeph>[users]</codeph>, allows you to override the OS-level mapping of users to groups; that is an + advanced technique primarily for testing and debugging, and is beyond the scope of this document. + </p> + + <p> + In the <codeph>[groups]</codeph> section, you define various categories of users and select which roles + are associated with each category. The group and usernames correspond to Linux groups and users on the + server where the <cmdname>impalad</cmdname> daemon runs. + </p> + + <p> + The group and usernames in the <codeph>[groups]</codeph> section correspond to Linux groups and users on + the server where the <cmdname>impalad</cmdname> daemon runs. When you access Impala through the + <cmdname>impalad</cmdname> interpreter, for purposes of authorization, the user is the logged-in Linux + user and the groups are the Linux groups that user is a member of. When you access Impala through the + ODBC or JDBC interfaces, the user and password specified through the connection string are used as login + credentials for the Linux server, and authorization is based on that username and the associated Linux + group membership. + </p> + + <p> + In the <codeph>[roles]</codeph> section, you a set of roles. For each role, you specify precisely the set + of privileges is available. That is, which objects users with that role can access, and what operations + they can perform on those objects. This is the lowest-level category of security information; the other + sections in the policy file map the privileges to higher-level divisions of groups and users. In the + <codeph>[groups]</codeph> section, you specify which roles are associated with which groups. The group + and usernames correspond to Linux groups and users on the server where the <cmdname>impalad</cmdname> + daemon runs. The privileges are specified using patterns like: +<codeblock>server=<varname>server_name</varname>->db=<varname>database_name</varname>->table=<varname>table_name</varname>->action=SELECT +server=<varname>server_name</varname>->db=<varname>database_name</varname>->table=<varname>table_name</varname>->action=CREATE +server=<varname>server_name</varname>->db=<varname>database_name</varname>->table=<varname>table_name</varname>->action=ALL +</codeblock> + For the <varname>server_name</varname> value, substitute the same symbolic name you specify with the + <cmdname>impalad</cmdname> <codeph>-server_name</codeph> option. You can use <codeph>*</codeph> wildcard + characters at each level of the privilege specification to allow access to all such objects. For example: +<codeblock>server=impala-host.example.com->db=default->table=t1->action=SELECT +server=impala-host.example.com->db=*->table=*->action=CREATE +server=impala-host.example.com->db=*->table=audit_log->action=SELECT +server=impala-host.example.com->db=default->table=t1->action=* +</codeblock> + </p> + + <p> + When authorization is enabled, Impala uses the policy file as a <i>whitelist</i>, representing every + privilege available to any user on any object. That is, only operations specified for the appropriate + combination of object, role, group, and user are allowed; all other operations are not allowed. If a + group or role is defined multiple times in the policy file, the last definition takes precedence. + </p> + + <p> + To understand the notion of whitelisting, set up a minimal policy file that does not provide any + privileges for any object. When you connect to an Impala node where this policy file is in effect, you + get no results for <codeph>SHOW DATABASES</codeph>, and an error when you issue any <codeph>SHOW + TABLES</codeph>, <codeph>USE <varname>database_name</varname></codeph>, <codeph>DESCRIBE + <varname>table_name</varname></codeph>, <codeph>SELECT</codeph>, and or other statements that expect to + access databases or tables, even if the corresponding databases and tables exist. + </p> + + <p> + The contents of the policy file are cached, to avoid a performance penalty for each query. The policy + file is re-checked by each <cmdname>impalad</cmdname> node every 5 minutes. When you make a + non-time-sensitive change such as adding new privileges or new users, you can let the change take effect + automatically a few minutes later. If you remove or reduce privileges, and want the change to take effect + immediately, restart the <cmdname>impalad</cmdname> daemon on all nodes, again specifying the + <codeph>-server_name</codeph> and <codeph>-authorization_policy_file</codeph> options so that the rules + from the updated policy file are applied. + </p> + </conbody> + </concept> + + <concept id="security_examples"> + + <title>Examples of Policy File Rules for Security Scenarios</title> + + <conbody> + + <p> + The following examples show rules that might go in the policy file to deal with various + authorization-related scenarios. For illustration purposes, this section shows several very small policy + files with only a few rules each. In your environment, typically you would define many roles to cover all + the scenarios involving your own databases, tables, and applications, and a smaller number of groups, + whose members are given the privileges from one or more roles. + </p> + + <example id="sec_ex_unprivileged"> + + <title>A User with No Privileges</title> + + <p> + If a user has no privileges at all, that user cannot access any schema objects in the system. The error + messages do not disclose the names or existence of objects that the user is not authorized to read. + </p> + + <p> +<!-- This example demonstrates the lack of privileges using a blank policy file, so no users have any privileges. --> + This is the experience you want a user to have if they somehow log into a system where they are not an + authorized Impala user. In a real deployment with a filled-in policy file, a user might have no + privileges because they are not a member of any of the relevant groups mentioned in the policy file. + </p> + +<!-- Have the raw material but not formatted into easily digestible example. Do for first 1.1 doc refresh. +<codeblock></codeblock> --> + + </example> + + <example id="sec_ex_superuser"> + + <title>Examples of Privileges for Administrative Users</title> + + <p> + When an administrative user has broad access to tables or databases, the associated rules in the + <codeph>[roles]</codeph> section typically use wildcards and/or inheritance. For example, in the + following sample policy file, <codeph>db=*</codeph> refers to all databases and + <codeph>db=*->table=*</codeph> refers to all tables in all databases. + </p> + + <p> + Omitting the rightmost portion of a rule means that the privileges apply to all the objects that could + be specified there. For example, in the following sample policy file, the + <codeph>all_databases</codeph> role has all privileges for all tables in all databases, while the + <codeph>one_database</codeph> role has all privileges for all tables in one specific database. The + <codeph>all_databases</codeph> role does not grant privileges on URIs, so a group with that role could + not issue a <codeph>CREATE TABLE</codeph> statement with a <codeph>LOCATION</codeph> clause. The + <codeph>entire_server</codeph> role has all privileges on both databases and URIs within the server. + </p> + +<codeblock>[groups] +supergroup = all_databases + +[roles] +read_all_tables = server=server1->db=*->table=*->action=SELECT +all_tables = server=server1->db=*->table=* +all_databases = server=server1->db=* +one_database = server=server1->db=test_db +entire_server = server=server1 +</codeblock> + + </example> + + <example id="sec_ex_detailed"> + + <title>A User with Privileges for Specific Databases and Tables</title> + + <p> + If a user has privileges for specific tables in specific databases, the user can access those things + but nothing else. They can see the tables and their parent databases in the output of <codeph>SHOW + TABLES</codeph> and <codeph>SHOW DATABASES</codeph>, <codeph>USE</codeph> the appropriate databases, + and perform the relevant actions (<codeph>SELECT</codeph> and/or <codeph>INSERT</codeph>) based on the + table privileges. To actually create a table requires the <codeph>ALL</codeph> privilege at the + database level, so you might define separate roles for the user that sets up a schema and other users + or applications that perform day-to-day operations on the tables. + </p> + + <p> + The following sample policy file shows some of the syntax that is appropriate as the policy file grows, + such as the <codeph>#</codeph> comment syntax, <codeph>\</codeph> continuation syntax, and comma + separation for roles assigned to groups or privileges assigned to roles. + </p> + +<codeblock>[groups] +cloudera = training_sysadmin, instructor +visitor = student + +[roles] +training_sysadmin = server=server1->db=training, \ +server=server1->db=instructor_private, \ +server=server1->db=lesson_development +instructor = server=server1->db=training->table=*->action=*, \ +server=server1->db=instructor_private->table=*->action=*, \ +server=server1->db=lesson_development->table=lesson* +# This particular course is all about queries, so the students can SELECT but not INSERT or CREATE/DROP. +student = server=server1->db=training->table=lesson_*->action=SELECT +</codeblock> + + </example> + +<!-- +<example id="sec_ex_superuser_single_db"> +<title>A User with Full Privileges for a Specific Database</title> +<p> + +</p> +<codeblock></codeblock> +</example> + +<example id="sec_ex_readonly_single_db"> +<title>A User with Read-Only Privileges for a Specific Database</title> +<p> + +</p> +<codeblock></codeblock> + <p> + If a user has <codeph>SELECT</codeph> privilege for a database, they can issue a <codeph>USE</codeph> statement + for that database. Whether or not they can access tables within the database depends on further privileges + defined at the table level. + </p> + +<codeblock></codeblock> + +</example> + +<example id="sec_ex_superuser_single_table"> +<title>A User with Full Privileges for a Specific Table</title> + <p> + If a user has <codeph>SELECT</codeph> privilege for a table, they can query, describe, or explain queries for + that table. + </p> + +<codeblock></codeblock> +</example> + +<example id="sec_ex_load_data"> +<title>A User with Privileges to Load Data but not Read Data</title> + + <p> + If a user has <codeph>INSERT</codeph> privilege for a table, they can write to the table if it already exists. + They cannot create or alter the table; those operations require the <codeph>ALL</codeph> privilege. + </p> +<codeblock></codeblock> +</example> +--> + + <example id="sec_ex_external_files"> + + <title>Privileges for Working with External Data Files</title> + + <p> + When data is being inserted through the <codeph>LOAD DATA</codeph> statement, or is referenced from an + HDFS location outside the normal Impala database directories, the user also needs appropriate + permissions on the URIs corresponding to those HDFS locations. + </p> + + <p> + In this sample policy file: + </p> + + <ul> + <li> + The <codeph>external_table</codeph> role lets us insert into and query the Impala table, + <codeph>external_table.sample</codeph>. + </li> + + <li> + The <codeph>staging_dir</codeph> role lets us specify the HDFS path + <filepath>/user/cloudera/external_data</filepath> with the <codeph>LOAD DATA</codeph> statement. + Remember, when Impala queries or loads data files, it operates on all the files in that directory, + not just a single file, so any Impala <codeph>LOCATION</codeph> parameters refer to a directory + rather than an individual file. + </li> + + <li> + We included the IP address and port of the Hadoop name node in the HDFS URI of the + <codeph>staging_dir</codeph> rule. We found those details in + <filepath>/etc/hadoop/conf/core-site.xml</filepath>, under the <codeph>fs.default.name</codeph> + element. That is what we use in any roles that specify URIs (that is, the locations of directories in + HDFS). + </li> + + <li> + We start this example after the table <codeph>external_table.sample</codeph> is already created. In + the policy file for the example, we have already taken away the <codeph>external_table_admin</codeph> + role from the <codeph>cloudera</codeph> group, and replaced it with the lesser-privileged + <codeph>external_table</codeph> role. + </li> + + <li> + We assign privileges to a subdirectory underneath <filepath>/user/cloudera</filepath> in HDFS, + because such privileges also apply to any subdirectories underneath. If we had assigned privileges to + the parent directory <filepath>/user/cloudera</filepath>, it would be too likely to mess up other + files by specifying a wrong location by mistake. + </li> + + <li> + The <codeph>cloudera</codeph> under the <codeph>[groups]</codeph> section refers to the + <codeph>cloudera</codeph> group. (In the demo VM used for this example, there is a + <codeph>cloudera</codeph> user that is a member of a <codeph>cloudera</codeph> group.) + </li> + </ul> + + <p> + Policy file: + </p> + +<codeblock>[groups] +cloudera = external_table, staging_dir + +[roles] +external_table_admin = server=server1->db=external_table +external_table = server=server1->db=external_table->table=sample->action=* +staging_dir = server=server1->uri=hdfs://127.0.0.1:8020/user/cloudera/external_data->action=* +</codeblock> + + <p> + <cmdname>impala-shell</cmdname> session: + </p> + +<codeblock>[localhost:21000] > use external_table; +Query: use external_table +[localhost:21000] > show tables; +Query: show tables +Query finished, fetching results ... ++--------+ +| name | ++--------+ +| sample | ++--------+ +Returned 1 row(s) in 0.02s + +[localhost:21000] > select * from sample; +Query: select * from sample +Query finished, fetching results ... ++-----+ +| x | ++-----+ +| 1 | +| 5 | +| 150 | ++-----+ +Returned 3 row(s) in 1.04s + +[localhost:21000] > load data inpath '/user/cloudera/external_data' into table sample; +Query: load data inpath '/user/cloudera/external_data' into table sample +Query finished, fetching results ... ++----------------------------------------------------------+ +| summary | ++----------------------------------------------------------+ +| Loaded 1 file(s). Total files in destination location: 2 | ++----------------------------------------------------------+ +Returned 1 row(s) in 0.26s +[localhost:21000] > select * from sample; +Query: select * from sample +Query finished, fetching results ... ++-------+ +| x | ++-------+ +| 2 | +| 4 | +| 6 | +| 8 | +| 64738 | +| 49152 | +| 1 | +| 5 | +| 150 | ++-------+ +Returned 9 row(s) in 0.22s + +[localhost:21000] > load data inpath '/user/cloudera/unauthorized_data' into table sample; +Query: load data inpath '/user/cloudera/unauthorized_data' into table sample +ERROR: AuthorizationException: User 'cloudera' does not have privileges to access: hdfs://127.0.0.1:8020/user/cloudera/unauthorized_data +</codeblock> + + </example> + + <example audience="Cloudera" id="sec_ex_views" rev="2.3.0 collevelauth"> + + <title>Controlling Access at the Column Level through Views</title> + + <p> + If a user has <codeph>SELECT</codeph> privilege for a view, they can query the view, even if they do + not have any privileges on the underlying table. To see the details about the underlying table through + <codeph>EXPLAIN</codeph> or <codeph>DESCRIBE FORMATTED</codeph> statements on the view, the user must + also have <codeph>SELECT</codeph> privilege for the underlying table. + </p> + + <note type="important"> + <p> + The types of data that are considered sensitive and confidential differ depending on the jurisdiction + the type of industry, or both. For fine-grained access controls, set up appropriate privileges based + on all applicable laws and regulations. + </p> + <p> + Be careful using the <codeph>ALTER VIEW</codeph> statement to point an existing view at a different + base table or a new set of columns that includes sensitive or restricted data. Make sure that any + users who have <codeph>SELECT</codeph> privilege on the view do not gain access to any additional + information they are not authorized to see. + </p> + </note> + + <p> + The following example shows how a system administrator could set up a table containing some columns + with sensitive information, then create a view that only exposes the non-confidential columns. + </p> + +<codeblock>[localhost:21000] > create table sensitive_info + > ( + > name string, + > address string, + > credit_card string, + > taxpayer_id string + > ); +[localhost:21000] > create view name_address_view as select name, address from sensitive_info; +</codeblock> + + <p> + Then the following policy file specifies read-only privilege for that view, without authorizing access + to the underlying table: + </p> + +<codeblock>[groups] +cloudera = view_only_privs + +[roles] +view_only_privs = server=server1->db=reports->table=name_address_view->action=SELECT +</codeblock> + + <p> + Thus, a user with the <codeph>view_only_privs</codeph> role could access through Impala queries the + basic information but not the sensitive information, even if both kinds of information were part of the + same data file. + </p> + + <p> + You might define other views to allow users from different groups to query different sets of columns. + </p> + + </example> + + <example id="sec_sysadmin"> + + <title>Separating Administrator Responsibility from Read and Write Privileges</title> + + <p> + Remember that to create a database requires full privilege on that database, while day-to-day + operations on tables within that database can be performed with lower levels of privilege on specific + table. Thus, you might set up separate roles for each database or application: an administrative one + that could create or drop the database, and a user-level one that can access only the relevant tables. + </p> + + <p> + For example, this policy file divides responsibilities between users in 3 different groups: + </p> + + <ul> + <li> + Members of the <codeph>supergroup</codeph> group have the <codeph>training_sysadmin</codeph> role and + so can set up a database named <codeph>training</codeph>. + </li> + + <li> + Members of the <codeph>cloudera</codeph> group have the <codeph>instructor</codeph> role and so can + create, insert into, and query any tables in the <codeph>training</codeph> database, but cannot + create or drop the database itself. + </li> + + <li> + Members of the <codeph>visitor</codeph> group have the <codeph>student</codeph> role and so can query + those tables in the <codeph>training</codeph> database. + </li> + </ul> + +<codeblock>[groups] +supergroup = training_sysadmin +cloudera = instructor +visitor = student + +[roles] +training_sysadmin = server=server1->db=training +instructor = server=server1->db=training->table=*->action=* +student = server=server1->db=training->table=*->action=SELECT +</codeblock> + + </example> + </conbody> + </concept> + + <concept id="security_multiple_policy_files"> + + <title>Using Multiple Policy Files for Different Databases</title> + + <conbody> + + <p> + For an Impala cluster with many databases being accessed by many users and applications, it might be + cumbersome to update the security policy file for each privilege change or each new database, table, or + view. You can allow security to be managed separately for individual databases, by setting up a separate + policy file for each database: + </p> + + <ul> + <li> + Add the optional <codeph>[databases]</codeph> section to the main policy file. + </li> + + <li> + Add entries in the <codeph>[databases]</codeph> section for each database that has its own policy file. + </li> + + <li> + For each listed database, specify the HDFS path of the appropriate policy file. + </li> + </ul> + + <p> + For example: + </p> + +<codeblock>[databases] +# Defines the location of the per-DB policy files for the 'customers' and 'sales' databases. +customers = hdfs://ha-nn-uri/etc/access/customers.ini +sales = hdfs://ha-nn-uri/etc/access/sales.ini +</codeblock> + + <p> + To enable URIs in per-DB policy files, add the following string in the Cloudera Manager field + <uicontrol>Impala Service Environment Advanced Configuration Snippet (Safety Valve)</uicontrol>: + </p> + +<codeblock>JAVA_TOOL_OPTIONS="-Dsentry.allow.uri.db.policyfile=true" +</codeblock> + + <note type="important"> + Enabling URIs in per-DB policy files introduces a security risk by allowing the owner of the db-level + policy file to grant himself/herself load privileges to anything the <codeph>impala</codeph> user has + read permissions for in HDFS (including data in other databases controlled by different db-level policy + files). + </note> + </conbody> + </concept> + </concept> + + <concept id="security_schema"> + + <title>Setting Up Schema Objects for a Secure Impala Deployment</title> + + <conbody> + + <p> + Remember that in your role definitions, you specify privileges at the level of individual databases and + tables, or all databases or all tables within a database. To simplify the structure of these rules, plan + ahead of time how to name your schema objects so that data with different authorization requirements is + divided into separate databases. + </p> + + <p> + If you are adding security on top of an existing Impala deployment, remember that you can rename tables or + even move them between databases using the <codeph>ALTER TABLE</codeph> statement. In Impala, creating new + databases is a relatively inexpensive operation, basically just creating a new directory in HDFS. + </p> + + <p> + You can also plan the security scheme and set up the policy file before the actual schema objects named in + the policy file exist. Because the authorization capability is based on whitelisting, a user can only + create a new database or table if the required privilege is already in the policy file: either by listing + the exact name of the object being created, or a <codeph>*</codeph> wildcard to match all the applicable + objects within the appropriate container. + </p> + </conbody> + </concept> + + <concept id="security_privileges"> + + <title>Privilege Model and Object Hierarchy</title> + + <conbody> + + <p> + Privileges can be granted on different objects in the schema. Any privilege that can be granted is + associated with a level in the object hierarchy. If a privilege is granted on a container object in the + hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other + database systems such as MySQL. + </p> + + <p> + The kinds of objects in the schema hierarchy are: + </p> + +<codeblock>Server +URI +Database + Table +</codeblock> + + <p> + The server name is specified by the <codeph>-server_name</codeph> option when <cmdname>impalad</cmdname> + starts. Specify the same name for all <cmdname>impalad</cmdname> nodes in the cluster. + </p> + + <p> + URIs represent the HDFS paths you specify as part of statements such as <codeph>CREATE EXTERNAL + TABLE</codeph> and <codeph>LOAD DATA</codeph>. Typically, you specify what look like UNIX paths, but these + locations can also be prefixed with <codeph>hdfs://</codeph> to make clear that they are really URIs. To + set privileges for a URI, specify the name of a directory, and the privilege applies to all the files in + that directory and any directories underneath it. + </p> + + <p rev="2.3.0 collevelauth"> + In <keyword keyref="impala23_full"/> and higher, you can specify privileges for individual columns, + as described in + <xref audience="integrated" href="sg_hive_sql.xml#concept_c2q_4qx_p4/col_level_auth_sentry"/><xref audience="standalone" href="https://www.cloudera.com/documentation/enterprise/latest/topics/sg_hive_sql.html" format="html" scope="external"/>. + Formerly, to specify + read privileges at this level, you created a view that queried specific columns and/or partitions from a base + table, and gave <codeph>SELECT</codeph> privilege on the view but not the underlying table. + </p> + + <p> + URIs must start with either <codeph>hdfs://</codeph> or <codeph>file://</codeph>. If a URI starts with + anything else, it will cause an exception and the policy file will be invalid. When defining URIs for HDFS, + you must also specify the NameNode. For example: +<codeblock>data_read = server=server1->uri=file:///path/to/dir, \ +server=server1->uri=hdfs://namenode:port/path/to/dir +</codeblock> + <note type="warning"> + <p> + Because the NameNode host and port must be specified, Cloudera strongly recommends you use High + Availability (HA). This ensures that the URI will remain constant even if the namenode changes. + </p> +<codeblock>data_read = server=server1->uri=file:///path/to/dir,\ server=server1->uri=hdfs://ha-nn-uri/path/to/dir +</codeblock> + </note> + </p> + +<!-- Experiment with replacing my original copied table with a conref'ed version of Ambreen's from the Security Guide. --> + +<!-- + <table> + <title>Sentry privilege types and objects they apply to</title> + <tgroup cols="2"> + <colspec colnum="1" colname="col1"/> + <colspec colnum="2" colname="col2"/> + <tbody> + <row> + <entry>Privilege</entry> + <entry>Object</entry> + </row> + <row> + <entry>INSERT</entry> + <entry>TABLE, URI</entry> + </row> + <row> + <entry>SELECT</entry> + <entry>TABLE, VIEW, URI</entry> + </row> + <row> + <entry>ALL</entry> + <entry>SERVER, DB, URI</entry> + </row> + </tbody> + </tgroup> + </table> +--> + + <table conref="../shared/impala_common.xml#common/sentry_privileges_objects"> + <tgroup cols="2"> + <colspec colnum="1" colname="col1" colwidth="1*"/> + <tbody> + <row> + <entry/> + </row> + </tbody> + </tgroup> + </table> + + <note> + <p> + Although this document refers to the <codeph>ALL</codeph> privilege, currently if you use the policy file + mode, you do not use the actual keyword <codeph>ALL</codeph> in the policy file. When you code role + entries in the policy file: + </p> + <ul> + <li> + To specify the <codeph>ALL</codeph> privilege for a server, use a role like + <codeph>server=<varname>server_name</varname></codeph>. + </li> + + <li> + To specify the <codeph>ALL</codeph> privilege for a database, use a role like + <codeph>server=<varname>server_name</varname>->db=<varname>database_name</varname></codeph>. + </li> + + <li> + To specify the <codeph>ALL</codeph> privilege for a table, use a role like + <codeph>server=<varname>server_name</varname>->db=<varname>database_name</varname>->table=<varname>table_name</varname>->action=*</codeph>. + </li> + </ul> + </note> + + <draft-comment author="ambreen.kazi" translate="no">The Privilege Model lives at: +https://wiki.cloudera.com/pages/viewpage.action?pageId=24919851</draft-comment> + + <table> + <tgroup cols="4"> + <colspec colnum="1" colname="col1" colwidth="1.31*"/> + <colspec colnum="2" colname="col2" colwidth="1.17*"/> + <colspec colnum="3" colname="col3" colwidth="1*"/> + <colspec colname="newCol4" colnum="4" colwidth="1*"/> + <thead> + <row> + <entry> + Operation + </entry> + <entry> + Scope + </entry> + <entry> + Privileges + </entry> + <entry> + URI + </entry> + </row> + </thead> + <tbody> + <row conref="../shared/impala_common.xml#common/explain_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/load_data_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/create_database_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/drop_database_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/create_table_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/drop_table_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/describe_table_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/alter_table_add_columns_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/alter_table_replace_columns_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/alter_table_change_column_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/alter_table_rename_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/alter_table_set_tblproperties_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/alter_table_set_fileformat_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/alter_table_set_location_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/alter_table_add_partition_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/alter_table_add_partition_location_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/alter_table_drop_partition_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/alter_table_partition_set_fileformat_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/alter_table_set_serdeproperties_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/create_view_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/drop_view_privs"> + <entry/> + </row> + <row id="alter_view_privs"> + <entry> + ALTER VIEW + </entry> + <entry rev="2.3.0 collevelauth"> + You need <codeph>ALL</codeph> privilege on the named view <ph rev="1.4.0">and the parent + database</ph>, plus <codeph>SELECT</codeph> privilege for any tables or views referenced by the + view query. Once the view is created or altered by a high-privileged system administrator, it can + be queried by a lower-privileged user who does not have full query privileges for the base tables. + </entry> + <entry> + ALL, SELECT + </entry> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/alter_table_set_location_privs"> + <entry/> + </row> + <row id="create_external_table_privs"> + <entry> + CREATE EXTERNAL TABLE + </entry> + <entry> + Database (ALL), URI (SELECT) + </entry> + <entry> + ALL, SELECT + </entry> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/select_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/use_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/create_function_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/drop_function_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/refresh_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/invalidate_metadata_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/invalidate_metadata_table_privs"> + <entry/> + </row> + <row conref="../shared/impala_common.xml#common/compute_stats_privs"> + <entry/> + </row> + <row id="show_table_stats_privs"> + <entry> + SHOW TABLE STATS, SHOW PARTITIONS + </entry> + <entry> + TABLE + </entry> + <entry> + SELECT/INSERT + </entry> + <entry/> + </row> + <row> + <entry id="show_column_stats_privs"> + SHOW COLUMN STATS + </entry> + <entry> + TABLE + </entry> + <entry> + SELECT/INSERT + </entry> + <entry/> + </row> + <row> + <entry id="show_functions_privs"> + SHOW FUNCTIONS + </entry> + <entry> + DATABASE + </entry> + <entry> + SELECT + </entry> + <entry/> + </row> + <row id="show_tables_privs"> + <entry> + SHOW TABLES + </entry> + <entry/> + <entry> + No special privileges needed to issue the statement, but only shows objects you are authorized for + </entry> + <entry/> + </row> + <row id="show_databases_privs"> + <entry> + SHOW DATABASES, SHOW SCHEMAS + </entry> + <entry/> + <entry> + No special privileges needed to issue the statement, but only shows objects you are authorized for + </entry> + <entry/> + </row> + </tbody> + </tgroup> + </table> + +<!-- Original table, now replaced by content conref'ed from Security Guide +<table> +<title>Sentry privileges for Impala SQL operations</title> + <tgroup cols="3"> + <colspec colnum="1" colname="col1" colwidth="*"/> + <colspec colnum="2" colname="col2" colwidth="*"/> + <colspec colnum="3" colname="col3" colwidth="4*"/> +<thead> +<row> +<entry>Impala SQL Operation</entry> +<entry>Privileges Required</entry> +<entry>Object on which Privileges Required</entry> +</row> +</thead> +<tbody> +<row> +<entry>EXPLAIN</entry> +<entry>SELECT</entry> +<entry>Table</entry> +</row> +<row> +<entry>LOAD DATA</entry> +<entry>INSERT, SELECT</entry> +<entry>Table (INSERT) and URI (ALL); write privilege is required on the URI because the data files are physically moved from there</entry> +</row> +<row> +<entry>CREATE DATABASE</entry> +<entry>ALL</entry> +<entry><ph rev="1.4.0">Server</ph></entry> +</row> +<row> +<entry>DROP DATABASE</entry> +<entry>ALL</entry> +<entry><ph rev="1.4.0">Server</ph></entry> +</row> +<row> +<entry>CREATE TABLE</entry> +<entry>ALL</entry> +<entry><ph rev="1.4.0">Database</ph></entry> +</row> +<row> +<entry>DROP TABLE</entry> +<entry>ALL</entry> +<entry><ph rev="1.4.0">Database</ph></entry> +</row> +<row> +<entry>DESCRIBE TABLE</entry> +<entry>SELECT or INSERT</entry> +<entry>Table</entry> +</row> +<row> +<entry>ALTER TABLE</entry> +<entry>ALL</entry> +<entry><ph rev="1.4.0">Database</ph></entry> +</row> +<row> +<entry>SHOW DATABASES</entry> +<entry>Any privilege</entry> +<entry>Any object in the database; only databases for which the user has privileges on some object are shown</entry> +</row> +<row> +<entry>SHOW TABLES</entry> +<entry>Any privilege</entry> +<entry>Any table in the database; only tables for which the user has privileges are shown</entry> +</row> +<row> +<entry>CREATE VIEW</entry> +<entry>ALL, SELECT</entry> +<entry rev="2.3.0 collevelauth"> +You need <codeph>ALL</codeph> privilege on the named view <ph rev="1.4.0">and the parent database</ph>, plus <codeph>SELECT</codeph> privilege for any tables or views referenced by the view query. +Once the view is created or altered by a high-privileged system administrator, +it can be queried by a lower-privileged user who does not have full query privileges for the base tables. +</entry> +</row> +<row> +<entry>DROP VIEW</entry> +<entry>ALL</entry> +<entry><ph rev="1.4.0">Database</ph></entry> +</row> +<row> +<entry>ALTER VIEW</entry> +<entry>ALL, SELECT</entry> +<entry rev="2.3.0 collevelauth"> +You need <codeph>ALL</codeph> privilege on the named view <ph rev="1.4.0">and the parent database</ph>, plus <codeph>SELECT</codeph> privilege for any tables or views referenced by the view query. +Once the view is created or altered by a high-privileged system administrator, +it can be queried by a lower-privileged user who does not have full query privileges for the base tables. +</entry> +</row> +<row> +<entry>ALTER TABLE LOCATION</entry> +<entry>ALL</entry> +<entry>Table, URI</entry> +</row> +<row> +<entry>CREATE EXTERNAL TABLE</entry> +<entry>ALL, SELECT</entry> +<entry>Database (ALL), URI (SELECT)</entry> +</row> +<row> +<entry>SELECT</entry> +<entry>SELECT</entry> +<entry rev="2.3.0 collevelauth">Table, View; you can have SELECT privilege for a view without having SELECT privilege +for the underlying tables. See +<xref audience="integrated" href="sg_hive_sql.xml#concept_c2q_4qx_p4/col_level_auth_sentry"/><xref audience="standalone" href="https://www.cloudera.com/documentation/enterprise/latest/topics/sg_hive_sql.html" format="html" scope="external"/> +for details about column-level authorization. +Before column-level authorization was added in <keyword keyref="impala23_full"/> the way to implement +column-level security was by creating views that referenced particular sets of columns.</entry> +</row> +<row> +<entry>USE</entry> +<entry>Any privilege</entry> +<entry>Any object in the database</entry> +</row> +<row rev="1.2"> +<entry>CREATE FUNCTION</entry> +<entry>ALL</entry> +<entry>Server</entry> +</row> +<row rev="1.2"> +<entry>DROP FUNCTION</entry> +<entry>ALL</entry> +<entry>Server</entry> +</row> +<row> +<entry>REFRESH</entry> +<entry>ALL</entry> +<entry>Table</entry> +</row> +<row> +<entry>INVALIDATE METADATA</entry> +<entry>ALL</entry> +<entry>Server</entry> +</row> +<row rev="1.2.2"> +<entry>COMPUTE STATS</entry> +<entry>ALL</entry> +<entry>Table</entry> +</row> +<row> +<entry>SHOW TABLE STATS, SHOW PARTITIONS</entry> +<entry>?</entry> +<entry>?</entry> +</row> +<row> +<entry>SHOW COLUMN STATS</entry> +<entry>?</entry> +<entry>?</entry> +</row> +<row> +<entry>SHOW FUNCTIONS</entry> +<entry>?</entry> +<entry>?</entry> +</row> +<row> +<entry>SHOW DATABASES, SHOW SCHEMAS</entry> +<entry>?</entry> +<entry>?</entry> +</row> +</tbody> +</tgroup> +</table> +--> + +<!-- Here are pieces from the original Hive table that do not apply to Impala because of differences in the SQL syntax. + Might resurrect some of these later as features are added to Impala, in particular CREATE TABLE AS SELECT. +<row> +<entry>EXPORT</entry> +<entry>SELECT, INSERT@URI</entry> +<entry>Table</entry> +</row> +<row> +<entry>IMPORT</entry> +<entry>ALL, SELECT@URI</entry> +<entry>Database</entry> +</row> +<row> +<entry>ANALYZE</entry> +<entry>INSERT, SELECT</entry> +<entry>Table</entry> +</row> +<row> +<entry>SWITCH DATABASE</entry> +<entry>Any</entry> +<entry>Any Table, View in the DB</entry> +</row> +<row> +<entry>DESCRIBE FUNCTION*</entry> +<entry>Not Restricted</entry> +</row> +<row> +<entry>MSCK</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>SHOW COLUMNS</entry> +<entry>SELECT or INSERT</entry> +<entry>Table</entry> +</row> +<row> +<entry>SHOW TABLE STATUS</entry> +<entry>SELECT or INSERT</entry> +<entry>Table</entry> +</row> +<row> +<entry>SHOW TABLE PROPERTIES</entry> +<entry>SELECT or INSERT</entry> +<entry>Table</entry> +</row> +<row> +<entry>SHOW CREATE TABLE</entry> +<entry>SELECT or INSERT</entry> +<entry>Table</entry> +</row> +<row> +<entry>SHOW FUNCTIONS*</entry> +<entry>Not restricted</entry> +</row> +<row> +<entry>SHOW PARTITIONS</entry> +<entry>SELECT or INSERT</entry> +<entry>Table</entry> +</row> +<row> +<entry>SHOW INDEXES</entry> +<entry>SELECT or INSERT</entry> +<entry>Table</entry> +</row> +<row> +<entry>SHOW LOCKS</entry> +<entry>Not Restricted</entry> +</row> +<row> +<entry>CREATE FUNCTION</entry> +<entry>Any Privilege</entry> +<entry>Any Object</entry> +</row> +<row> +<entry>CREATE INDEX</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>DROP FUNCTION</entry> +<entry>Any Privilege</entry> +<entry>Any Object</entry> +</row> +<row> +<entry>DROP INDEX</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER INDEX REBUILD</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER VIEW PROPERTIES</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>GRANT PRIVILEGE</entry> +<entry>Allowed, but has no effect on HS2 auth</entry> +</row> +<row> +<entry>REVOKE PRIVILEGE</entry> +<entry>Allowed, but has no effect on HS2 auth</entry> +</row> +<row> +<entry>SHOW GRANTS</entry> +<entry>Allowed, but has no effect on HS2 auth</entry> +</row> +<row> +<entry>ALTER TABLE PROTECT MODE</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER TABLE FILE FORMAT</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER PARTITION PROTECT MODE</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER PARTITION FILE FORMAT</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER PARTITION LOCATION*</entry> +<entry>ALL</entry> +<entry>Server</entry> +</row> +<row> +<entry>CREATE TABLE AS SELECT</entry> +<entry>ALL, SELECT</entry> +<entry>Database, Table/View</entry> +</row> +<row> +<entry>ALTER INDEX PROP</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER DATABASE</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>DESCRIBE DATABASE</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER TABLE MERGE FILE</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER PARTITION MERGE FILE</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER TABLE SKEWED</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER TABLE PARTITION SKEWED LOCATION</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ADD JAR*</entry> +<entry>Restricted unless hive.server2.authorization.external.exec = true</entry> +</row> +<row> +<entry>TRANSFORM*</entry> +<entry>ALL</entry> +<entry>Server</entry> +</row> +--> + </conbody> + </concept> + + <concept id="sentry_debug"> + + <title><ph conref="../shared/impala_common.xml#common/title_sentry_debug"/></title> + + <conbody> + + <p conref="../shared/impala_common.xml#common/sentry_debug"/> + </conbody> + </concept> + + <concept id="sentry_cm"> + + <title>Managing Sentry for Impala through Cloudera Manager</title> + + <conbody> + + <p> + To enable the Sentry service for Impala and Hive, set <uicontrol>Hive/Impala > Service-Wide > Sentry + Service</uicontrol> parameter to the Sentry service. Then restart Impala and Hive. Simply adding Sentry + service as a dependency and restarting enables Impala and Hive to use the Sentry service. + </p> + + <p> + To set the server name to use when granting server level privileges, set the <uicontrol>Hive > + Service-Wide > Advanced > Server Name for Sentry Authorization</uicontrol> parameter. When using + Sentry with the Hive Metastore, you can specify the list of users that are allowed to bypass Sentry + Authorization in Hive Metastore using <uicontrol>Hive > Service-Wide > Security > Bypass Sentry + Authorization Users</uicontrol>. These are usually service users that already ensure all activity has been + authorized. + </p> + + <note> + The <uicontrol>Hive/Impala > Service-Wide > Policy File Based Sentry</uicontrol> tab contains + parameters only relevant to configuring Sentry using policy files. In particular, make sure that + <uicontrol>Enable Sentry Authorization using Policy Files</uicontrol> parameter is unchecked when using the + Sentry service. Cloudera Manager throws a validation error if you attempt to configure the Sentry service + and policy file at the same time. + </note> + </conbody> + </concept> + + <concept id="sec_ex_default"> + + <title>The DEFAULT Database in a Secure Deployment</title> + + <conbody> + + <p> + Because of the extra emphasis on granular access controls in a secure deployment, you should move any + important or sensitive information out of the <codeph>DEFAULT</codeph> database into a named database whose + privileges are specified in the policy file. Sometimes you might need to give privileges on the + <codeph>DEFAULT</codeph> database for administrative reasons; for example, as a place you can reliably + specify with a <codeph>USE</codeph> statement when preparing to drop a database. + </p> + +<!-- Maybe have an example later, but not for initial 1.1 release. +<codeblock></codeblock> +--> + </conbody> + </concept> +</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_avg.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_avg.xml b/docs/topics/impala_avg.xml new file mode 100644 index 0000000..d95b2d5 --- /dev/null +++ b/docs/topics/impala_avg.xml @@ -0,0 +1,225 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="avg"> + + <title>AVG Function</title> + <titlealts audience="PDF"><navtitle>AVG</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Impala Functions"/> + <data name="Category" value="Analytic Functions"/> + <data name="Category" value="Aggregate Functions"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">avg() function</indexterm> + An aggregate function that returns the average value from a set of numbers or <codeph>TIMESTAMP</codeph> values. + Its single argument can be numeric column, or the numeric result of a function or expression applied to the + column value. Rows with a <codeph>NULL</codeph> value for the specified column are ignored. If the table is empty, + or all the values supplied to <codeph>AVG</codeph> are <codeph>NULL</codeph>, <codeph>AVG</codeph> returns + <codeph>NULL</codeph>. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>AVG([DISTINCT | ALL] <varname>expression</varname>) [OVER (<varname>analytic_clause</varname>)] +</codeblock> + + <p> + When the query contains a <codeph>GROUP BY</codeph> clause, returns one value for each combination of + grouping values. + </p> + + <p> + <b>Return type:</b> <codeph>DOUBLE</codeph> for numeric values; <codeph>TIMESTAMP</codeph> for + <codeph>TIMESTAMP</codeph> values + </p> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_aggregation_explanation"/> + + <p conref="../shared/impala_common.xml#common/complex_types_aggregation_example"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + +<codeblock>-- Average all the non-NULL values in a column. +insert overwrite avg_t values (2),(4),(6),(null),(null); +-- The average of the above values is 4: (2+4+6) / 3. The 2 NULL values are ignored. +select avg(x) from avg_t; +-- Average only certain values from the column. +select avg(x) from t1 where month = 'January' and year = '2013'; +-- Apply a calculation to the value of the column before averaging. +select avg(x/3) from t1; +-- Apply a function to the value of the column before averaging. +-- Here we are substituting a value of 0 for all NULLs in the column, +-- so that those rows do factor into the return value. +select avg(isnull(x,0)) from t1; +-- Apply some number-returning function to a string column and average the results. +-- If column s contains any NULLs, length(s) also returns NULL and those rows are ignored. +select avg(length(s)) from t1; +-- Can also be used in combination with DISTINCT and/or GROUP BY. +-- Return more than one result. +select month, year, avg(page_visits) from web_stats group by month, year; +-- Filter the input to eliminate duplicates before performing the calculation. +select avg(distinct x) from t1; +-- Filter the output after performing the calculation. +select avg(x) from t1 group by y having avg(x) between 1 and 20; +</codeblock> + + <p rev="2.0.0"> + The following examples show how to use <codeph>AVG()</codeph> in an analytic context. They use a table + containing integers from 1 to 10. Notice how the <codeph>AVG()</codeph> is reported for each input value, as + opposed to the <codeph>GROUP BY</codeph> clause which condenses the result set. +<codeblock>select x, property, avg(x) over (partition by property) as avg from int_t where property in ('odd','even'); ++----+----------+-----+ +| x | property | avg | ++----+----------+-----+ +| 2 | even | 6 | +| 4 | even | 6 | +| 6 | even | 6 | +| 8 | even | 6 | +| 10 | even | 6 | +| 1 | odd | 5 | +| 3 | odd | 5 | +| 5 | odd | 5 | +| 7 | odd | 5 | +| 9 | odd | 5 | ++----+----------+-----+ +</codeblock> + +Adding an <codeph>ORDER BY</codeph> clause lets you experiment with results that are cumulative or apply to a moving +set of rows (the <q>window</q>). The following examples use <codeph>AVG()</codeph> in an analytic context +(that is, with an <codeph>OVER()</codeph> clause) to produce a running average of all the even values, +then a running average of all the odd values. The basic <codeph>ORDER BY x</codeph> clause implicitly +activates a window clause of <codeph>RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</codeph>, +which is effectively the same as <codeph>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</codeph>, +therefore all of these examples produce the same results: +<codeblock>select x, property, + avg(x) over (partition by property <b>order by x</b>) as 'cumulative average' + from int_t where property in ('odd','even'); ++----+----------+--------------------+ +| x | property | cumulative average | ++----+----------+--------------------+ +| 2 | even | 2 | +| 4 | even | 3 | +| 6 | even | 4 | +| 8 | even | 5 | +| 10 | even | 6 | +| 1 | odd | 1 | +| 3 | odd | 2 | +| 5 | odd | 3 | +| 7 | odd | 4 | +| 9 | odd | 5 | ++----+----------+--------------------+ + +select x, property, + avg(x) over + ( + partition by property + <b>order by x</b> + <b>range between unbounded preceding and current row</b> + ) as 'cumulative average' +from int_t where property in ('odd','even'); ++----+----------+--------------------+ +| x | property | cumulative average | ++----+----------+--------------------+ +| 2 | even | 2 | +| 4 | even | 3 | +| 6 | even | 4 | +| 8 | even | 5 | +| 10 | even | 6 | +| 1 | odd | 1 | +| 3 | odd | 2 | +| 5 | odd | 3 | +| 7 | odd | 4 | +| 9 | odd | 5 | ++----+----------+--------------------+ + +select x, property, + avg(x) over + ( + partition by property + <b>order by x</b> + <b>rows between unbounded preceding and current row</b> + ) as 'cumulative average' + from int_t where property in ('odd','even'); ++----+----------+--------------------+ +| x | property | cumulative average | ++----+----------+--------------------+ +| 2 | even | 2 | +| 4 | even | 3 | +| 6 | even | 4 | +| 8 | even | 5 | +| 10 | even | 6 | +| 1 | odd | 1 | +| 3 | odd | 2 | +| 5 | odd | 3 | +| 7 | odd | 4 | +| 9 | odd | 5 | ++----+----------+--------------------+ +</codeblock> + +The following examples show how to construct a moving window, with a running average taking into account 1 row before +and 1 row after the current row, within the same partition (all the even values or all the odd values). +Because of a restriction in the Impala <codeph>RANGE</codeph> syntax, this type of +moving window is possible with the <codeph>ROWS BETWEEN</codeph> clause but not the <codeph>RANGE BETWEEN</codeph> +clause: +<codeblock>select x, property, + avg(x) over + ( + partition by property + <b>order by x</b> + <b>rows between 1 preceding and 1 following</b> + ) as 'moving average' + from int_t where property in ('odd','even'); ++----+----------+----------------+ +| x | property | moving average | ++----+----------+----------------+ +| 2 | even | 3 | +| 4 | even | 4 | +| 6 | even | 6 | +| 8 | even | 8 | +| 10 | even | 9 | +| 1 | odd | 2 | +| 3 | odd | 3 | +| 5 | odd | 5 | +| 7 | odd | 7 | +| 9 | odd | 8 | ++----+----------+----------------+ + +-- Doesn't work because of syntax restriction on RANGE clause. +select x, property, + avg(x) over + ( + partition by property + <b>order by x</b> + <b>range between 1 preceding and 1 following</b> + ) as 'moving average' +from int_t where property in ('odd','even'); +ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW. +</codeblock> + </p> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + +<!-- This conref appears under SUM(), AVG(), FLOAT, and DOUBLE topics. --> + + <p conref="../shared/impala_common.xml#common/sum_double"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_analytic_functions.xml#analytic_functions"/>, <xref href="impala_max.xml#max"/>, + <xref href="impala_min.xml#min"/> + </p> + </conbody> +</concept>
