[43/51] [partial] incubator-impala git commit: IMPALA-3398: Add docs to main Impala branch.

jbapple Thu, 17 Nov 2016 15:12:39 -0800

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_authorization.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_authorization.xml 
b/docs/topics/impala_authorization.xml
new file mode 100644
index 0000000..5163b2f
--- /dev/null
+++ b/docs/topics/impala_authorization.xml
@@ -0,0 +1,1621 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.1" id="authorization">
+
+  <title>Enabling Sentry Authorization for Impala</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Security"/>
+      <data name="Category" value="Sentry"/>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Configuring"/>
+      <data name="Category" value="Starting and Stopping"/>
+      <data name="Category" value="Users"/>
+      <data name="Category" value="Groups"/>
+      <data name="Category" value="Administrators"/>
+    </metadata>
+  </prolog>
+
+  <conbody id="sentry">
+
+    <p>
+      Authorization determines which users are allowed to access which 
resources, and what operations they are
+      allowed to perform. In Impala 1.1 and higher, you use the Sentry open 
source project for authorization.
+      Sentry adds a fine-grained authorization framework for Hadoop. By 
default (when authorization is not
+      enabled), Impala does all read and write operations with the privileges 
of the <codeph>impala</codeph> user,
+      which is suitable for a development/test environment but not for a 
secure production environment. When
+      authorization is enabled, Impala uses the OS user ID of the user who 
runs <cmdname>impala-shell</cmdname> or
+      other client program, and associates various privileges with each user.
+    </p>
+
+    <note>
+      Sentry is typically used in conjunction with Kerberos authentication, 
which defines which hosts are allowed
+      to connect to each server. Using the combination of Sentry and Kerberos 
prevents malicious users from being
+      able to connect by creating a named account on an untrusted machine. See
+      <xref href="impala_kerberos.xml#kerberos"/> for details about Kerberos 
authentication.
+    </note>
+
+    <p audience="PDF" outputclass="toc inpage">
+      See the following sections for details about using the Impala 
authorization features:
+    </p>
+  </conbody>
+
+  <concept id="sentry_priv_model">
+
+    <title>The Sentry Privilege Model</title>
+
+    <conbody>
+
+      <p>
+        Privileges can be granted on different objects in the schema. Any 
privilege that can be granted is
+        associated with a level in the object hierarchy. If a privilege is 
granted on a container object in the
+        hierarchy, the child object automatically inherits it. This is the 
same privilege model as Hive and other
+        database systems such as MySQL.
+      </p>
+
+      <p rev="2.3.0 collevelauth">
+        The object hierarchy for Impala covers Server, URI, Database, Table, 
and Column. (The Table privileges apply to views as well;
+        anywhere you specify a table name, you can specify a view name 
instead.)
+        Column-level authorization is available in <keyword 
keyref="impala23_full"/> and higher, as described in
+        <xref audience="integrated" 
href="sg_hive_sql.xml#concept_c2q_4qx_p4/col_level_auth_sentry"/><xref 
audience="standalone" 
href="https://www.cloudera.com/documentation/enterprise/latest/topics/sg_hive_sql.html";
 format="html" scope="external"/>.
+        Previously, you constructed views to query specific columns and 
assigned privileges
+        based on the views rather than the base tables.
+      </p>
+
+      <p>
+        A restricted set of privileges determines what you can do with each 
object:
+      </p>
+
+      <dl>
+        <dlentry id="select_priv">
+
+          <dt>
+            SELECT privilege
+          </dt>
+
+          <dd>
+            Lets you read data from a table or view, for example with the 
<codeph>SELECT</codeph> statement, the
+            <codeph>INSERT...SELECT</codeph> syntax, or <codeph>CREATE 
TABLE...LIKE</codeph>. Also required to
+            issue the <codeph>DESCRIBE</codeph> statement or the 
<codeph>EXPLAIN</codeph> statement for a query
+            against a particular table. Only objects for which a user has this 
privilege are shown in the output
+            for <codeph>SHOW DATABASES</codeph> and <codeph>SHOW 
TABLES</codeph> statements. The
+            <codeph>REFRESH</codeph> statement and <codeph>INVALIDATE 
METADATA</codeph> statements only access
+            metadata for tables for which the user has this privilege.
+          </dd>
+
+        </dlentry>
+
+        <dlentry id="insert_priv">
+
+          <dt>
+            INSERT privilege
+          </dt>
+
+          <dd>
+            Lets you write data to a table. Applies to the 
<codeph>INSERT</codeph> and <codeph>LOAD DATA</codeph>
+            statements.
+          </dd>
+
+        </dlentry>
+
+        <dlentry id="all_priv">
+
+          <dt>
+            ALL privilege
+          </dt>
+
+          <dd>
+            Lets you create or modify the object. Required to run DDL 
statements such as <codeph>CREATE
+            TABLE</codeph>, <codeph>ALTER TABLE</codeph>, or <codeph>DROP 
TABLE</codeph> for a table,
+            <codeph>CREATE DATABASE</codeph> or <codeph>DROP DATABASE</codeph> 
for a database, or <codeph>CREATE
+            VIEW</codeph>, <codeph>ALTER VIEW</codeph>, or <codeph>DROP 
VIEW</codeph> for a view. Also required for
+            the URI of the <q>location</q> parameter for the <codeph>CREATE 
EXTERNAL TABLE</codeph> and
+            <codeph>LOAD DATA</codeph> statements.
+<!-- Have to think about the best wording, how often to repeat, how best to 
conref this caveat.
+          You do not actually code the keyword <codeph>ALL</codeph> in the 
policy file; instead you use
+          <codeph>action=*</codeph> or shorten the right-hand portion of the 
rule.
+          -->
+          </dd>
+
+        </dlentry>
+      </dl>
+
+      <p>
+        Privileges can be specified for a table or view before that object 
actually exists. If you do not have
+        sufficient privilege to perform an operation, the error message does 
not disclose if the object exists or
+        not.
+      </p>
+
+      <p>
+        Originally, privileges were encoded in a policy file, stored in HDFS. 
This mode of operation is still an
+        option, but the emphasis of privilege management is moving towards 
being SQL-based. Although currently
+        Impala does not have <codeph>GRANT</codeph> or <codeph>REVOKE</codeph> 
statements, Impala can make use of
+        privileges assigned through <codeph>GRANT</codeph> and 
<codeph>REVOKE</codeph> statements done through
+        Hive. The mode of operation with <codeph>GRANT</codeph> and 
<codeph>REVOKE</codeph> statements instead of
+        the policy file requires that a special Sentry service be enabled; 
this service stores, retrieves, and
+        manipulates privilege information stored inside the metastore database.
+      </p>
+    </conbody>
+  </concept>
+
+  <concept id="secure_startup">
+
+    <title>Starting the impalad Daemon with Sentry Authorization 
Enabled</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Starting and Stopping"/>
+    </metadata>
+  </prolog>
+
+    <conbody>
+
+      <p>
+        To run the <cmdname>impalad</cmdname> daemon with authorization 
enabled, you add one or more options to the
+        <codeph>IMPALA_SERVER_ARGS</codeph> declaration in the 
<filepath>/etc/default/impala</filepath>
+        configuration file:
+      </p>
+
+      <ul>
+        <li>
+          The <codeph>-server_name</codeph> option turns on Sentry 
authorization for Impala. The authorization
+          rules refer to a symbolic server name, and you specify the name to 
use as the argument to the
+          <codeph>-server_name</codeph> option.
+        </li>
+
+        <li rev="1.4.0">
+          If you specify just <codeph>-server_name</codeph>, Impala uses the 
Sentry service for authorization,
+          relying on the results of <codeph>GRANT</codeph> and 
<codeph>REVOKE</codeph> statements issued through
+          Hive. (This mode of operation is available in Impala 1.4.0 and 
higher.) Prior to Impala 1.4.0, or if you
+          want to continue storing privilege rules in the policy file, also 
specify the
+          <codeph>-authorization_policy_file</codeph> option as in the 
following item.
+        </li>
+
+        <li>
+          Specifying the <codeph>-authorization_policy_file</codeph> option in 
addition to
+          <codeph>-server_name</codeph> makes Impala read privilege 
information from a policy file, rather than
+          from the metastore database. The argument to the 
<codeph>-authorization_policy_file</codeph> option
+          specifies the HDFS path to the policy file that defines the 
privileges on different schema objects.
+        </li>
+      </ul>
+
+      <p rev="1.4.0">
+        For example, you might adapt your 
<filepath>/etc/default/impala</filepath> configuration to contain lines
+        like the following. To use the Sentry service rather than the policy 
file:
+      </p>
+
+<codeblock rev="1.4.0">IMPALA_SERVER_ARGS=" \
+-server_name=server1 \
+...
+</codeblock>
+
+      <p>
+        Or to use the policy file, as in releases prior to Impala 1.4:
+      </p>
+
+<codeblock>IMPALA_SERVER_ARGS=" \
+-authorization_policy_file=/user/hive/warehouse/auth-policy.ini \
+-server_name=server1 \
+...
+</codeblock>
+
+      <p>
+        The preceding examples set up a symbolic name of 
<codeph>server1</codeph> to refer to the current instance
+        of Impala. This symbolic name is used in the following ways:
+      </p>
+
+      <ul>
+        <li>
+          <p>
+            In an environment managed by Cloudera Manager, the server name is 
specified through
+            <menucascade><uicontrol>Impala 
(Service-Wide)</uicontrol><uicontrol>Category</uicontrol><uicontrol>Advanced</uicontrol><uicontrol>Sentry
 Service</uicontrol></menucascade> and
+            
<menucascade><uicontrol>Hive</uicontrol><uicontrol>Service-Wide</uicontrol><uicontrol>Advanced</uicontrol><uicontrol>Sentry
 Service</uicontrol></menucascade>.
+            The values must be the same for both, so that Impala and Hive can 
share the privilege rules. Restart
+            the Impala and Hive services after setting or changing this value.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            In an environment not managed by Cloudera Manager, you specify 
this value for the
+            <codeph>sentry.hive.server</codeph> property in the 
<filepath>sentry-site.xml</filepath> configuration
+            file for Hive, as well as in the <codeph>-server_name</codeph> 
option for <cmdname>impalad</cmdname>.
+          </p>
+          <p>
+            If the <cmdname>impalad</cmdname> daemon is not already running, 
start it as described in
+            <xref href="impala_processes.xml#processes"/>. If it is already 
running, restart it with the command
+            <codeph>sudo /etc/init.d/impala-server restart</codeph>. Run the 
appropriate commands on all the nodes
+            where <cmdname>impalad</cmdname> normally runs.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            If you use the mode of operation using the policy file, the rules 
in the <codeph>[roles]</codeph>
+            section of the policy file refer to this same 
<codeph>server1</codeph> name. For example, the following
+            rule sets up a role <codeph>report_generator</codeph> that lets 
users with that role query any table in
+            a database named <codeph>reporting_db</codeph> on a node where the 
<cmdname>impalad</cmdname> daemon
+            was started up with the <codeph>-server_name=server1</codeph> 
option:
+          </p>
+<codeblock>[roles]
+report_generator = 
server=server1-&gt;db=reporting_db-&gt;table=*-&gt;action=SELECT
+</codeblock>
+        </li>
+      </ul>
+
+      <p>
+        When <cmdname>impalad</cmdname> is started with one or both of the 
<codeph>-server_name=server1</codeph>
+        and <codeph>-authorization_policy_file</codeph> options, Impala 
authorization is enabled. If Impala detects
+        any errors or inconsistencies in the authorization settings or the 
policy file, the daemon refuses to
+        start.
+      </p>
+    </conbody>
+  </concept>
+
+  <concept id="sentry_service">
+
+    <title>Using Impala with the Sentry Service (<keyword keyref="impala14"/> 
or higher only)</title>
+
+    <conbody>
+
+      <p>
+        When you use the Sentry service rather than the policy file, you set 
up privileges through
+        <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statement in either 
Impala or Hive, then both components
+        use those same privileges automatically. (Impala added the 
<codeph>GRANT</codeph> and
+        <codeph>REVOKE</codeph> statements in <keyword 
keyref="impala20_full"/>.)
+      </p>
+
+      <p>
+        Hive already had <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> 
statements prior to CDH 5.1, but those
+        statements were not production-ready. CDH 5.1 is the first release 
where those statements use the Sentry
+        framework and are considered GA level. If you used the Hive 
<codeph>GRANT</codeph> and
+        <codeph>REVOKE</codeph> statements prior to CDH 5.1, you must set up 
these privileges with the CDH 5.1
+        versions of <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> to take 
advantage of Sentry authorization.
+      </p>
+
+      <p>
+        For information about using the updated Hive <codeph>GRANT</codeph> 
and <codeph>REVOKE</codeph> statements,
+        see
+<!-- Original URL: 
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Security-Guide/cdh_sg_sentry_service.html
 -->
+        <xref 
href="http://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_sentry_service.html";
 scope="external" format="html">Sentry
+        service</xref> topic in the <cite>CDH 5 Security Guide</cite>.
+      </p>
+    </conbody>
+  </concept>
+
+  <concept id="security_policy_file">
+
+    <title>Using Impala with the Sentry Policy File</title>
+
+    <conbody>
+
+      <p>
+        The policy file is a file that you put in a designated location in 
HDFS, and is read during the startup of
+        the <cmdname>impalad</cmdname> daemon when you specify both the 
<codeph>-server_name</codeph> and
+        <codeph>-authorization_policy_file</codeph> startup options. It 
controls which objects (databases, tables,
+        and HDFS directory paths) can be accessed by the user who connects to 
<cmdname>impalad</cmdname>, and what
+        operations that user can perform on the objects.
+      </p>
+
+      <note rev="1.4.0">
+        <p rev="1.4.0">
+          In <ph rev="upstream">CDH 5</ph> and higher, <ph 
rev="upstream">Cloudera</ph> recommends
+          managing privileges through SQL statements, as described in
+          <xref href="impala_authorization.xml#sentry_service"/>. If you are 
still using policy files, plan to
+          migrate to the new approach some time in the future.
+        </p>
+      </note>
+
+      <p>
+        The location of the policy file is listed in the 
<filepath>auth-site.xml</filepath> configuration file. To
+        minimize overhead, the security information from this file is cached 
by each <cmdname>impalad</cmdname>
+        daemon and refreshed automatically, with a default interval of 5 
minutes. After making a substantial change
+        to security policies, restart all Impala daemons to pick up the 
changes immediately.
+      </p>
+
+      <p outputclass="toc inpage"/>
+    </conbody>
+
+    <concept id="security_policy_file_details">
+
+      <title>Policy File Location and Format</title>
+
+      <conbody>
+
+        <p>
+          The policy file uses the familiar <codeph>.ini</codeph> format, 
divided into the major sections
+          <codeph>[groups]</codeph> and <codeph>[roles]</codeph>. There is 
also an optional
+          <codeph>[databases]</codeph> section, which allows you to specify a 
specific policy file for a particular
+          database, as explained in <xref 
href="#security_multiple_policy_files"/>. Another optional section,
+          <codeph>[users]</codeph>, allows you to override the OS-level 
mapping of users to groups; that is an
+          advanced technique primarily for testing and debugging, and is 
beyond the scope of this document.
+        </p>
+
+        <p>
+          In the <codeph>[groups]</codeph> section, you define various 
categories of users and select which roles
+          are associated with each category. The group and usernames 
correspond to Linux groups and users on the
+          server where the <cmdname>impalad</cmdname> daemon runs.
+        </p>
+
+        <p>
+          The group and usernames in the <codeph>[groups]</codeph> section 
correspond to Linux groups and users on
+          the server where the <cmdname>impalad</cmdname> daemon runs. When 
you access Impala through the
+          <cmdname>impalad</cmdname> interpreter, for purposes of 
authorization, the user is the logged-in Linux
+          user and the groups are the Linux groups that user is a member of. 
When you access Impala through the
+          ODBC or JDBC interfaces, the user and password specified through the 
connection string are used as login
+          credentials for the Linux server, and authorization is based on that 
username and the associated Linux
+          group membership.
+        </p>
+
+        <p>
+          In the <codeph>[roles]</codeph> section, you a set of roles. For 
each role, you specify precisely the set
+          of privileges is available. That is, which objects users with that 
role can access, and what operations
+          they can perform on those objects. This is the lowest-level category 
of security information; the other
+          sections in the policy file map the privileges to higher-level 
divisions of groups and users. In the
+          <codeph>[groups]</codeph> section, you specify which roles are 
associated with which groups. The group
+          and usernames correspond to Linux groups and users on the server 
where the <cmdname>impalad</cmdname>
+          daemon runs. The privileges are specified using patterns like:
+<codeblock>server=<varname>server_name</varname>-&gt;db=<varname>database_name</varname>-&gt;table=<varname>table_name</varname>-&gt;action=SELECT
+server=<varname>server_name</varname>-&gt;db=<varname>database_name</varname>-&gt;table=<varname>table_name</varname>-&gt;action=CREATE
+server=<varname>server_name</varname>-&gt;db=<varname>database_name</varname>-&gt;table=<varname>table_name</varname>-&gt;action=ALL
+</codeblock>
+          For the <varname>server_name</varname> value, substitute the same 
symbolic name you specify with the
+          <cmdname>impalad</cmdname> <codeph>-server_name</codeph> option. You 
can use <codeph>*</codeph> wildcard
+          characters at each level of the privilege specification to allow 
access to all such objects. For example:
+<codeblock>server=impala-host.example.com-&gt;db=default-&gt;table=t1-&gt;action=SELECT
+server=impala-host.example.com-&gt;db=*-&gt;table=*-&gt;action=CREATE
+server=impala-host.example.com-&gt;db=*-&gt;table=audit_log-&gt;action=SELECT
+server=impala-host.example.com-&gt;db=default-&gt;table=t1-&gt;action=*
+</codeblock>
+        </p>
+
+        <p>
+          When authorization is enabled, Impala uses the policy file as a 
<i>whitelist</i>, representing every
+          privilege available to any user on any object. That is, only 
operations specified for the appropriate
+          combination of object, role, group, and user are allowed; all other 
operations are not allowed. If a
+          group or role is defined multiple times in the policy file, the last 
definition takes precedence.
+        </p>
+
+        <p>
+          To understand the notion of whitelisting, set up a minimal policy 
file that does not provide any
+          privileges for any object. When you connect to an Impala node where 
this policy file is in effect, you
+          get no results for <codeph>SHOW DATABASES</codeph>, and an error 
when you issue any <codeph>SHOW
+          TABLES</codeph>, <codeph>USE 
<varname>database_name</varname></codeph>, <codeph>DESCRIBE
+          <varname>table_name</varname></codeph>, <codeph>SELECT</codeph>, and 
or other statements that expect to
+          access databases or tables, even if the corresponding databases and 
tables exist.
+        </p>
+
+        <p>
+          The contents of the policy file are cached, to avoid a performance 
penalty for each query. The policy
+          file is re-checked by each <cmdname>impalad</cmdname> node every 5 
minutes. When you make a
+          non-time-sensitive change such as adding new privileges or new 
users, you can let the change take effect
+          automatically a few minutes later. If you remove or reduce 
privileges, and want the change to take effect
+          immediately, restart the <cmdname>impalad</cmdname> daemon on all 
nodes, again specifying the
+          <codeph>-server_name</codeph> and 
<codeph>-authorization_policy_file</codeph> options so that the rules
+          from the updated policy file are applied.
+        </p>
+      </conbody>
+    </concept>
+
+    <concept id="security_examples">
+
+      <title>Examples of Policy File Rules for Security Scenarios</title>
+
+      <conbody>
+
+        <p>
+          The following examples show rules that might go in the policy file 
to deal with various
+          authorization-related scenarios. For illustration purposes, this 
section shows several very small policy
+          files with only a few rules each. In your environment, typically you 
would define many roles to cover all
+          the scenarios involving your own databases, tables, and 
applications, and a smaller number of groups,
+          whose members are given the privileges from one or more roles.
+        </p>
+
+        <example id="sec_ex_unprivileged">
+
+          <title>A User with No Privileges</title>
+
+          <p>
+            If a user has no privileges at all, that user cannot access any 
schema objects in the system. The error
+            messages do not disclose the names or existence of objects that 
the user is not authorized to read.
+          </p>
+
+          <p>
+<!--        This example demonstrates the lack of privileges using a blank 
policy file, so no users have any privileges. -->
+            This is the experience you want a user to have if they somehow log 
into a system where they are not an
+            authorized Impala user. In a real deployment with a filled-in 
policy file, a user might have no
+            privileges because they are not a member of any of the relevant 
groups mentioned in the policy file.
+          </p>
+
+<!-- Have the raw material but not formatted into easily digestible example. 
Do for first 1.1 doc refresh.
+<codeblock></codeblock> -->
+
+        </example>
+
+        <example id="sec_ex_superuser">
+
+          <title>Examples of Privileges for Administrative Users</title>
+
+          <p>
+            When an administrative user has broad access to tables or 
databases, the associated rules in the
+            <codeph>[roles]</codeph> section typically use wildcards and/or 
inheritance. For example, in the
+            following sample policy file, <codeph>db=*</codeph> refers to all 
databases and
+            <codeph>db=*-&gt;table=*</codeph> refers to all tables in all 
databases.
+          </p>
+
+          <p>
+            Omitting the rightmost portion of a rule means that the privileges 
apply to all the objects that could
+            be specified there. For example, in the following sample policy 
file, the
+            <codeph>all_databases</codeph> role has all privileges for all 
tables in all databases, while the
+            <codeph>one_database</codeph> role has all privileges for all 
tables in one specific database. The
+            <codeph>all_databases</codeph> role does not grant privileges on 
URIs, so a group with that role could
+            not issue a <codeph>CREATE TABLE</codeph> statement with a 
<codeph>LOCATION</codeph> clause. The
+            <codeph>entire_server</codeph> role has all privileges on both 
databases and URIs within the server.
+          </p>
+
+<codeblock>[groups]
+supergroup = all_databases
+
+[roles]
+read_all_tables = server=server1-&gt;db=*-&gt;table=*-&gt;action=SELECT
+all_tables = server=server1-&gt;db=*-&gt;table=*
+all_databases = server=server1-&gt;db=*
+one_database = server=server1-&gt;db=test_db
+entire_server = server=server1
+</codeblock>
+
+        </example>
+
+        <example id="sec_ex_detailed">
+
+          <title>A User with Privileges for Specific Databases and 
Tables</title>
+
+          <p>
+            If a user has privileges for specific tables in specific 
databases, the user can access those things
+            but nothing else. They can see the tables and their parent 
databases in the output of <codeph>SHOW
+            TABLES</codeph> and <codeph>SHOW DATABASES</codeph>, 
<codeph>USE</codeph> the appropriate databases,
+            and perform the relevant actions (<codeph>SELECT</codeph> and/or 
<codeph>INSERT</codeph>) based on the
+            table privileges. To actually create a table requires the 
<codeph>ALL</codeph> privilege at the
+            database level, so you might define separate roles for the user 
that sets up a schema and other users
+            or applications that perform day-to-day operations on the tables.
+          </p>
+
+          <p>
+            The following sample policy file shows some of the syntax that is 
appropriate as the policy file grows,
+            such as the <codeph>#</codeph> comment syntax, <codeph>\</codeph> 
continuation syntax, and comma
+            separation for roles assigned to groups or privileges assigned to 
roles.
+          </p>
+
+<codeblock>[groups]
+cloudera = training_sysadmin, instructor
+visitor = student
+
+[roles]
+training_sysadmin = server=server1-&gt;db=training, \
+server=server1-&gt;db=instructor_private, \
+server=server1-&gt;db=lesson_development
+instructor = server=server1-&gt;db=training-&gt;table=*-&gt;action=*, \
+server=server1-&gt;db=instructor_private-&gt;table=*-&gt;action=*, \
+server=server1-&gt;db=lesson_development-&gt;table=lesson*
+# This particular course is all about queries, so the students can SELECT but 
not INSERT or CREATE/DROP.
+student = server=server1-&gt;db=training-&gt;table=lesson_*-&gt;action=SELECT
+</codeblock>
+
+        </example>
+
+<!--
+<example id="sec_ex_superuser_single_db">
+<title>A User with Full Privileges for a Specific Database</title>
+<p>
+
+</p>
+<codeblock></codeblock>
+</example>
+
+<example id="sec_ex_readonly_single_db">
+<title>A User with Read-Only Privileges for a Specific Database</title>
+<p>
+
+</p>
+<codeblock></codeblock>
+    <p>
+      If a user has <codeph>SELECT</codeph> privilege for a database, they can 
issue a <codeph>USE</codeph> statement
+      for that database. Whether or not they can access tables within the 
database depends on further privileges
+      defined at the table level.
+    </p>
+
+<codeblock></codeblock>
+
+</example>
+
+<example id="sec_ex_superuser_single_table">
+<title>A User with Full Privileges for a Specific Table</title>
+    <p>
+      If a user has <codeph>SELECT</codeph> privilege for a table, they can 
query, describe, or explain queries for
+      that table.
+    </p>
+
+<codeblock></codeblock>
+</example>
+
+<example id="sec_ex_load_data">
+<title>A User with Privileges to Load Data but not Read Data</title>
+
+    <p>
+      If a user has <codeph>INSERT</codeph> privilege for a table, they can 
write to the table if it already exists.
+      They cannot create or alter the table; those operations require the 
<codeph>ALL</codeph> privilege.
+    </p>
+<codeblock></codeblock>
+</example>
+-->
+
+        <example id="sec_ex_external_files">
+
+          <title>Privileges for Working with External Data Files</title>
+
+          <p>
+            When data is being inserted through the <codeph>LOAD DATA</codeph> 
statement, or is referenced from an
+            HDFS location outside the normal Impala database directories, the 
user also needs appropriate
+            permissions on the URIs corresponding to those HDFS locations.
+          </p>
+
+          <p>
+            In this sample policy file:
+          </p>
+
+          <ul>
+            <li>
+              The <codeph>external_table</codeph> role lets us insert into and 
query the Impala table,
+              <codeph>external_table.sample</codeph>.
+            </li>
+
+            <li>
+              The <codeph>staging_dir</codeph> role lets us specify the HDFS 
path
+              <filepath>/user/cloudera/external_data</filepath> with the 
<codeph>LOAD DATA</codeph> statement.
+              Remember, when Impala queries or loads data files, it operates 
on all the files in that directory,
+              not just a single file, so any Impala <codeph>LOCATION</codeph> 
parameters refer to a directory
+              rather than an individual file.
+            </li>
+
+            <li>
+              We included the IP address and port of the Hadoop name node in 
the HDFS URI of the
+              <codeph>staging_dir</codeph> rule. We found those details in
+              <filepath>/etc/hadoop/conf/core-site.xml</filepath>, under the 
<codeph>fs.default.name</codeph>
+              element. That is what we use in any roles that specify URIs 
(that is, the locations of directories in
+              HDFS).
+            </li>
+
+            <li>
+              We start this example after the table 
<codeph>external_table.sample</codeph> is already created. In
+              the policy file for the example, we have already taken away the 
<codeph>external_table_admin</codeph>
+              role from the <codeph>cloudera</codeph> group, and replaced it 
with the lesser-privileged
+              <codeph>external_table</codeph> role.
+            </li>
+
+            <li>
+              We assign privileges to a subdirectory underneath 
<filepath>/user/cloudera</filepath> in HDFS,
+              because such privileges also apply to any subdirectories 
underneath. If we had assigned privileges to
+              the parent directory <filepath>/user/cloudera</filepath>, it 
would be too likely to mess up other
+              files by specifying a wrong location by mistake.
+            </li>
+
+            <li>
+              The <codeph>cloudera</codeph> under the 
<codeph>[groups]</codeph> section refers to the
+              <codeph>cloudera</codeph> group. (In the demo VM used for this 
example, there is a
+              <codeph>cloudera</codeph> user that is a member of a 
<codeph>cloudera</codeph> group.)
+            </li>
+          </ul>
+
+          <p>
+            Policy file:
+          </p>
+
+<codeblock>[groups]
+cloudera = external_table, staging_dir
+
+[roles]
+external_table_admin = server=server1-&gt;db=external_table
+external_table = 
server=server1-&gt;db=external_table-&gt;table=sample-&gt;action=*
+staging_dir = 
server=server1-&gt;uri=hdfs://127.0.0.1:8020/user/cloudera/external_data-&gt;action=*
+</codeblock>
+
+          <p>
+            <cmdname>impala-shell</cmdname> session:
+          </p>
+
+<codeblock>[localhost:21000] &gt; use external_table;
+Query: use external_table
+[localhost:21000] &gt; show tables;
+Query: show tables
+Query finished, fetching results ...
++--------+
+| name   |
++--------+
+| sample |
++--------+
+Returned 1 row(s) in 0.02s
+
+[localhost:21000] &gt; select * from sample;
+Query: select * from sample
+Query finished, fetching results ...
++-----+
+| x   |
++-----+
+| 1   |
+| 5   |
+| 150 |
++-----+
+Returned 3 row(s) in 1.04s
+
+[localhost:21000] &gt; load data inpath '/user/cloudera/external_data' into 
table sample;
+Query: load data inpath '/user/cloudera/external_data' into table sample
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary                                                  |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 2 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.26s
+[localhost:21000] &gt; select * from sample;
+Query: select * from sample
+Query finished, fetching results ...
++-------+
+| x     |
++-------+
+| 2     |
+| 4     |
+| 6     |
+| 8     |
+| 64738 |
+| 49152 |
+| 1     |
+| 5     |
+| 150   |
++-------+
+Returned 9 row(s) in 0.22s
+
+[localhost:21000] &gt; load data inpath '/user/cloudera/unauthorized_data' 
into table sample;
+Query: load data inpath '/user/cloudera/unauthorized_data' into table sample
+ERROR: AuthorizationException: User 'cloudera' does not have privileges to 
access: hdfs://127.0.0.1:8020/user/cloudera/unauthorized_data
+</codeblock>
+
+        </example>
+
+        <example audience="Cloudera" id="sec_ex_views" rev="2.3.0 
collevelauth">
+
+          <title>Controlling Access at the Column Level through Views</title>
+
+          <p>
+            If a user has <codeph>SELECT</codeph> privilege for a view, they 
can query the view, even if they do
+            not have any privileges on the underlying table. To see the 
details about the underlying table through
+            <codeph>EXPLAIN</codeph> or <codeph>DESCRIBE FORMATTED</codeph> 
statements on the view, the user must
+            also have <codeph>SELECT</codeph> privilege for the underlying 
table.
+          </p>
+
+          <note type="important">
+            <p>
+              The types of data that are considered sensitive and confidential 
differ depending on the jurisdiction
+              the type of industry, or both. For fine-grained access controls, 
set up appropriate privileges based
+              on all applicable laws and regulations.
+            </p>
+            <p>
+              Be careful using the <codeph>ALTER VIEW</codeph> statement to 
point an existing view at a different
+              base table or a new set of columns that includes sensitive or 
restricted data. Make sure that any
+              users who have <codeph>SELECT</codeph> privilege on the view do 
not gain access to any additional
+              information they are not authorized to see.
+            </p>
+          </note>
+
+          <p>
+            The following example shows how a system administrator could set 
up a table containing some columns
+            with sensitive information, then create a view that only exposes 
the non-confidential columns.
+          </p>
+
+<codeblock>[localhost:21000] &gt; create table sensitive_info
+                &gt; (
+                &gt;   name string,
+                &gt;   address string,
+                &gt;   credit_card string,
+                &gt;   taxpayer_id string
+                &gt; );
+[localhost:21000] &gt; create view name_address_view as select name, address 
from sensitive_info;
+</codeblock>
+
+          <p>
+            Then the following policy file specifies read-only privilege for 
that view, without authorizing access
+            to the underlying table:
+          </p>
+
+<codeblock>[groups]
+cloudera = view_only_privs
+
+[roles]
+view_only_privs = 
server=server1-&gt;db=reports-&gt;table=name_address_view-&gt;action=SELECT
+</codeblock>
+
+          <p>
+            Thus, a user with the <codeph>view_only_privs</codeph> role could 
access through Impala queries the
+            basic information but not the sensitive information, even if both 
kinds of information were part of the
+            same data file.
+          </p>
+
+          <p>
+            You might define other views to allow users from different groups 
to query different sets of columns.
+          </p>
+
+        </example>
+
+        <example id="sec_sysadmin">
+
+          <title>Separating Administrator Responsibility from Read and Write 
Privileges</title>
+
+          <p>
+            Remember that to create a database requires full privilege on that 
database, while day-to-day
+            operations on tables within that database can be performed with 
lower levels of privilege on specific
+            table. Thus, you might set up separate roles for each database or 
application: an administrative one
+            that could create or drop the database, and a user-level one that 
can access only the relevant tables.
+          </p>
+
+          <p>
+            For example, this policy file divides responsibilities between 
users in 3 different groups:
+          </p>
+
+          <ul>
+            <li>
+              Members of the <codeph>supergroup</codeph> group have the 
<codeph>training_sysadmin</codeph> role and
+              so can set up a database named <codeph>training</codeph>.
+            </li>
+
+            <li>
+              Members of the <codeph>cloudera</codeph> group have the 
<codeph>instructor</codeph> role and so can
+              create, insert into, and query any tables in the 
<codeph>training</codeph> database, but cannot
+              create or drop the database itself.
+            </li>
+
+            <li>
+              Members of the <codeph>visitor</codeph> group have the 
<codeph>student</codeph> role and so can query
+              those tables in the <codeph>training</codeph> database.
+            </li>
+          </ul>
+
+<codeblock>[groups]
+supergroup = training_sysadmin
+cloudera = instructor
+visitor = student
+
+[roles]
+training_sysadmin = server=server1-&gt;db=training
+instructor = server=server1-&gt;db=training-&gt;table=*-&gt;action=*
+student = server=server1-&gt;db=training-&gt;table=*-&gt;action=SELECT
+</codeblock>
+
+        </example>
+      </conbody>
+    </concept>
+
+    <concept id="security_multiple_policy_files">
+
+      <title>Using Multiple Policy Files for Different Databases</title>
+
+      <conbody>
+
+        <p>
+          For an Impala cluster with many databases being accessed by many 
users and applications, it might be
+          cumbersome to update the security policy file for each privilege 
change or each new database, table, or
+          view. You can allow security to be managed separately for individual 
databases, by setting up a separate
+          policy file for each database:
+        </p>
+
+        <ul>
+          <li>
+            Add the optional <codeph>[databases]</codeph> section to the main 
policy file.
+          </li>
+
+          <li>
+            Add entries in the <codeph>[databases]</codeph> section for each 
database that has its own policy file.
+          </li>
+
+          <li>
+            For each listed database, specify the HDFS path of the appropriate 
policy file.
+          </li>
+        </ul>
+
+        <p>
+          For example:
+        </p>
+
+<codeblock>[databases]
+# Defines the location of the per-DB policy files for the 'customers' and 
'sales' databases.
+customers = hdfs://ha-nn-uri/etc/access/customers.ini
+sales = hdfs://ha-nn-uri/etc/access/sales.ini
+</codeblock>
+
+        <p>
+          To enable URIs in per-DB policy files, add the following string in 
the Cloudera Manager field
+          <uicontrol>Impala Service Environment Advanced Configuration Snippet 
(Safety Valve)</uicontrol>:
+        </p>
+
+<codeblock>JAVA_TOOL_OPTIONS="-Dsentry.allow.uri.db.policyfile=true"
+</codeblock>
+
+        <note type="important">
+          Enabling URIs in per-DB policy files introduces a security risk by 
allowing the owner of the db-level
+          policy file to grant himself/herself load privileges to anything the 
<codeph>impala</codeph> user has
+          read permissions for in HDFS (including data in other databases 
controlled by different db-level policy
+          files).
+        </note>
+      </conbody>
+    </concept>
+  </concept>
+
+  <concept id="security_schema">
+
+    <title>Setting Up Schema Objects for a Secure Impala Deployment</title>
+
+    <conbody>
+
+      <p>
+        Remember that in your role definitions, you specify privileges at the 
level of individual databases and
+        tables, or all databases or all tables within a database. To simplify 
the structure of these rules, plan
+        ahead of time how to name your schema objects so that data with 
different authorization requirements is
+        divided into separate databases.
+      </p>
+
+      <p>
+        If you are adding security on top of an existing Impala deployment, 
remember that you can rename tables or
+        even move them between databases using the <codeph>ALTER 
TABLE</codeph> statement. In Impala, creating new
+        databases is a relatively inexpensive operation, basically just 
creating a new directory in HDFS.
+      </p>
+
+      <p>
+        You can also plan the security scheme and set up the policy file 
before the actual schema objects named in
+        the policy file exist. Because the authorization capability is based 
on whitelisting, a user can only
+        create a new database or table if the required privilege is already in 
the policy file: either by listing
+        the exact name of the object being created, or a <codeph>*</codeph> 
wildcard to match all the applicable
+        objects within the appropriate container.
+      </p>
+    </conbody>
+  </concept>
+
+  <concept id="security_privileges">
+
+    <title>Privilege Model and Object Hierarchy</title>
+
+    <conbody>
+
+      <p>
+        Privileges can be granted on different objects in the schema. Any 
privilege that can be granted is
+        associated with a level in the object hierarchy. If a privilege is 
granted on a container object in the
+        hierarchy, the child object automatically inherits it. This is the 
same privilege model as Hive and other
+        database systems such as MySQL.
+      </p>
+
+      <p>
+        The kinds of objects in the schema hierarchy are:
+      </p>
+
+<codeblock>Server
+URI
+Database
+  Table
+</codeblock>
+
+      <p>
+        The server name is specified by the <codeph>-server_name</codeph> 
option when <cmdname>impalad</cmdname>
+        starts. Specify the same name for all <cmdname>impalad</cmdname> nodes 
in the cluster.
+      </p>
+
+      <p>
+        URIs represent the HDFS paths you specify as part of statements such 
as <codeph>CREATE EXTERNAL
+        TABLE</codeph> and <codeph>LOAD DATA</codeph>. Typically, you specify 
what look like UNIX paths, but these
+        locations can also be prefixed with <codeph>hdfs://</codeph> to make 
clear that they are really URIs. To
+        set privileges for a URI, specify the name of a directory, and the 
privilege applies to all the files in
+        that directory and any directories underneath it.
+      </p>
+
+      <p rev="2.3.0 collevelauth">
+        In <keyword keyref="impala23_full"/> and higher, you can specify 
privileges for individual columns,
+        as described in
+        <xref audience="integrated" 
href="sg_hive_sql.xml#concept_c2q_4qx_p4/col_level_auth_sentry"/><xref 
audience="standalone" 
href="https://www.cloudera.com/documentation/enterprise/latest/topics/sg_hive_sql.html";
 format="html" scope="external"/>.
+        Formerly, to specify
+        read privileges at this level, you created a view that queried 
specific columns and/or partitions from a base
+        table, and gave <codeph>SELECT</codeph> privilege on the view but not 
the underlying table.
+      </p>
+
+      <p>
+        URIs must start with either <codeph>hdfs://</codeph> or 
<codeph>file://</codeph>. If a URI starts with
+        anything else, it will cause an exception and the policy file will be 
invalid. When defining URIs for HDFS,
+        you must also specify the NameNode. For example:
+<codeblock>data_read = server=server1-&gt;uri=file:///path/to/dir, \
+server=server1-&gt;uri=hdfs://namenode:port/path/to/dir
+</codeblock>
+        <note type="warning">
+          <p>
+            Because the NameNode host and port must be specified, Cloudera 
strongly recommends you use High
+            Availability (HA). This ensures that the URI will remain constant 
even if the namenode changes.
+          </p>
+<codeblock>data_read = server=server1-&gt;uri=file:///path/to/dir,\ 
server=server1-&gt;uri=hdfs://ha-nn-uri/path/to/dir
+</codeblock>
+        </note>
+      </p>
+
+<!-- Experiment with replacing my original copied table with a conref'ed 
version of Ambreen's from the Security Guide. -->
+
+<!--
+          <table>
+          <title>Sentry privilege types and objects they apply to</title>
+          <tgroup cols="2">
+              <colspec colnum="1" colname="col1"/>
+              <colspec colnum="2" colname="col2"/>
+              <tbody>
+                  <row>
+                      <entry>Privilege</entry>
+                      <entry>Object</entry>
+                  </row>
+                  <row>
+                      <entry>INSERT</entry>
+                      <entry>TABLE, URI</entry>
+                  </row>
+                  <row>
+                      <entry>SELECT</entry>
+                      <entry>TABLE, VIEW, URI</entry>
+                  </row>
+                  <row>
+                      <entry>ALL</entry>
+                      <entry>SERVER, DB, URI</entry>
+                  </row>
+              </tbody>
+          </tgroup>
+          </table>
+-->
+
+      <table 
conref="../shared/impala_common.xml#common/sentry_privileges_objects">
+        <tgroup cols="2">
+          <colspec colnum="1" colname="col1" colwidth="1*"/>
+          <tbody>
+            <row>
+              <entry/>
+            </row>
+          </tbody>
+        </tgroup>
+      </table>
+
+      <note>
+        <p>
+          Although this document refers to the <codeph>ALL</codeph> privilege, 
currently if you use the policy file
+          mode, you do not use the actual keyword <codeph>ALL</codeph> in the 
policy file. When you code role
+          entries in the policy file:
+        </p>
+        <ul>
+          <li>
+            To specify the <codeph>ALL</codeph> privilege for a server, use a 
role like
+            <codeph>server=<varname>server_name</varname></codeph>.
+          </li>
+
+          <li>
+            To specify the <codeph>ALL</codeph> privilege for a database, use 
a role like
+            
<codeph>server=<varname>server_name</varname>-&gt;db=<varname>database_name</varname></codeph>.
+          </li>
+
+          <li>
+            To specify the <codeph>ALL</codeph> privilege for a table, use a 
role like
+            
<codeph>server=<varname>server_name</varname>-&gt;db=<varname>database_name</varname>-&gt;table=<varname>table_name</varname>-&gt;action=*</codeph>.
+          </li>
+        </ul>
+      </note>
+
+      <draft-comment author="ambreen.kazi" translate="no">The Privilege Model 
lives at:
+https://wiki.cloudera.com/pages/viewpage.action?pageId=24919851</draft-comment>
+
+      <table>
+        <tgroup cols="4">
+          <colspec colnum="1" colname="col1" colwidth="1.31*"/>
+          <colspec colnum="2" colname="col2" colwidth="1.17*"/>
+          <colspec colnum="3" colname="col3" colwidth="1*"/>
+          <colspec colname="newCol4" colnum="4" colwidth="1*"/>
+          <thead>
+            <row>
+              <entry>
+                Operation
+              </entry>
+              <entry>
+                Scope
+              </entry>
+              <entry>
+                Privileges
+              </entry>
+              <entry>
+                URI
+              </entry>
+            </row>
+          </thead>
+          <tbody>
+            <row conref="../shared/impala_common.xml#common/explain_privs">
+              <entry/>
+            </row>
+            <row conref="../shared/impala_common.xml#common/load_data_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/create_database_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/drop_database_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/create_table_privs">
+              <entry/>
+            </row>
+            <row conref="../shared/impala_common.xml#common/drop_table_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/describe_table_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/alter_table_add_columns_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/alter_table_replace_columns_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/alter_table_change_column_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/alter_table_rename_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/alter_table_set_tblproperties_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/alter_table_set_fileformat_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/alter_table_set_location_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/alter_table_add_partition_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/alter_table_add_partition_location_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/alter_table_drop_partition_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/alter_table_partition_set_fileformat_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/alter_table_set_serdeproperties_privs">
+              <entry/>
+            </row>
+            <row conref="../shared/impala_common.xml#common/create_view_privs">
+              <entry/>
+            </row>
+            <row conref="../shared/impala_common.xml#common/drop_view_privs">
+              <entry/>
+            </row>
+            <row id="alter_view_privs">
+              <entry>
+                ALTER VIEW
+              </entry>
+              <entry rev="2.3.0 collevelauth">
+                You need <codeph>ALL</codeph> privilege on the named view <ph 
rev="1.4.0">and the parent
+                database</ph>, plus <codeph>SELECT</codeph> privilege for any 
tables or views referenced by the
+                view query. Once the view is created or altered by a 
high-privileged system administrator, it can
+                be queried by a lower-privileged user who does not have full 
query privileges for the base tables.
+              </entry>
+              <entry>
+                ALL, SELECT
+              </entry>
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/alter_table_set_location_privs">
+              <entry/>
+            </row>
+            <row id="create_external_table_privs">
+              <entry>
+                CREATE EXTERNAL TABLE
+              </entry>
+              <entry>
+                Database (ALL), URI (SELECT)
+              </entry>
+              <entry>
+                ALL, SELECT
+              </entry>
+              <entry/>
+            </row>
+            <row conref="../shared/impala_common.xml#common/select_privs">
+              <entry/>
+            </row>
+            <row conref="../shared/impala_common.xml#common/use_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/create_function_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/drop_function_privs">
+              <entry/>
+            </row>
+            <row conref="../shared/impala_common.xml#common/refresh_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/invalidate_metadata_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/invalidate_metadata_table_privs">
+              <entry/>
+            </row>
+            <row 
conref="../shared/impala_common.xml#common/compute_stats_privs">
+              <entry/>
+            </row>
+            <row id="show_table_stats_privs">
+              <entry>
+                SHOW TABLE STATS, SHOW PARTITIONS
+              </entry>
+              <entry>
+                TABLE
+              </entry>
+              <entry>
+                SELECT/INSERT
+              </entry>
+              <entry/>
+            </row>
+            <row>
+              <entry id="show_column_stats_privs">
+                SHOW COLUMN STATS
+              </entry>
+              <entry>
+                TABLE
+              </entry>
+              <entry>
+                SELECT/INSERT
+              </entry>
+              <entry/>
+            </row>
+            <row>
+              <entry id="show_functions_privs">
+                SHOW FUNCTIONS
+              </entry>
+              <entry>
+                DATABASE
+              </entry>
+              <entry>
+                SELECT
+              </entry>
+              <entry/>
+            </row>
+            <row id="show_tables_privs">
+              <entry>
+                SHOW TABLES
+              </entry>
+              <entry/>
+              <entry>
+                No special privileges needed to issue the statement, but only 
shows objects you are authorized for
+              </entry>
+              <entry/>
+            </row>
+            <row id="show_databases_privs">
+              <entry>
+                SHOW DATABASES, SHOW SCHEMAS
+              </entry>
+              <entry/>
+              <entry>
+                No special privileges needed to issue the statement, but only 
shows objects you are authorized for
+              </entry>
+              <entry/>
+            </row>
+          </tbody>
+        </tgroup>
+      </table>
+
+<!-- Original table, now replaced by content conref'ed from Security Guide
+<table>
+<title>Sentry privileges for Impala SQL operations</title>
+          <tgroup cols="3">
+              <colspec colnum="1" colname="col1" colwidth="*"/>
+              <colspec colnum="2" colname="col2" colwidth="*"/>
+              <colspec colnum="3" colname="col3" colwidth="4*"/>
+<thead>
+<row>
+<entry>Impala SQL Operation</entry>
+<entry>Privileges Required</entry>
+<entry>Object on which Privileges Required</entry>
+</row>
+</thead>
+<tbody>
+<row>
+<entry>EXPLAIN</entry>
+<entry>SELECT</entry>
+<entry>Table</entry>
+</row>
+<row>
+<entry>LOAD DATA</entry>
+<entry>INSERT, SELECT</entry>
+<entry>Table (INSERT) and URI (ALL); write privilege is required on the URI 
because the data files are physically moved from there</entry>
+</row>
+<row>
+<entry>CREATE DATABASE</entry>
+<entry>ALL</entry>
+<entry><ph rev="1.4.0">Server</ph></entry>
+</row>
+<row>
+<entry>DROP DATABASE</entry>
+<entry>ALL</entry>
+<entry><ph rev="1.4.0">Server</ph></entry>
+</row>
+<row>
+<entry>CREATE TABLE</entry>
+<entry>ALL</entry>
+<entry><ph rev="1.4.0">Database</ph></entry>
+</row>
+<row>
+<entry>DROP TABLE</entry>
+<entry>ALL</entry>
+<entry><ph rev="1.4.0">Database</ph></entry>
+</row>
+<row>
+<entry>DESCRIBE TABLE</entry>
+<entry>SELECT or INSERT</entry>
+<entry>Table</entry>
+</row>
+<row>
+<entry>ALTER TABLE</entry>
+<entry>ALL</entry>
+<entry><ph rev="1.4.0">Database</ph></entry>
+</row>
+<row>
+<entry>SHOW DATABASES</entry>
+<entry>Any privilege</entry>
+<entry>Any object in the database; only databases for which the user has 
privileges on some object are shown</entry>
+</row>
+<row>
+<entry>SHOW TABLES</entry>
+<entry>Any privilege</entry>
+<entry>Any table in the database; only tables for which the user has 
privileges are shown</entry>
+</row>
+<row>
+<entry>CREATE VIEW</entry>
+<entry>ALL, SELECT</entry>
+<entry rev="2.3.0 collevelauth">
+You need <codeph>ALL</codeph> privilege on the named view <ph rev="1.4.0">and 
the parent database</ph>, plus <codeph>SELECT</codeph> privilege for any tables 
or views referenced by the view query.
+Once the view is created or altered by a high-privileged system administrator,
+it can be queried by a lower-privileged user who does not have full query 
privileges for the base tables.
+</entry>
+</row>
+<row>
+<entry>DROP VIEW</entry>
+<entry>ALL</entry>
+<entry><ph rev="1.4.0">Database</ph></entry>
+</row>
+<row>
+<entry>ALTER VIEW</entry>
+<entry>ALL, SELECT</entry>
+<entry rev="2.3.0 collevelauth">
+You need <codeph>ALL</codeph> privilege on the named view <ph rev="1.4.0">and 
the parent database</ph>, plus <codeph>SELECT</codeph> privilege for any tables 
or views referenced by the view query.
+Once the view is created or altered by a high-privileged system administrator,
+it can be queried by a lower-privileged user who does not have full query 
privileges for the base tables.
+</entry>
+</row>
+<row>
+<entry>ALTER TABLE LOCATION</entry>
+<entry>ALL</entry>
+<entry>Table, URI</entry>
+</row>
+<row>
+<entry>CREATE EXTERNAL TABLE</entry>
+<entry>ALL, SELECT</entry>
+<entry>Database (ALL), URI (SELECT)</entry>
+</row>
+<row>
+<entry>SELECT</entry>
+<entry>SELECT</entry>
+<entry rev="2.3.0 collevelauth">Table, View; you can have SELECT privilege for 
a view without having SELECT privilege
+for the underlying tables. See
+<xref audience="integrated" 
href="sg_hive_sql.xml#concept_c2q_4qx_p4/col_level_auth_sentry"/><xref 
audience="standalone" 
href="https://www.cloudera.com/documentation/enterprise/latest/topics/sg_hive_sql.html";
 format="html" scope="external"/>
+for details about column-level authorization.
+Before column-level authorization was added in <keyword 
keyref="impala23_full"/> the way to implement
+column-level security was by creating views that referenced particular sets of 
columns.</entry>
+</row>
+<row>
+<entry>USE</entry>
+<entry>Any privilege</entry>
+<entry>Any object in the database</entry>
+</row>
+<row rev="1.2">
+<entry>CREATE FUNCTION</entry>
+<entry>ALL</entry>
+<entry>Server</entry>
+</row>
+<row rev="1.2">
+<entry>DROP FUNCTION</entry>
+<entry>ALL</entry>
+<entry>Server</entry>
+</row>
+<row>
+<entry>REFRESH</entry>
+<entry>ALL</entry>
+<entry>Table</entry>
+</row>
+<row>
+<entry>INVALIDATE METADATA</entry>
+<entry>ALL</entry>
+<entry>Server</entry>
+</row>
+<row rev="1.2.2">
+<entry>COMPUTE STATS</entry>
+<entry>ALL</entry>
+<entry>Table</entry>
+</row>
+<row>
+<entry>SHOW TABLE STATS, SHOW PARTITIONS</entry>
+<entry>?</entry>
+<entry>?</entry>
+</row>
+<row>
+<entry>SHOW COLUMN STATS</entry>
+<entry>?</entry>
+<entry>?</entry>
+</row>
+<row>
+<entry>SHOW FUNCTIONS</entry>
+<entry>?</entry>
+<entry>?</entry>
+</row>
+<row>
+<entry>SHOW DATABASES, SHOW SCHEMAS</entry>
+<entry>?</entry>
+<entry>?</entry>
+</row>
+</tbody>
+</tgroup>
+</table>
+-->
+
+<!-- Here are pieces from the original Hive table that do not apply to Impala 
because of differences in the SQL syntax.
+   Might resurrect some of these later as features are added to Impala, in 
particular CREATE TABLE AS SELECT.
+<row>
+<entry>EXPORT</entry>
+<entry>SELECT, INSERT@URI</entry>
+<entry>Table</entry>
+</row>
+<row>
+<entry>IMPORT</entry>
+<entry>ALL, SELECT@URI</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>ANALYZE</entry>
+<entry>INSERT, SELECT</entry>
+<entry>Table</entry>
+</row>
+<row>
+<entry>SWITCH DATABASE</entry>
+<entry>Any</entry>
+<entry>Any Table, View in the DB</entry>
+</row>
+<row>
+<entry>DESCRIBE FUNCTION&#42;</entry>
+<entry>Not Restricted</entry>
+</row>
+<row>
+<entry>MSCK</entry>
+<entry>ALL</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>SHOW COLUMNS</entry>
+<entry>SELECT or INSERT</entry>
+<entry>Table</entry>
+</row>
+<row>
+<entry>SHOW TABLE STATUS</entry>
+<entry>SELECT or INSERT</entry>
+<entry>Table</entry>
+</row>
+<row>
+<entry>SHOW TABLE PROPERTIES</entry>
+<entry>SELECT or INSERT</entry>
+<entry>Table</entry>
+</row>
+<row>
+<entry>SHOW CREATE TABLE</entry>
+<entry>SELECT or INSERT</entry>
+<entry>Table</entry>
+</row>
+<row>
+<entry>SHOW FUNCTIONS&#42;</entry>
+<entry>Not restricted</entry>
+</row>
+<row>
+<entry>SHOW PARTITIONS</entry>
+<entry>SELECT or INSERT</entry>
+<entry>Table</entry>
+</row>
+<row>
+<entry>SHOW INDEXES</entry>
+<entry>SELECT or INSERT</entry>
+<entry>Table</entry>
+</row>
+<row>
+<entry>SHOW LOCKS</entry>
+<entry>Not Restricted</entry>
+</row>
+<row>
+<entry>CREATE FUNCTION</entry>
+<entry>Any Privilege</entry>
+<entry>Any Object</entry>
+</row>
+<row>
+<entry>CREATE INDEX</entry>
+<entry>ALL</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>DROP FUNCTION</entry>
+<entry>Any Privilege</entry>
+<entry>Any Object</entry>
+</row>
+<row>
+<entry>DROP INDEX</entry>
+<entry>ALL</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>ALTER INDEX REBUILD</entry>
+<entry>ALL</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>ALTER VIEW PROPERTIES</entry>
+<entry>ALL</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>GRANT PRIVILEGE</entry>
+<entry>Allowed, but has no effect on HS2 auth</entry>
+</row>
+<row>
+<entry>REVOKE PRIVILEGE</entry>
+<entry>Allowed, but has no effect on HS2 auth</entry>
+</row>
+<row>
+<entry>SHOW GRANTS</entry>
+<entry>Allowed, but has no effect on HS2 auth</entry>
+</row>
+<row>
+<entry>ALTER TABLE PROTECT MODE</entry>
+<entry>ALL</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>ALTER TABLE FILE FORMAT</entry>
+<entry>ALL</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>ALTER PARTITION PROTECT MODE</entry>
+<entry>ALL</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>ALTER PARTITION FILE FORMAT</entry>
+<entry>ALL</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>ALTER PARTITION LOCATION&#42;</entry>
+<entry>ALL</entry>
+<entry>Server</entry>
+</row>
+<row>
+<entry>CREATE TABLE AS SELECT</entry>
+<entry>ALL, SELECT</entry>
+<entry>Database, Table/View</entry>
+</row>
+<row>
+<entry>ALTER INDEX PROP</entry>
+<entry>ALL</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>ALTER DATABASE</entry>
+<entry>ALL</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>DESCRIBE DATABASE</entry>
+<entry>ALL</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>ALTER TABLE MERGE FILE</entry>
+<entry>ALL</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>ALTER PARTITION MERGE FILE</entry>
+<entry>ALL</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>ALTER TABLE SKEWED</entry>
+<entry>ALL</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>ALTER TABLE PARTITION SKEWED LOCATION</entry>
+<entry>ALL</entry>
+<entry>Database</entry>
+</row>
+<row>
+<entry>ADD JAR&#42;</entry>
+<entry>Restricted unless hive.server2.authorization.external.exec = 
true</entry>
+</row>
+<row>
+<entry>TRANSFORM&#42;</entry>
+<entry>ALL</entry>
+<entry>Server</entry>
+</row>
+-->
+    </conbody>
+  </concept>
+
+  <concept id="sentry_debug">
+
+    <title><ph 
conref="../shared/impala_common.xml#common/title_sentry_debug"/></title>
+
+    <conbody>
+
+      <p conref="../shared/impala_common.xml#common/sentry_debug"/>
+    </conbody>
+  </concept>
+
+  <concept id="sentry_cm">
+
+    <title>Managing Sentry for Impala through Cloudera Manager</title>
+
+    <conbody>
+
+      <p>
+        To enable the Sentry service for Impala and Hive, set 
<uicontrol>Hive/Impala &gt; Service-Wide &gt; Sentry
+        Service</uicontrol> parameter to the Sentry service. Then restart 
Impala and Hive. Simply adding Sentry
+        service as a dependency and restarting enables Impala and Hive to use 
the Sentry service.
+      </p>
+
+      <p>
+        To set the server name to use when granting server level privileges, 
set the <uicontrol>Hive &gt;
+        Service-Wide &gt; Advanced &gt; Server Name for Sentry 
Authorization</uicontrol> parameter. When using
+        Sentry with the Hive Metastore, you can specify the list of users that 
are allowed to bypass Sentry
+        Authorization in Hive Metastore using <uicontrol>Hive &gt; 
Service-Wide &gt; Security &gt; Bypass Sentry
+        Authorization Users</uicontrol>. These are usually service users that 
already ensure all activity has been
+        authorized.
+      </p>
+
+      <note>
+        The <uicontrol>Hive/Impala &gt; Service-Wide &gt; Policy File Based 
Sentry</uicontrol> tab contains
+        parameters only relevant to configuring Sentry using policy files. In 
particular, make sure that
+        <uicontrol>Enable Sentry Authorization using Policy Files</uicontrol> 
parameter is unchecked when using the
+        Sentry service. Cloudera Manager throws a validation error if you 
attempt to configure the Sentry service
+        and policy file at the same time.
+      </note>
+    </conbody>
+  </concept>
+
+  <concept id="sec_ex_default">
+
+    <title>The DEFAULT Database in a Secure Deployment</title>
+
+    <conbody>
+
+      <p>
+        Because of the extra emphasis on granular access controls in a secure 
deployment, you should move any
+        important or sensitive information out of the <codeph>DEFAULT</codeph> 
database into a named database whose
+        privileges are specified in the policy file. Sometimes you might need 
to give privileges on the
+        <codeph>DEFAULT</codeph> database for administrative reasons; for 
example, as a place you can reliably
+        specify with a <codeph>USE</codeph> statement when preparing to drop a 
database.
+      </p>
+
+<!-- Maybe have an example later, but not for initial 1.1 release.
+<codeblock></codeblock>
+-->
+    </conbody>
+  </concept>
+</concept>


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_avg.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_avg.xml b/docs/topics/impala_avg.xml
new file mode 100644
index 0000000..d95b2d5
--- /dev/null
+++ b/docs/topics/impala_avg.xml
@@ -0,0 +1,225 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="avg">
+
+  <title>AVG Function</title>
+  <titlealts audience="PDF"><navtitle>AVG</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Impala Functions"/>
+      <data name="Category" value="Analytic Functions"/>
+      <data name="Category" value="Aggregate Functions"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">avg() function</indexterm>
+      An aggregate function that returns the average value from a set of 
numbers or <codeph>TIMESTAMP</codeph> values.
+      Its single argument can be numeric column, or the numeric result of a 
function or expression applied to the
+      column value. Rows with a <codeph>NULL</codeph> value for the specified 
column are ignored. If the table is empty,
+      or all the values supplied to <codeph>AVG</codeph> are 
<codeph>NULL</codeph>, <codeph>AVG</codeph> returns
+      <codeph>NULL</codeph>.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>AVG([DISTINCT | ALL] <varname>expression</varname>) [OVER 
(<varname>analytic_clause</varname>)]
+</codeblock>
+
+    <p>
+      When the query contains a <codeph>GROUP BY</codeph> clause, returns one 
value for each combination of
+      grouping values.
+    </p>
+
+    <p>
+      <b>Return type:</b> <codeph>DOUBLE</codeph> for numeric values; 
<codeph>TIMESTAMP</codeph> for
+      <codeph>TIMESTAMP</codeph> values
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+    <p 
conref="../shared/impala_common.xml#common/complex_types_aggregation_explanation"/>
+
+    <p 
conref="../shared/impala_common.xml#common/complex_types_aggregation_example"/>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+<codeblock>-- Average all the non-NULL values in a column.
+insert overwrite avg_t values (2),(4),(6),(null),(null);
+-- The average of the above values is 4: (2+4+6) / 3. The 2 NULL values are 
ignored.
+select avg(x) from avg_t;
+-- Average only certain values from the column.
+select avg(x) from t1 where month = 'January' and year = '2013';
+-- Apply a calculation to the value of the column before averaging.
+select avg(x/3) from t1;
+-- Apply a function to the value of the column before averaging.
+-- Here we are substituting a value of 0 for all NULLs in the column,
+-- so that those rows do factor into the return value.
+select avg(isnull(x,0)) from t1;
+-- Apply some number-returning function to a string column and average the 
results.
+-- If column s contains any NULLs, length(s) also returns NULL and those rows 
are ignored.
+select avg(length(s)) from t1;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Return more than one result.
+select month, year, avg(page_visits) from web_stats group by month, year;
+-- Filter the input to eliminate duplicates before performing the calculation.
+select avg(distinct x) from t1;
+-- Filter the output after performing the calculation.
+select avg(x) from t1 group by y having avg(x) between 1 and 20;
+</codeblock>
+
+    <p rev="2.0.0">
+      The following examples show how to use <codeph>AVG()</codeph> in an 
analytic context. They use a table
+      containing integers from 1 to 10. Notice how the <codeph>AVG()</codeph> 
is reported for each input value, as
+      opposed to the <codeph>GROUP BY</codeph> clause which condenses the 
result set.
+<codeblock>select x, property, avg(x) over (partition by property) as avg from 
int_t where property in ('odd','even');
++----+----------+-----+
+| x  | property | avg |
++----+----------+-----+
+| 2  | even     | 6   |
+| 4  | even     | 6   |
+| 6  | even     | 6   |
+| 8  | even     | 6   |
+| 10 | even     | 6   |
+| 1  | odd      | 5   |
+| 3  | odd      | 5   |
+| 5  | odd      | 5   |
+| 7  | odd      | 5   |
+| 9  | odd      | 5   |
++----+----------+-----+
+</codeblock>
+
+Adding an <codeph>ORDER BY</codeph> clause lets you experiment with results 
that are cumulative or apply to a moving
+set of rows (the <q>window</q>). The following examples use 
<codeph>AVG()</codeph> in an analytic context
+(that is, with an <codeph>OVER()</codeph> clause) to produce a running average 
of all the even values,
+then a running average of all the odd values. The basic <codeph>ORDER BY 
x</codeph> clause implicitly
+activates a window clause of <codeph>RANGE BETWEEN UNBOUNDED PRECEDING AND 
CURRENT ROW</codeph>,
+which is effectively the same as <codeph>ROWS BETWEEN UNBOUNDED PRECEDING AND 
CURRENT ROW</codeph>,
+therefore all of these examples produce the same results:
+<codeblock>select x, property,
+  avg(x) over (partition by property <b>order by x</b>) as 'cumulative average'
+  from int_t where property in ('odd','even');
++----+----------+--------------------+
+| x  | property | cumulative average |
++----+----------+--------------------+
+| 2  | even     | 2                  |
+| 4  | even     | 3                  |
+| 6  | even     | 4                  |
+| 8  | even     | 5                  |
+| 10 | even     | 6                  |
+| 1  | odd      | 1                  |
+| 3  | odd      | 2                  |
+| 5  | odd      | 3                  |
+| 7  | odd      | 4                  |
+| 9  | odd      | 5                  |
++----+----------+--------------------+
+
+select x, property,
+  avg(x) over
+  (
+    partition by property
+    <b>order by x</b>
+    <b>range between unbounded preceding and current row</b>
+  ) as 'cumulative average'
+from int_t where property in ('odd','even');
++----+----------+--------------------+
+| x  | property | cumulative average |
++----+----------+--------------------+
+| 2  | even     | 2                  |
+| 4  | even     | 3                  |
+| 6  | even     | 4                  |
+| 8  | even     | 5                  |
+| 10 | even     | 6                  |
+| 1  | odd      | 1                  |
+| 3  | odd      | 2                  |
+| 5  | odd      | 3                  |
+| 7  | odd      | 4                  |
+| 9  | odd      | 5                  |
++----+----------+--------------------+
+
+select x, property,
+  avg(x) over
+  (
+    partition by property
+    <b>order by x</b>
+    <b>rows between unbounded preceding and current row</b>
+  ) as 'cumulative average'
+  from int_t where property in ('odd','even');
++----+----------+--------------------+
+| x  | property | cumulative average |
++----+----------+--------------------+
+| 2  | even     | 2                  |
+| 4  | even     | 3                  |
+| 6  | even     | 4                  |
+| 8  | even     | 5                  |
+| 10 | even     | 6                  |
+| 1  | odd      | 1                  |
+| 3  | odd      | 2                  |
+| 5  | odd      | 3                  |
+| 7  | odd      | 4                  |
+| 9  | odd      | 5                  |
++----+----------+--------------------+
+</codeblock>
+
+The following examples show how to construct a moving window, with a running 
average taking into account 1 row before
+and 1 row after the current row, within the same partition (all the even 
values or all the odd values).
+Because of a restriction in the Impala <codeph>RANGE</codeph> syntax, this 
type of
+moving window is possible with the <codeph>ROWS BETWEEN</codeph> clause but 
not the <codeph>RANGE BETWEEN</codeph>
+clause:
+<codeblock>select x, property,
+  avg(x) over
+  (
+    partition by property
+    <b>order by x</b>
+    <b>rows between 1 preceding and 1 following</b>
+  ) as 'moving average'
+  from int_t where property in ('odd','even');
++----+----------+----------------+
+| x  | property | moving average |
++----+----------+----------------+
+| 2  | even     | 3              |
+| 4  | even     | 4              |
+| 6  | even     | 6              |
+| 8  | even     | 8              |
+| 10 | even     | 9              |
+| 1  | odd      | 2              |
+| 3  | odd      | 3              |
+| 5  | odd      | 5              |
+| 7  | odd      | 7              |
+| 9  | odd      | 8              |
++----+----------+----------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+  avg(x) over
+  (
+    partition by property
+    <b>order by x</b>
+    <b>range between 1 preceding and 1 following</b>
+  ) as 'moving average'
+from int_t where property in ('odd','even');
+ERROR: AnalysisException: RANGE is only supported with both the lower and 
upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</codeblock>
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
+
+<!-- This conref appears under SUM(), AVG(), FLOAT, and DOUBLE topics. -->
+
+    <p conref="../shared/impala_common.xml#common/sum_double"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_analytic_functions.xml#analytic_functions"/>, <xref 
href="impala_max.xml#max"/>,
+      <xref href="impala_min.xml#min"/>
+    </p>
+  </conbody>
+</concept>

[43/51] [partial] incubator-impala git commit: IMPALA-3398: Add docs to main Impala branch.

Reply via email to