http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_authorization.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_authorization.xml b/docs/topics/impala_authorization.xml
index 12cdb98..4b009ce 100644
--- a/docs/topics/impala_authorization.xml
+++ b/docs/topics/impala_authorization.xml
@@ -3,7 +3,18 @@
 <concept rev="1.1" id="authorization">
 <title>Enabling Sentry Authorization for Impala</title>
-
+ <prolog>
+ <metadata>
+ <data name="Category" value="Security"/>
+ <data name="Category" value="Sentry"/>
+ <data name="Category" value="Impala"/>
+ <data name="Category" value="Configuring"/>
+ <data name="Category" value="Starting and Stopping"/>
+ <data name="Category" value="Users"/>
+ <data name="Category" value="Groups"/>
+ <data name="Category" value="Administrators"/>
+ </metadata>
+ </prolog>
 <conbody id="sentry">
@@ -17,7 +28,1591 @@
 other client program, and associates various privileges with each user.
 </p>
-
+ <note>
+ Sentry is typically used in conjunction with Kerberos authentication, which defines which hosts are allowed
+ to connect to each server. Using the combination of Sentry and Kerberos prevents malicious users from being
+ able to connect by creating a named account on an untrusted machine. See
+ <xref href="impala_kerberos.xml#kerberos"/> for details about Kerberos authentication.
+ </note>
+
+ <p audience="PDF" outputclass="toc inpage">
+ See the following sections for details about using the Impala authorization features:
+ </p>
+ </conbody>
+
+ <concept id="sentry_priv_model">
+
+ <title>The Sentry Privilege Model</title>
+
+ <conbody>
+
+ <p>
+ Privileges can be granted on different objects in the schema. Any privilege that can be granted is
+ associated with a level in the object hierarchy. If a privilege is granted on a container object in the
+ hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other
+ database systems such as MySQL.
+ </p>
+
+ <p rev="2.3.0 collevelauth">
+ The object hierarchy for Impala covers Server, URI, Database, Table, and Column. (The Table privileges apply to views as well;
+ anywhere you specify a table name, you can specify a view name instead.)
+ Column-level authorization is available in CDH 5.5 / Impala 2.3 and higher, as described in <xref href="sg_hive_sql.xml#concept_c2q_4qx_p4/col_level_auth_sentry"/>;
+ previously, you constructed views to query specific columns and assigned privileges
+ based on the views rather than the base tables.
+ </p>
+
+ <p>
+ A restricted set of privileges determines what you can do with each object:
+ </p>
+
+ <dl>
+ <dlentry id="select_priv">
+
+ <dt>
+ SELECT privilege
+ </dt>
+
+ <dd>
+ Lets you read data from a table or view, for example with the <codeph>SELECT</codeph> statement, the
+ <codeph>INSERT...SELECT</codeph> syntax, or <codeph>CREATE TABLE...LIKE</codeph>. Also required to
+ issue the <codeph>DESCRIBE</codeph> statement or the <codeph>EXPLAIN</codeph> statement for a query
+ against a particular table. Only objects for which a user has this privilege are shown in the output
+ for <codeph>SHOW DATABASES</codeph> and <codeph>SHOW TABLES</codeph> statements. The
+ <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph> statements only access
+ metadata for tables for which the user has this privilege.
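+ For example, a policy file rule along the following lines (the <codeph>read_only</codeph> role name and
+ the database and table names are illustrative) grants this privilege on a single table:
+<codeblock>read_only = server=server1->db=sales->table=orders->action=SELECT
+</codeblock>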
+ </dd>
+
+ </dlentry>
+
+ <dlentry id="insert_priv">
+
+ <dt>
+ INSERT privilege
+ </dt>
+
+ <dd>
+ Lets you write data to a table. Applies to the <codeph>INSERT</codeph> and <codeph>LOAD DATA</codeph>
+ statements.
+ </dd>
+
+ </dlentry>
+
+ <dlentry id="all_priv">
+
+ <dt>
+ ALL privilege
+ </dt>
+
+ <dd>
+ Lets you create or modify the object. Required to run DDL statements such as <codeph>CREATE
+ TABLE</codeph>, <codeph>ALTER TABLE</codeph>, or <codeph>DROP TABLE</codeph> for a table,
+ <codeph>CREATE DATABASE</codeph> or <codeph>DROP DATABASE</codeph> for a database, or <codeph>CREATE
+ VIEW</codeph>, <codeph>ALTER VIEW</codeph>, or <codeph>DROP VIEW</codeph> for a view. Also required for
+ the URI of the <q>location</q> parameter for the <codeph>CREATE EXTERNAL TABLE</codeph> and
+ <codeph>LOAD DATA</codeph> statements.
+<!-- Have to think about the best wording, how often to repeat, how best to conref this caveat.
+ You do not actually code the keyword <codeph>ALL</codeph> in the policy file; instead you use
+ <codeph>action=*</codeph> or shorten the right-hand portion of the rule.
+ -->
+ </dd>
+
+ </dlentry>
+ </dl>
+
+ <p>
+ Privileges can be specified for a table or view before that object actually exists. If you do not have
+ sufficient privilege to perform an operation, the error message does not disclose whether the object
+ exists.
+ </p>
+
+ <p>
+ Originally, privileges were encoded in a policy file, stored in HDFS. This mode of operation is still an
+ option, but the emphasis of privilege management is moving towards being SQL-based. Impala can make use of
+ privileges assigned through <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements issued through
+ Hive, and in Impala 2.0 and higher through Impala itself. The mode of operation with
+ <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements instead of
+ the policy file requires that a special Sentry service be enabled; this service stores, retrieves, and
+ manipulates privilege information stored inside the metastore database.
+ </p>
+ </conbody>
+ </concept>
+
+ <concept id="secure_startup">
+
+ <title>Starting the impalad Daemon with Sentry Authorization Enabled</title>
+ <prolog>
+ <metadata>
+ <data name="Category" value="Starting and Stopping"/>
+ </metadata>
+ </prolog>
+
+ <conbody>
+
+ <p>
+ To run the <cmdname>impalad</cmdname> daemon with authorization enabled, you add one or more options to the
+ <codeph>IMPALA_SERVER_ARGS</codeph> declaration in the <filepath>/etc/default/impala</filepath>
+ configuration file:
+ </p>
+
+ <ul>
+ <li>
+ The <codeph>-server_name</codeph> option turns on Sentry authorization for Impala. The authorization
+ rules refer to a symbolic server name, and you specify the name to use as the argument to the
+ <codeph>-server_name</codeph> option.
+ </li>
+
+ <li rev="1.4.0">
+ If you specify just <codeph>-server_name</codeph>, Impala uses the Sentry service for authorization,
+ relying on the results of <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements issued through
+ Hive. (This mode of operation is available in Impala 1.4.0 and higher.) Prior to Impala 1.4.0, or if you
+ want to continue storing privilege rules in the policy file, also specify the
+ <codeph>-authorization_policy_file</codeph> option as in the following item.
+ </li>
+
+ <li>
+ Specifying the <codeph>-authorization_policy_file</codeph> option in addition to
+ <codeph>-server_name</codeph> makes Impala read privilege information from a policy file, rather than
+ from the metastore database. The argument to the <codeph>-authorization_policy_file</codeph> option
+ specifies the HDFS path to the policy file that defines the privileges on different schema objects.
+ </li>
+ </ul>
+
+ <p rev="1.4.0">
+ For example, you might adapt your <filepath>/etc/default/impala</filepath> configuration to contain lines
+ like the following. To use the Sentry service rather than the policy file:
+ </p>
+
+<codeblock rev="1.4.0">IMPALA_SERVER_ARGS=" \
+-server_name=server1 \
+...
+</codeblock>
+
+ <p>
+ Or to use the policy file, as in releases prior to Impala 1.4:
+ </p>
+
+<codeblock>IMPALA_SERVER_ARGS=" \
+-authorization_policy_file=/user/hive/warehouse/auth-policy.ini \
+-server_name=server1 \
+...
+</codeblock>
+
+ <p>
+ The preceding examples set up a symbolic name of <codeph>server1</codeph> to refer to the current instance
+ of Impala. This symbolic name is used in the following ways:
+ </p>
+
+ <ul>
+ <li>
+ <p>
+ In an environment managed by Cloudera Manager, the server name is specified through
+ <menucascade><uicontrol>Impala (Service-Wide)</uicontrol><uicontrol>Category</uicontrol><uicontrol>Advanced</uicontrol><uicontrol>Sentry Service</uicontrol></menucascade> and
+ <menucascade><uicontrol>Hive</uicontrol><uicontrol>Service-Wide</uicontrol><uicontrol>Advanced</uicontrol><uicontrol>Sentry Service</uicontrol></menucascade>.
+ The values must be the same for both, so that Impala and Hive can share the privilege rules. Restart
+ the Impala and Hive services after setting or changing this value.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ In an environment not managed by Cloudera Manager, you specify this value for the
+ <codeph>sentry.hive.server</codeph> property in the <filepath>sentry-site.xml</filepath> configuration
+ file for Hive, as well as in the <codeph>-server_name</codeph> option for <cmdname>impalad</cmdname>.
+ </p>
+ <p>
+ If the <cmdname>impalad</cmdname> daemon is not already running, start it as described in
+ <xref href="impala_processes.xml#processes"/>. If it is already running, restart it with the command
+ <codeph>sudo /etc/init.d/impala-server restart</codeph>. Run the appropriate commands on all the nodes
+ where <cmdname>impalad</cmdname> normally runs.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ If you use the policy file mode of operation, the rules in the <codeph>[roles]</codeph>
+ section of the policy file refer to this same <codeph>server1</codeph> name. For example, the following
+ rule sets up a role <codeph>report_generator</codeph> that lets users with that role query any table in
+ a database named <codeph>reporting_db</codeph> on a node where the <cmdname>impalad</cmdname> daemon
+ was started with the <codeph>-server_name=server1</codeph> option:
+ </p>
+<codeblock>[roles]
+report_generator = server=server1->db=reporting_db->table=*->action=SELECT
+</codeblock>
+ </li>
+ </ul>
+
+ <p>
+ When <cmdname>impalad</cmdname> is started with one or both of the <codeph>-server_name=server1</codeph>
+ and <codeph>-authorization_policy_file</codeph> options, Impala authorization is enabled. If Impala detects
+ any errors or inconsistencies in the authorization settings or the policy file, the daemon refuses to
+ start.
+ </p>
+ </conbody>
+ </concept>
+
+ <concept id="sentry_service">
+
+ <title>Using Impala with the Sentry Service (CDH 5.1 or higher only)</title>
+
+ <conbody>
+
+ <p>
+ When you use the Sentry service rather than the policy file, you set up privileges through
+ <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements in either Impala or Hive, and then both components
+ use those same privileges automatically. (Impala added the <codeph>GRANT</codeph> and
+ <codeph>REVOKE</codeph> statements in Impala 2.0.0 / CDH 5.2.0.)
+ </p>
+
+ <p>
+ Hive already had <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements prior to CDH 5.1, but those
+ statements were not production-ready. CDH 5.1 is the first release where those statements use the Sentry
+ framework and are considered GA level. If you used the Hive <codeph>GRANT</codeph> and
+ <codeph>REVOKE</codeph> statements prior to CDH 5.1, you must set up these privileges with the CDH 5.1
+ versions of <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> to take advantage of Sentry authorization.
+ </p>
+
+ <p>
+ For information about using the updated Hive <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements,
+ see the
+<!-- Original URL: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Security-Guide/cdh_sg_sentry_service.html -->
+ <xref href="http://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_sentry_service.html" scope="external" format="html">Sentry
+ service</xref> topic in the <cite>CDH 5 Security Guide</cite>.
+ </p>
+ </conbody>
+ </concept>
+
+ <concept id="security_policy_file">
+
+ <title>Using Impala with the Sentry Policy File</title>
+
+ <conbody>
+
+ <p>
+ The policy file is a text file that you put in a designated location in HDFS; it is read during the startup of
+ the <cmdname>impalad</cmdname> daemon when you specify both the <codeph>-server_name</codeph> and
+ <codeph>-authorization_policy_file</codeph> startup options. It controls which objects (databases, tables,
+ and HDFS directory paths) can be accessed by the user who connects to <cmdname>impalad</cmdname>, and what
+ operations that user can perform on the objects.
+ </p>
+
+ <note rev="1.4.0">
+ <p rev="1.4.0">
+ In CDH 5 and higher, Cloudera recommends
+ managing privileges through SQL statements, as described in
+ <xref href="impala_authorization.xml#sentry_service"/>. If you are still using policy files, plan to
+ migrate to the new approach some time in the future.
+ </p>
+ </note>
+
+ <p>
+ The location of the policy file is listed in the <filepath>auth-site.xml</filepath> configuration file. To
+ minimize overhead, the security information from this file is cached by each <cmdname>impalad</cmdname>
+ daemon and refreshed automatically, with a default interval of 5 minutes. After making a substantial change
+ to security policies, restart all Impala daemons to pick up the changes immediately.
+ </p>
+
+ <p outputclass="toc inpage"/>
+ </conbody>
+
+ <concept id="security_policy_file_details">
+
+ <title>Policy File Location and Format</title>
+
+ <conbody>
+
+ <p>
+ The policy file uses the familiar <codeph>.ini</codeph> format, divided into the major sections
+ <codeph>[groups]</codeph> and <codeph>[roles]</codeph>. There is also an optional
+ <codeph>[databases]</codeph> section, which lets you designate a separate policy file for a particular
+ database, as explained in <xref href="#security_multiple_policy_files"/>.
+ Another optional section,
+ <codeph>[users]</codeph>, allows you to override the OS-level mapping of users to groups; that is an
+ advanced technique primarily for testing and debugging, and is beyond the scope of this document.
+ </p>
+
+ <p>
+ In the <codeph>[groups]</codeph> section, you define various categories of users and select which roles
+ are associated with each category. The group and usernames correspond to Linux groups and users on the
+ server where the <cmdname>impalad</cmdname> daemon runs.
+ </p>
+
+ <p>
+ When you access Impala through the
+ <cmdname>impala-shell</cmdname> interpreter, for purposes of authorization, the user is the logged-in Linux
+ user and the groups are the Linux groups that user is a member of. When you access Impala through the
+ ODBC or JDBC interfaces, the user and password specified through the connection string are used as login
+ credentials for the Linux server, and authorization is based on that username and the associated Linux
+ group membership.
+ </p>
+
+ <p>
+ In the <codeph>[roles]</codeph> section, you define a set of roles. For each role, you specify precisely
+ which privileges are available: that is, which objects users with that role can access, and what operations
+ they can perform on those objects. This is the lowest-level category of security information; the other
+ sections in the policy file map the privileges to higher-level divisions of groups and users.
+ The privileges are specified using patterns like:
+<codeblock>server=<varname>server_name</varname>->db=<varname>database_name</varname>->table=<varname>table_name</varname>->action=SELECT
+server=<varname>server_name</varname>->db=<varname>database_name</varname>->table=<varname>table_name</varname>->action=CREATE
+server=<varname>server_name</varname>->db=<varname>database_name</varname>->table=<varname>table_name</varname>->action=ALL
+</codeblock>
+ For the <varname>server_name</varname> value, substitute the same symbolic name you specify with the
+ <cmdname>impalad</cmdname> <codeph>-server_name</codeph> option. You can use <codeph>*</codeph> wildcard
+ characters at each level of the privilege specification to allow access to all such objects. For example:
+<codeblock>server=impala-host.example.com->db=default->table=t1->action=SELECT
+server=impala-host.example.com->db=*->table=*->action=CREATE
+server=impala-host.example.com->db=*->table=audit_log->action=SELECT
+server=impala-host.example.com->db=default->table=t1->action=*
+</codeblock>
+ </p>
+
+ <p>
+ When authorization is enabled, Impala uses the policy file as a <i>whitelist</i>, representing every
+ privilege available to any user on any object. That is, only operations specified for the appropriate
+ combination of object, role, group, and user are allowed; all other operations are not allowed. If a
+ group or role is defined multiple times in the policy file, the last definition takes precedence.
+ </p>
+
+ <p>
+ To understand the notion of whitelisting, set up a minimal policy file that does not provide any
+ privileges for any object.
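+ For example, a minimal policy file might contain only the section headers, or define a role that is not
+ assigned to any group (the <codeph>no_access</codeph> role name here is illustrative; because no group
+ maps to it, it grants nothing):
+<codeblock>[groups]
+
+[roles]
+no_access = server=server1->db=no_such_db
+</codeblock>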
+ When you connect to an Impala node where this policy file is in effect, you
+ get no results for <codeph>SHOW DATABASES</codeph>, and an error when you issue any <codeph>SHOW
+ TABLES</codeph>, <codeph>USE <varname>database_name</varname></codeph>, <codeph>DESCRIBE
+ <varname>table_name</varname></codeph>, <codeph>SELECT</codeph>, or other statements that expect to
+ access databases or tables, even if the corresponding databases and tables exist.
+ </p>
+
+ <p>
+ The contents of the policy file are cached, to avoid a performance penalty for each query. The policy
+ file is re-checked by each <cmdname>impalad</cmdname> node every 5 minutes. When you make a
+ non-time-sensitive change such as adding new privileges or new users, you can let the change take effect
+ automatically a few minutes later. If you remove or reduce privileges, and want the change to take effect
+ immediately, restart the <cmdname>impalad</cmdname> daemon on all nodes, again specifying the
+ <codeph>-server_name</codeph> and <codeph>-authorization_policy_file</codeph> options so that the rules
+ from the updated policy file are applied.
+ </p>
+ </conbody>
+ </concept>
+
+ <concept id="security_examples">
+
+ <title>Examples of Policy File Rules for Security Scenarios</title>
+
+ <conbody>
+
+ <p>
+ The following examples show rules that might go in the policy file to deal with various
+ authorization-related scenarios. For illustration purposes, this section shows several very small policy
+ files with only a few rules each. In your environment, typically you would define many roles to cover all
+ the scenarios involving your own databases, tables, and applications, and a smaller number of groups,
+ whose members are given the privileges from one or more roles.
+ </p>
+
+ <example id="sec_ex_unprivileged">
+
+ <title>A User with No Privileges</title>
+
+ <p>
+ If a user has no privileges at all, that user cannot access any schema objects in the system. The error
+ messages do not disclose the names or existence of objects that the user is not authorized to read.
+ </p>
+
+ <p>
+<!-- This example demonstrates the lack of privileges using a blank policy file, so no users have any privileges. -->
+ This is the experience you want a user to have if they somehow log into a system where they are not an
+ authorized Impala user. In a real deployment with a filled-in policy file, a user might have no
+ privileges because they are not a member of any of the relevant groups mentioned in the policy file.
+ </p>
+
+<!-- Have the raw material but not formatted into easily digestible example. Do for first 1.1 doc refresh.
+<codeblock></codeblock> -->
+
+ </example>
+
+ <example id="sec_ex_superuser">
+
+ <title>Examples of Privileges for Administrative Users</title>
+
+ <p>
+ When an administrative user has broad access to tables or databases, the associated rules in the
+ <codeph>[roles]</codeph> section typically use wildcards and/or inheritance. For example, in the
+ following sample policy file, <codeph>db=*</codeph> refers to all databases and
+ <codeph>db=*->table=*</codeph> refers to all tables in all databases.
+ </p>
+
+ <p>
+ Omitting the rightmost portion of a rule means that the privileges apply to all the objects that could
+ be specified there. For example, in the following sample policy file, the
+ <codeph>all_databases</codeph> role has all privileges for all tables in all databases, while the
+ <codeph>one_database</codeph> role has all privileges for all tables in one specific database.
The + <codeph>all_databases</codeph> role does not grant privileges on URIs, so a group with that role could + not issue a <codeph>CREATE TABLE</codeph> statement with a <codeph>LOCATION</codeph> clause. The + <codeph>entire_server</codeph> role has all privileges on both databases and URIs within the server. + </p> + +<codeblock>[groups] +supergroup = all_databases + +[roles] +read_all_tables = server=server1->db=*->table=*->action=SELECT +all_tables = server=server1->db=*->table=* +all_databases = server=server1->db=* +one_database = server=server1->db=test_db +entire_server = server=server1 +</codeblock> + + </example> + + <example id="sec_ex_detailed"> + + <title>A User with Privileges for Specific Databases and Tables</title> + + <p> + If a user has privileges for specific tables in specific databases, the user can access those things + but nothing else. They can see the tables and their parent databases in the output of <codeph>SHOW + TABLES</codeph> and <codeph>SHOW DATABASES</codeph>, <codeph>USE</codeph> the appropriate databases, + and perform the relevant actions (<codeph>SELECT</codeph> and/or <codeph>INSERT</codeph>) based on the + table privileges. To actually create a table requires the <codeph>ALL</codeph> privilege at the + database level, so you might define separate roles for the user that sets up a schema and other users + or applications that perform day-to-day operations on the tables. + </p> + + <p> + The following sample policy file shows some of the syntax that is appropriate as the policy file grows, + such as the <codeph>#</codeph> comment syntax, <codeph>\</codeph> continuation syntax, and comma + separation for roles assigned to groups or privileges assigned to roles. + </p> + +<codeblock>[groups] +cloudera = training_sysadmin, instructor +visitor = student + +[roles] +training_sysadmin = server=server1->db=training, \ +server=server1->db=instructor_private, \ +server=server1->db=lesson_development +instructor = server=server1->db=training->table=*->action=*, \ +server=server1->db=instructor_private->table=*->action=*, \ +server=server1->db=lesson_development->table=lesson* +# This particular course is all about queries, so the students can SELECT but not INSERT or CREATE/DROP. +student = server=server1->db=training->table=lesson_*->action=SELECT +</codeblock> + + </example> + +<!-- +<example id="sec_ex_superuser_single_db"> +<title>A User with Full Privileges for a Specific Database</title> +<p> + +</p> +<codeblock></codeblock> +</example> + +<example id="sec_ex_readonly_single_db"> +<title>A User with Read-Only Privileges for a Specific Database</title> +<p> + +</p> +<codeblock></codeblock> + <p> + If a user has <codeph>SELECT</codeph> privilege for a database, they can issue a <codeph>USE</codeph> statement + for that database. Whether or not they can access tables within the database depends on further privileges + defined at the table level. + </p> + +<codeblock></codeblock> + +</example> + +<example id="sec_ex_superuser_single_table"> +<title>A User with Full Privileges for a Specific Table</title> + <p> + If a user has <codeph>SELECT</codeph> privilege for a table, they can query, describe, or explain queries for + that table. + </p> + +<codeblock></codeblock> +</example> + +<example id="sec_ex_load_data"> +<title>A User with Privileges to Load Data but not Read Data</title> + + <p> + If a user has <codeph>INSERT</codeph> privilege for a table, they can write to the table if it already exists. 
+ They cannot create or alter the table; those operations require the <codeph>ALL</codeph> privilege.
+ </p>
+<codeblock></codeblock>
+</example>
+-->
+
+ <example id="sec_ex_external_files">
+
+ <title>Privileges for Working with External Data Files</title>
+
+ <p>
+ When data is being inserted through the <codeph>LOAD DATA</codeph> statement, or is referenced from an
+ HDFS location outside the normal Impala database directories, the user also needs appropriate
+ permissions on the URIs corresponding to those HDFS locations.
+ </p>
+
+ <p>
+ In this sample policy file:
+ </p>
+
+ <ul>
+ <li>
+ The <codeph>external_table</codeph> role lets us insert into and query the Impala table
+ <codeph>external_table.sample</codeph>.
+ </li>
+
+ <li>
+ The <codeph>staging_dir</codeph> role lets us specify the HDFS path
+ <filepath>/user/cloudera/external_data</filepath> with the <codeph>LOAD DATA</codeph> statement.
+ Remember, when Impala queries or loads data files, it operates on all the files in that directory,
+ not just a single file, so any Impala <codeph>LOCATION</codeph> parameters refer to a directory
+ rather than an individual file.
+ </li>
+
+ <li>
+ We included the IP address and port of the Hadoop NameNode in the HDFS URI of the
+ <codeph>staging_dir</codeph> rule. We found those details in
+ <filepath>/etc/hadoop/conf/core-site.xml</filepath>, under the <codeph>fs.default.name</codeph>
+ element. That is what we use in any roles that specify URIs (that is, the locations of directories in
+ HDFS).
+ </li>
+
+ <li>
+ We start this example after the table <codeph>external_table.sample</codeph> is already created. In
+ the policy file for the example, we have already taken away the <codeph>external_table_admin</codeph>
+ role from the <codeph>cloudera</codeph> group, and replaced it with the lesser-privileged
+ <codeph>external_table</codeph> role.
+ </li>
+
+ <li>
+ We assign privileges to a subdirectory underneath <filepath>/user/cloudera</filepath> in HDFS,
+ because such privileges also apply to any subdirectories underneath. If we had assigned privileges to
+ the parent directory <filepath>/user/cloudera</filepath>, it would be too easy to affect other
+ files accidentally by specifying a wrong location.
+ </li>
+
+ <li>
+ The <codeph>cloudera</codeph> entry under the <codeph>[groups]</codeph> section refers to the
+ <codeph>cloudera</codeph> group. (In the demo VM used for this example, there is a
+ <codeph>cloudera</codeph> user that is a member of a <codeph>cloudera</codeph> group.)
+ </li>
+ </ul>
+
+ <p>
+ Policy file:
+ </p>
+
+<codeblock>[groups]
+cloudera = external_table, staging_dir
+
+[roles]
+external_table_admin = server=server1->db=external_table
+external_table = server=server1->db=external_table->table=sample->action=*
+staging_dir = server=server1->uri=hdfs://127.0.0.1:8020/user/cloudera/external_data->action=*
+</codeblock>
+
+ <p>
+ <cmdname>impala-shell</cmdname> session:
+ </p>
+
+<codeblock>[localhost:21000] > use external_table;
+Query: use external_table
+[localhost:21000] > show tables;
+Query: show tables
+Query finished, fetching results ...
++--------+
+| name |
++--------+
+| sample |
++--------+
+Returned 1 row(s) in 0.02s
+
+[localhost:21000] > select * from sample;
+Query: select * from sample
+Query finished, fetching results ...
++-----+
+| x |
++-----+
+| 1 |
+| 5 |
+| 150 |
++-----+
+Returned 3 row(s) in 1.04s
+
+[localhost:21000] > load data inpath '/user/cloudera/external_data' into table sample;
+Query: load data inpath '/user/cloudera/external_data' into table sample
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 2 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.26s
+[localhost:21000] > select * from sample;
+Query: select * from sample
+Query finished, fetching results ...
++-------+
+| x |
++-------+
+| 2 |
+| 4 |
+| 6 |
+| 8 |
+| 64738 |
+| 49152 |
+| 1 |
+| 5 |
+| 150 |
++-------+
+Returned 9 row(s) in 0.22s
+
+[localhost:21000] > load data inpath '/user/cloudera/unauthorized_data' into table sample;
+Query: load data inpath '/user/cloudera/unauthorized_data' into table sample
+ERROR: AuthorizationException: User 'cloudera' does not have privileges to access: hdfs://127.0.0.1:8020/user/cloudera/unauthorized_data
+</codeblock>
+
+ </example>
+
+ <example audience="Cloudera" id="sec_ex_views" rev="2.3.0 collevelauth">
+
+ <title>Controlling Access at the Column Level through Views</title>
+
+ <p>
+ If a user has <codeph>SELECT</codeph> privilege for a view, they can query the view, even if they do
+ not have any privileges on the underlying table. To see the details about the underlying table through
+ <codeph>EXPLAIN</codeph> or <codeph>DESCRIBE FORMATTED</codeph> statements on the view, the user must
+ also have <codeph>SELECT</codeph> privilege for the underlying table.
+ </p>
+
+ <note type="important">
+ <p>
+ The types of data that are considered sensitive and confidential differ depending on the jurisdiction,
+ the type of industry, or both. For fine-grained access controls, set up appropriate privileges based
+ on all applicable laws and regulations.
+ </p>
+ <p>
+ Be careful using the <codeph>ALTER VIEW</codeph> statement to point an existing view at a different
+ base table or a new set of columns that includes sensitive or restricted data. Make sure that any
+ users who have <codeph>SELECT</codeph> privilege on the view do not gain access to any additional
+ information they are not authorized to see.
+ </p>
+ </note>
+
+ <p>
+ The following example shows how a system administrator could set up a table containing some columns
+ with sensitive information, then create a view that only exposes the non-confidential columns.
+ </p>
+
+<codeblock>[localhost:21000] > create table sensitive_info
+ > (
+ > name string,
+ > address string,
+ > credit_card string,
+ > taxpayer_id string
+ > );
+[localhost:21000] > create view name_address_view as select name, address from sensitive_info;
+</codeblock>
+
+ <p>
+ Then the following policy file specifies read-only privilege for that view, without authorizing access
+ to the underlying table:
+ </p>
+
+<codeblock>[groups]
+cloudera = view_only_privs
+
+[roles]
+view_only_privs = server=server1->db=reports->table=name_address_view->action=SELECT
+</codeblock>
+
+ <p>
+ Thus, a user with the <codeph>view_only_privs</codeph> role could use Impala queries to access the
+ basic information but not the sensitive information, even if both kinds of information were part of the
+ same data file.
+ </p>
+
+ <p>
+ You might define other views to allow users from different groups to query different sets of columns.
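+ For example, a second view and role along the following lines (the names are illustrative) could give
+ a billing group access to the payment-related columns of the same table, again without granting any
+ privileges on the underlying table:
+<codeblock>[localhost:21000] > create view billing_view as select name, credit_card from sensitive_info;
+</codeblock>
+<codeblock>[groups]
+billing = billing_only_privs
+
+[roles]
+billing_only_privs = server=server1->db=reports->table=billing_view->action=SELECT
+</codeblock>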
+ </p>
+
+ </example>
+
+ <example id="sec_sysadmin">
+
+ <title>Separating Administrator Responsibility from Read and Write Privileges</title>
+
+ <p>
+ Remember that to create a database requires full privilege on that database, while day-to-day
+ operations on tables within that database can be performed with lower levels of privilege on specific
+ tables. Thus, you might set up separate roles for each database or application: an administrative one
+ that could create or drop the database, and a user-level one that can access only the relevant tables.
+ </p>
+
+ <p>
+ For example, this policy file divides responsibilities among users in three different groups:
+ </p>
+
+ <ul>
+ <li>
+ Members of the <codeph>supergroup</codeph> group have the <codeph>training_sysadmin</codeph> role and
+ so can set up a database named <codeph>training</codeph>.
+ </li>
+
+ <li>
+ Members of the <codeph>cloudera</codeph> group have the <codeph>instructor</codeph> role and so can
+ create, insert into, and query any tables in the <codeph>training</codeph> database, but cannot
+ create or drop the database itself.
+ </li>
+
+ <li>
+ Members of the <codeph>visitor</codeph> group have the <codeph>student</codeph> role and so can query
+ those tables in the <codeph>training</codeph> database.
+ </li>
+ </ul>
+
+<codeblock>[groups]
+supergroup = training_sysadmin
+cloudera = instructor
+visitor = student
+
+[roles]
+training_sysadmin = server=server1->db=training
+instructor = server=server1->db=training->table=*->action=*
+student = server=server1->db=training->table=*->action=SELECT
+</codeblock>
+
+ </example>
+ </conbody>
+ </concept>
+
+ <concept id="security_multiple_policy_files">
+
+ <title>Using Multiple Policy Files for Different Databases</title>
+
+ <conbody>
+
+ <p>
+ For an Impala cluster with many databases being accessed by many users and applications, it might be
+ cumbersome to update the security policy file for each privilege change or each new database, table, or
+ view. You can allow security to be managed separately for individual databases by setting up a separate
+ policy file for each database:
+ </p>
+
+ <ul>
+ <li>
+ Add the optional <codeph>[databases]</codeph> section to the main policy file.
+ </li>
+
+ <li>
+ Add entries in the <codeph>[databases]</codeph> section for each database that has its own policy file.
+ </li>
+
+ <li>
+ For each listed database, specify the HDFS path of the appropriate policy file.
+ </li>
+ </ul>
+
+ <p>
+ For example:
+ </p>
+
+<codeblock>[databases]
+# Defines the location of the per-DB policy files for the 'customers' and 'sales' databases.
+customers = hdfs://ha-nn-uri/etc/access/customers.ini
+sales = hdfs://ha-nn-uri/etc/access/sales.ini
+</codeblock>
+
+ <p>
+ To enable URIs in per-DB policy files, add the following string in the Cloudera Manager field
+ <uicontrol>Impala Service Environment Advanced Configuration Snippet (Safety Valve)</uicontrol>:
+ </p>
+
+<codeblock>JAVA_TOOL_OPTIONS="-Dsentry.allow.uri.db.policyfile=true"
+</codeblock>
+
+ <note type="important">
+ Enabling URIs in per-DB policy files introduces a security risk by allowing the owner of the db-level
+ policy file to grant themselves load privileges to anything the <codeph>impala</codeph> user has
+ read permissions for in HDFS (including data in other databases controlled by different db-level policy
+ files).
+ </note> + </conbody> + </concept> + </concept> + + <concept id="security_schema"> + + <title>Setting Up Schema Objects for a Secure Impala Deployment</title> + + <conbody> + + <p> + Remember that in your role definitions, you specify privileges at the level of individual databases and + tables, or all databases or all tables within a database. To simplify the structure of these rules, plan + ahead of time how to name your schema objects so that data with different authorization requirements is + divided into separate databases. + </p> + + <p> + If you are adding security on top of an existing Impala deployment, remember that you can rename tables or + even move them between databases using the <codeph>ALTER TABLE</codeph> statement. In Impala, creating new + databases is a relatively inexpensive operation, basically just creating a new directory in HDFS. + </p> + + <p> + You can also plan the security scheme and set up the policy file before the actual schema objects named in + the policy file exist. Because the authorization capability is based on whitelisting, a user can only + create a new database or table if the required privilege is already in the policy file: either by listing + the exact name of the object being created, or a <codeph>*</codeph> wildcard to match all the applicable + objects within the appropriate container. + </p> + </conbody> + </concept> + + <concept id="security_privileges"> + + <title>Privilege Model and Object Hierarchy</title> + + <conbody> + + <p> + Privileges can be granted on different objects in the schema. Any privilege that can be granted is + associated with a level in the object hierarchy. If a privilege is granted on a container object in the + hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other + database systems such as MySQL. + </p> + + <p> + The kinds of objects in the schema hierarchy are: + </p> + +<codeblock>Server +URI +Database + Table +</codeblock> + + <p> + The server name is specified by the <codeph>-server_name</codeph> option when <cmdname>impalad</cmdname> + starts. Specify the same name for all <cmdname>impalad</cmdname> nodes in the cluster. + </p> + + <p> + URIs represent the HDFS paths you specify as part of statements such as <codeph>CREATE EXTERNAL + TABLE</codeph> and <codeph>LOAD DATA</codeph>. Typically, you specify what look like UNIX paths, but these + locations can also be prefixed with <codeph>hdfs://</codeph> to make clear that they are really URIs. To + set privileges for a URI, specify the name of a directory, and the privilege applies to all the files in + that directory and any directories underneath it. + </p> + + <p rev="2.3.0 collevelauth"> + In CDH 5.5 / Impala 2.3 and higher, you can specify privileges for individual columns, + as described in <xref href="sg_hive_sql.xml#concept_c2q_4qx_p4/col_level_auth_sentry"/>. Formerly, to specify + read privileges at this level, you created a view that queried specific columns and/or partitions from a base + table, and gave <codeph>SELECT</codeph> privilege on the view but not the underlying table. + </p> + + <p> + URIs must start with either <codeph>hdfs://</codeph> or <codeph>file://</codeph>. If a URI starts with + anything else, it will cause an exception and the policy file will be invalid. When defining URIs for HDFS, + you must also specify the NameNode. 
For example:
+<codeblock>data_read = server=server1->uri=file:///path/to/dir, \
+server=server1->uri=hdfs://namenode:port/path/to/dir
+</codeblock>
+ <note type="warning">
+ <p>
+ Because the NameNode host and port must be specified, Cloudera strongly recommends you use High
+ Availability (HA). This ensures that the URI will remain constant even if the NameNode changes.
+ </p>
+<codeblock>data_read = server=server1->uri=file:///path/to/dir,\
+server=server1->uri=hdfs://ha-nn-uri/path/to/dir
+</codeblock>
+ </note>
+ </p>
+
+<!-- Experiment with replacing my original copied table with a conref'ed version of Ambreen's from the Security Guide. -->
+
+<!--
+ <table>
+ <title>Sentry privilege types and objects they apply to</title>
+ <tgroup cols="2">
+ <colspec colnum="1" colname="col1"/>
+ <colspec colnum="2" colname="col2"/>
+ <tbody>
+ <row>
+ <entry>Privilege</entry>
+ <entry>Object</entry>
+ </row>
+ <row>
+ <entry>INSERT</entry>
+ <entry>TABLE, URI</entry>
+ </row>
+ <row>
+ <entry>SELECT</entry>
+ <entry>TABLE, VIEW, URI</entry>
+ </row>
+ <row>
+ <entry>ALL</entry>
+ <entry>SERVER, DB, URI</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+-->
+
+ <table conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/table_stm_53p_wk">
+ <tgroup cols="2">
+ <colspec colnum="1" colname="col1" colwidth="1*"/>
+<!-- <colspec colnum="2" colname="col2" colwidth="2*"/> -->
+ <tbody>
+ <row>
+ <entry/>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <note>
+ <p>
+ Although this document refers to the <codeph>ALL</codeph> privilege, if you use the policy file
+ mode you do not use the actual keyword <codeph>ALL</codeph> in the policy file. When you code role
+ entries in the policy file:
+ </p>
+ <ul>
+ <li>
+ To specify the <codeph>ALL</codeph> privilege for a server, use a role like
+ <codeph>server=<varname>server_name</varname></codeph>.
+ </li>
+
+ <li>
+ To specify the <codeph>ALL</codeph> privilege for a database, use a role like
+ <codeph>server=<varname>server_name</varname>->db=<varname>database_name</varname></codeph>.
+ </li>
+
+ <li>
+ To specify the <codeph>ALL</codeph> privilege for a table, use a role like
+ <codeph>server=<varname>server_name</varname>->db=<varname>database_name</varname>->table=<varname>table_name</varname>->action=*</codeph>.
+ </li> + </ul> + </note> + + <draft-comment author="ambreen.kazi" translate="no">The Privilege Model lives at: +https://wiki.cloudera.com/pages/viewpage.action?pageId=24919851</draft-comment> + + <table> + <tgroup cols="4"> + <colspec colnum="1" colname="col1" colwidth="1.31*"/> + <colspec colnum="2" colname="col2" colwidth="1.17*"/> + <colspec colnum="3" colname="col3" colwidth="1*"/> + <colspec colname="newCol4" colnum="4" colwidth="1*"/> + <thead> + <row> + <entry> + Operation + </entry> + <entry> + Scope + </entry> + <entry> + Privileges + </entry> + <entry> + URI + </entry> + </row> + </thead> + <tbody> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/explain_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/load_data_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/create_database_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/drop_database_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/create_table_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/drop_table_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/describe_table_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/alter_table_add_columns_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/alter_table_replace_columns_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/alter_table_change_column_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/alter_table_rename_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/alter_table_set_tblproperties_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/alter_table_set_fileformat_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/alter_table_set_location_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/alter_table_add_partition_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/alter_table_add_partition_location_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/alter_table_drop_partition_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/alter_table_partition_set_fileformat_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/alter_table_set_serdeproperties_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/create_view_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/drop_view_privs"> + <entry/> + </row> + <row id="alter_view_privs"> + <entry> + ALTER VIEW + </entry> + <entry rev="2.3.0 collevelauth"> + You need <codeph>ALL</codeph> privilege on the named view <ph rev="1.4.0">and the parent + database</ph>, plus <codeph>SELECT</codeph> privilege for any tables or views referenced by the + view query. Once the view is created or altered by a high-privileged system administrator, it can + be queried by a lower-privileged user who does not have full query privileges for the base tables. 
+ </entry> + <entry> + ALL, SELECT + </entry> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/alter_table_set_location_privs"> + <entry/> + </row> + <row id="create_external_table_privs"> + <entry> + CREATE EXTERNAL TABLE + </entry> + <entry> + Database (ALL), URI (SELECT) + </entry> + <entry> + ALL, SELECT + </entry> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/select_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/use_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/create_function_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/drop_function_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/refresh_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/invalidate_metadata_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/invalidate_metadata_table_privs"> + <entry/> + </row> + <row conref="cdh_sg_sentry.xml#concept_yk2_j3p_wk/compute_stats_privs"> + <entry/> + </row> + <row id="show_table_stats_privs"> + <entry> + SHOW TABLE STATS, SHOW PARTITIONS + </entry> + <entry> + TABLE + </entry> + <entry> + SELECT/INSERT + </entry> + <entry/> + </row> + <row> + <entry id="show_column_stats_privs"> + SHOW COLUMN STATS + </entry> + <entry> + TABLE + </entry> + <entry> + SELECT/INSERT + </entry> + <entry/> + </row> + <row> + <entry id="show_functions_privs"> + SHOW FUNCTIONS + </entry> + <entry> + DATABASE + </entry> + <entry> + SELECT + </entry> + <entry/> + </row> + <row id="show_tables_privs"> + <entry> + SHOW TABLES + </entry> + <entry/> + <entry> + No special privileges needed to issue the statement, but only shows objects you are authorized for + </entry> + <entry/> + </row> + <row id="show_databases_privs"> + <entry> + SHOW DATABASES, SHOW SCHEMAS + </entry> + <entry/> + <entry> + No special privileges needed to issue the statement, but only shows objects you are authorized for + </entry> + <entry/> + </row> + </tbody> + </tgroup> + </table> + +<!-- Original table, now replaced by content conref'ed from Security Guide +<table> +<title>Sentry privileges for Impala SQL operations</title> + <tgroup cols="3"> + <colspec colnum="1" colname="col1" colwidth="*"/> + <colspec colnum="2" colname="col2" colwidth="*"/> + <colspec colnum="3" colname="col3" colwidth="4*"/> +<thead> +<row> +<entry>Impala SQL Operation</entry> +<entry>Privileges Required</entry> +<entry>Object on which Privileges Required</entry> +</row> +</thead> +<tbody> +<row> +<entry>EXPLAIN</entry> +<entry>SELECT</entry> +<entry>Table</entry> +</row> +<row> +<entry>LOAD DATA</entry> +<entry>INSERT, SELECT</entry> +<entry>Table (INSERT) and URI (ALL); write privilege is required on the URI because the data files are physically moved from there</entry> +</row> +<row> +<entry>CREATE DATABASE</entry> +<entry>ALL</entry> +<entry><ph rev="1.4.0">Server</ph></entry> +</row> +<row> +<entry>DROP DATABASE</entry> +<entry>ALL</entry> +<entry><ph rev="1.4.0">Server</ph></entry> +</row> +<row> +<entry>CREATE TABLE</entry> +<entry>ALL</entry> +<entry><ph rev="1.4.0">Database</ph></entry> +</row> +<row> +<entry>DROP TABLE</entry> +<entry>ALL</entry> +<entry><ph rev="1.4.0">Database</ph></entry> +</row> +<row> +<entry>DESCRIBE TABLE</entry> +<entry>SELECT or INSERT</entry> +<entry>Table</entry> +</row> +<row> +<entry>ALTER TABLE</entry> +<entry>ALL</entry> +<entry><ph rev="1.4.0">Database</ph></entry> +</row> +<row> 
+<entry>SHOW DATABASES</entry> +<entry>Any privilege</entry> +<entry>Any object in the database; only databases for which the user has privileges on some object are shown</entry> +</row> +<row> +<entry>SHOW TABLES</entry> +<entry>Any privilege</entry> +<entry>Any table in the database; only tables for which the user has privileges are shown</entry> +</row> +<row> +<entry>CREATE VIEW</entry> +<entry>ALL, SELECT</entry> +<entry rev="2.3.0 collevelauth"> +You need <codeph>ALL</codeph> privilege on the named view <ph rev="1.4.0">and the parent database</ph>, plus <codeph>SELECT</codeph> privilege for any tables or views referenced by the view query. +Once the view is created or altered by a high-privileged system administrator, +it can be queried by a lower-privileged user who does not have full query privileges for the base tables. +</entry> +</row> +<row> +<entry>DROP VIEW</entry> +<entry>ALL</entry> +<entry><ph rev="1.4.0">Database</ph></entry> +</row> +<row> +<entry>ALTER VIEW</entry> +<entry>ALL, SELECT</entry> +<entry rev="2.3.0 collevelauth"> +You need <codeph>ALL</codeph> privilege on the named view <ph rev="1.4.0">and the parent database</ph>, plus <codeph>SELECT</codeph> privilege for any tables or views referenced by the view query. +Once the view is created or altered by a high-privileged system administrator, +it can be queried by a lower-privileged user who does not have full query privileges for the base tables. +</entry> +</row> +<row> +<entry>ALTER TABLE LOCATION</entry> +<entry>ALL</entry> +<entry>Table, URI</entry> +</row> +<row> +<entry>CREATE EXTERNAL TABLE</entry> +<entry>ALL, SELECT</entry> +<entry>Database (ALL), URI (SELECT)</entry> +</row> +<row> +<entry>SELECT</entry> +<entry>SELECT</entry> +<entry rev="2.3.0 collevelauth">Table, View; you can have SELECT privilege for a view without having SELECT privilege +for the underlying tables. See <xref href="sg_hive_sql.xml#concept_c2q_4qx_p4/col_level_auth_sentry"/> for details +about column-level authorization. +Before column-level authorization was added in CDH 5.5 / Impala 2.3, the way to implement +column-level security was by creating views that referenced particular sets of columns.</entry> +</row> +<row> +<entry>USE</entry> +<entry>Any privilege</entry> +<entry>Any object in the database</entry> +</row> +<row rev="1.2"> +<entry>CREATE FUNCTION</entry> +<entry>ALL</entry> +<entry>Server</entry> +</row> +<row rev="1.2"> +<entry>DROP FUNCTION</entry> +<entry>ALL</entry> +<entry>Server</entry> +</row> +<row> +<entry>REFRESH</entry> +<entry>ALL</entry> +<entry>Table</entry> +</row> +<row> +<entry>INVALIDATE METADATA</entry> +<entry>ALL</entry> +<entry>Server</entry> +</row> +<row rev="1.2.2"> +<entry>COMPUTE STATS</entry> +<entry>ALL</entry> +<entry>Table</entry> +</row> +<row> +<entry>SHOW TABLE STATS, SHOW PARTITIONS</entry> +<entry>?</entry> +<entry>?</entry> +</row> +<row> +<entry>SHOW COLUMN STATS</entry> +<entry>?</entry> +<entry>?</entry> +</row> +<row> +<entry>SHOW FUNCTIONS</entry> +<entry>?</entry> +<entry>?</entry> +</row> +<row> +<entry>SHOW DATABASES, SHOW SCHEMAS</entry> +<entry>?</entry> +<entry>?</entry> +</row> +</tbody> +</tgroup> +</table> +--> + +<!-- Here are pieces from the original Hive table that do not apply to Impala because of differences in the SQL syntax. + Might resurrect some of these later as features are added to Impala, in particular CREATE TABLE AS SELECT. 
+<row> +<entry>EXPORT</entry> +<entry>SELECT, INSERT@URI</entry> +<entry>Table</entry> +</row> +<row> +<entry>IMPORT</entry> +<entry>ALL, SELECT@URI</entry> +<entry>Database</entry> +</row> +<row> +<entry>ANALYZE</entry> +<entry>INSERT, SELECT</entry> +<entry>Table</entry> +</row> +<row> +<entry>SWITCH DATABASE</entry> +<entry>Any</entry> +<entry>Any Table, View in the DB</entry> +</row> +<row> +<entry>DESCRIBE FUNCTION*</entry> +<entry>Not Restricted</entry> +</row> +<row> +<entry>MSCK</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>SHOW COLUMNS</entry> +<entry>SELECT or INSERT</entry> +<entry>Table</entry> +</row> +<row> +<entry>SHOW TABLE STATUS</entry> +<entry>SELECT or INSERT</entry> +<entry>Table</entry> +</row> +<row> +<entry>SHOW TABLE PROPERTIES</entry> +<entry>SELECT or INSERT</entry> +<entry>Table</entry> +</row> +<row> +<entry>SHOW CREATE TABLE</entry> +<entry>SELECT or INSERT</entry> +<entry>Table</entry> +</row> +<row> +<entry>SHOW FUNCTIONS*</entry> +<entry>Not restricted</entry> +</row> +<row> +<entry>SHOW PARTITIONS</entry> +<entry>SELECT or INSERT</entry> +<entry>Table</entry> +</row> +<row> +<entry>SHOW INDEXES</entry> +<entry>SELECT or INSERT</entry> +<entry>Table</entry> +</row> +<row> +<entry>SHOW LOCKS</entry> +<entry>Not Restricted</entry> +</row> +<row> +<entry>CREATE FUNCTION</entry> +<entry>Any Privilege</entry> +<entry>Any Object</entry> +</row> +<row> +<entry>CREATE INDEX</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>DROP FUNCTION</entry> +<entry>Any Privilege</entry> +<entry>Any Object</entry> +</row> +<row> +<entry>DROP INDEX</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER INDEX REBUILD</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER VIEW PROPERTIES</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>GRANT PRIVILEGE</entry> +<entry>Allowed, but has no effect on HS2 auth</entry> +</row> +<row> +<entry>REVOKE PRIVILEGE</entry> +<entry>Allowed, but has no effect on HS2 auth</entry> +</row> +<row> +<entry>SHOW GRANTS</entry> +<entry>Allowed, but has no effect on HS2 auth</entry> +</row> +<row> +<entry>ALTER TABLE PROTECT MODE</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER TABLE FILE FORMAT</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER PARTITION PROTECT MODE</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER PARTITION FILE FORMAT</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER PARTITION LOCATION*</entry> +<entry>ALL</entry> +<entry>Server</entry> +</row> +<row> +<entry>CREATE TABLE AS SELECT</entry> +<entry>ALL, SELECT</entry> +<entry>Database, Table/View</entry> +</row> +<row> +<entry>ALTER INDEX PROP</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER DATABASE</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>DESCRIBE DATABASE</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER TABLE MERGE FILE</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER PARTITION MERGE FILE</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER TABLE SKEWED</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ALTER TABLE PARTITION SKEWED LOCATION</entry> +<entry>ALL</entry> +<entry>Database</entry> +</row> +<row> +<entry>ADD JAR*</entry> 
+<entry>Restricted unless hive.server2.authorization.external.exec = true</entry>
+</row>
+<row>
+<entry>TRANSFORM*</entry>
+<entry>ALL</entry>
+<entry>Server</entry>
+</row>
+-->
+ </conbody>
+ </concept>
+
+ <concept id="sentry_debug">
+
+ <title><ph conref="../shared/impala_common.xml#common/title_sentry_debug"/></title>
+
+ <conbody>
+
+ <p conref="../shared/impala_common.xml#common/sentry_debug"/>
+ </conbody>
+ </concept>
+
+ <concept id="sentry_cm">
+
+ <title>Managing Sentry for Impala through Cloudera Manager</title>
+
+ <conbody>
+
+ <p>
+ To enable the Sentry service for Impala and Hive, set the <uicontrol>Hive/Impala > Service-Wide > Sentry
+ Service</uicontrol> parameter to the Sentry service. Then restart Impala and Hive. Simply adding the Sentry
+ service as a dependency and restarting enables Impala and Hive to use the Sentry service.
+ </p>
+
+ <p>
+ To set the server name to use when granting server-level privileges, set the <uicontrol>Hive >
+ Service-Wide > Advanced > Server Name for Sentry Authorization</uicontrol> parameter. When using
+ Sentry with the Hive Metastore, you can specify the list of users that are allowed to bypass Sentry
+ Authorization in Hive Metastore using <uicontrol>Hive > Service-Wide > Security > Bypass Sentry
+ Authorization Users</uicontrol>. These are usually service users that already ensure all activity has been
+ authorized.
+ </p>
+
+ <note>
+ The <uicontrol>Hive/Impala > Service-Wide > Policy File Based Sentry</uicontrol> tab contains
+ parameters only relevant to configuring Sentry using policy files. In particular, make sure that the
+ <uicontrol>Enable Sentry Authorization using Policy Files</uicontrol> parameter is unchecked when using the
+ Sentry service. Cloudera Manager throws a validation error if you attempt to configure the Sentry service
+ and policy file at the same time.
+ </note>
 </conbody>
 </concept>
+ <concept id="sec_ex_default">
+
+ <title>The DEFAULT Database in a Secure Deployment</title>
+
+ <conbody>
+
+ <p>
+ Because of the extra emphasis on granular access controls in a secure deployment, you should move any
+ important or sensitive information out of the <codeph>DEFAULT</codeph> database into a named database whose
+ privileges are specified in the policy file. Sometimes you might need to give privileges on the
+ <codeph>DEFAULT</codeph> database for administrative reasons; for example, as a place you can reliably
+ specify with a <codeph>USE</codeph> statement when preparing to drop a database.
+ </p>
+
+<!-- Maybe have an example later, but not for initial 1.1 release.
+<codeblock></codeblock>
+-->
+ </conbody>
+ </concept>
+</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_avg.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_avg.xml b/docs/topics/impala_avg.xml index 26f5450..d95b2d5 100644 --- a/docs/topics/impala_avg.xml +++ b/docs/topics/impala_avg.xml @@ -3,7 +3,7 @@ <concept id="avg"> <title>AVG Function</title> - <titlealts><navtitle>AVG</navtitle></titlealts> + <titlealts audience="PDF"><navtitle>AVG</navtitle></titlealts> <prolog> <metadata> <data name="Category" value="Impala"/> @@ -12,6 +12,8 @@ <data name="Category" value="Analytic Functions"/> <data name="Category" value="Aggregate Functions"/> <data name="Category" value="Querying"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> </metadata> </prolog> @@ -44,7 +46,7 @@ <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> <p conref="../shared/impala_common.xml#common/complex_types_aggregation_explanation"/> - + <p conref="../shared/impala_common.xml#common/complex_types_aggregation_example"/> <p conref="../shared/impala_common.xml#common/example_blurb"/> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_avro.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_avro.xml b/docs/topics/impala_avro.xml index ae19192..d89d782 100644 --- a/docs/topics/impala_avro.xml +++ b/docs/topics/impala_avro.xml @@ -4,7 +4,15 @@ <title>Using the Avro File Format with Impala Tables</title> <titlealts audience="PDF"><navtitle>Avro Data Files</navtitle></titlealts> - + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="File Formats"/> + <data name="Category" value="Avro"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> <conbody> @@ -15,9 +23,534 @@ insert operations, use Hive, then switch back to Impala to run queries. </p> - + <table> + <title>Avro Format Support in Impala</title> + <tgroup cols="5"> + <colspec colname="1" colwidth="10*"/> + <colspec colname="2" colwidth="10*"/> + <colspec colname="3" colwidth="20*"/> + <colspec colname="4" colwidth="30*"/> + <colspec colname="5" colwidth="30*"/> + <thead> + <row> + <entry> + File Type + </entry> + <entry> + Format + </entry> + <entry> + Compression Codecs + </entry> + <entry> + Impala Can CREATE? + </entry> + <entry> + Impala Can INSERT? + </entry> + </row> + </thead> + <tbody> + <row conref="impala_file_formats.xml#file_formats/avro_support"> + <entry/> + </row> + </tbody> + </tgroup> + </table> + + <p outputclass="toc inpage"/> + </conbody> + + <concept id="avro_create_table"> + + <title>Creating Avro Tables</title> + + <conbody> + + <p> + To create a new table using the Avro file format, issue the <codeph>CREATE TABLE</codeph> statement through + Impala with the <codeph>STORED AS AVRO</codeph> clause, or through Hive. If you create the table through + Impala, you must include column definitions that match the fields specified in the Avro schema. With Hive, + you can omit the columns and just specify the Avro schema. + </p> + + <p rev="2.3.0"> + In CDH 5.5 / Impala 2.3 and higher, the <codeph>CREATE TABLE</codeph> for Avro tables can include + SQL-style column definitions rather than specifying Avro notation through the <codeph>TBLPROPERTIES</codeph> + clause. 
Impala issues warning messages if there are any mismatches between the types specified in the + SQL column definitions and the underlying types; for example, any <codeph>TINYINT</codeph> or + <codeph>SMALLINT</codeph> columns are treated as <codeph>INT</codeph> in the underlying Avro files, + and therefore are displayed as <codeph>INT</codeph> in any <codeph>DESCRIBE</codeph> or + <codeph>SHOW CREATE TABLE</codeph> output. + </p> + + <note> + <p conref="../shared/impala_common.xml#common/avro_no_timestamp"/> + </note> + + <!-- + To do: Expand these examples to show switching between impala-shell and Hive, loading some data, and then + doing DESCRIBE and querying the table. + --> + + <p> + The following examples demonstrate creating an Avro table in Impala, using either an inline column + specification or one taken from a JSON file stored in HDFS: + </p> + +<codeblock><![CDATA[ +[localhost:21000] > CREATE TABLE avro_only_sql_columns + > ( + > id INT, + > bool_col BOOLEAN, + > tinyint_col TINYINT, /* Gets promoted to INT */ + > smallint_col SMALLINT, /* Gets promoted to INT */ + > int_col INT, + > bigint_col BIGINT, + > float_col FLOAT, + > double_col DOUBLE, + > date_string_col STRING, + > string_col STRING + > ) + > STORED AS AVRO; + +[localhost:21000] > CREATE TABLE impala_avro_table + > (bool_col BOOLEAN, int_col INT, long_col BIGINT, float_col FLOAT, double_col DOUBLE, string_col STRING, nullable_int INT) + > STORED AS AVRO + > TBLPROPERTIES ('avro.schema.literal'='{ + > "name": "my_record", + > "type": "record", + > "fields": [ + > {"name":"bool_col", "type":"boolean"}, + > {"name":"int_col", "type":"int"}, + > {"name":"long_col", "type":"long"}, + > {"name":"float_col", "type":"float"}, + > {"name":"double_col", "type":"double"}, + > {"name":"string_col", "type":"string"}, + > {"name": "nullable_int", "type": ["null", "int"]}]}'); + +[localhost:21000] > CREATE TABLE avro_examples_of_all_types ( + > id INT, + > bool_col BOOLEAN, + > tinyint_col TINYINT, + > smallint_col SMALLINT, + > int_col INT, + > bigint_col BIGINT, + > float_col FLOAT, + > double_col DOUBLE, + > date_string_col STRING, + > string_col STRING + > ) + > STORED AS AVRO + > TBLPROPERTIES ('avro.schema.url'='hdfs://localhost:8020/avro_schemas/alltypes.json'); +]]> +</codeblock> + + <p> + The following example demonstrates creating an Avro table in Hive: + </p> + +<codeblock><![CDATA[ +hive> CREATE TABLE hive_avro_table + > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' + > STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' + > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' + > TBLPROPERTIES ('avro.schema.literal'='{ + > "name": "my_record", + > "type": "record", + > "fields": [ + > {"name":"bool_col", "type":"boolean"}, + > {"name":"int_col", "type":"int"}, + > {"name":"long_col", "type":"long"}, + > {"name":"float_col", "type":"float"}, + > {"name":"double_col", "type":"double"}, + > {"name":"string_col", "type":"string"}, + > {"name": "nullable_int", "type": ["null", "int"]}]}'); +]]> +</codeblock> + + <p> + Each field of the record becomes a column of the table. Note that any other information, such as the record + name, is ignored. + </p> + +<!-- Have not got a working example of this syntax yet from Lenni. +<p> +The schema can be specified either through the <codeph>TBLPROPERTIES</codeph> clause or the +<codeph>WITH SERDEPROPERTIES</codeph> clause. 
+For best compatibility with future versions of Hive, use the <codeph>WITH SERDEPROPERTIES</codeph> clause
+for this information.
+</p>
+-->
+
+      <note>
+        For nullable Avro columns, make sure to put the <codeph>"null"</codeph> entry before the actual type
+        name. In Impala, all columns are nullable; Impala currently does not have a <codeph>NOT NULL</codeph>
+        clause. Any non-nullable property is enforced only on the Avro side.
+      </note>
+
+      <p>
+        Most column types map directly from Avro to Impala under the same names. These are the exceptions and
+        special cases to consider:
+      </p>
+
+      <ul>
+        <li>
+          The <codeph>DECIMAL</codeph> type is defined in Avro as a <codeph>BYTES</codeph> type with the
+          <codeph>logicalType</codeph> property set to <codeph>"decimal"</codeph> and a specified precision
+          and scale. Use <codeph>DECIMAL</codeph> in Avro tables only under CDH 5. The infrastructure and
+          components under CDH 4 do not have reliable <codeph>DECIMAL</codeph> support.
+        </li>
+
+        <li>
+          The Avro <codeph>long</codeph> type maps to <codeph>BIGINT</codeph> in Impala.
+        </li>
+      </ul>
+
+      <p>
+        If you create the table through Hive, switch back to <cmdname>impala-shell</cmdname> and issue an
+        <codeph>INVALIDATE METADATA <varname>table_name</varname></codeph> statement. Then you can run queries
+        for that table through <cmdname>impala-shell</cmdname>.
+      </p>
+
+      <p rev="2.3.0">
+        In rare instances, a mismatch could occur between the Avro schema and the column definitions in the
+        metastore database. In CDH 5.5 / Impala 2.3 and higher, Impala checks for such inconsistencies during
+        a <codeph>CREATE TABLE</codeph> statement and each time it loads the metadata for a table (for example,
+        after <codeph>INVALIDATE METADATA</codeph>). Impala uses the following rules to determine how to treat
+        mismatching columns, a process known as <term>schema reconciliation</term>:
+        <ul>
+          <li>
+            If there is a mismatch in the number of columns, Impala uses the column
+            definitions from the Avro schema.
+          </li>
+          <li>
+            If there is a mismatch in column name or type, Impala uses the column definition from the Avro
+            schema. Because a <codeph>CHAR</codeph> or <codeph>VARCHAR</codeph> column in Impala maps to an
+            Avro <codeph>STRING</codeph>, this case is not considered a mismatch and the column is preserved
+            as <codeph>CHAR</codeph> or <codeph>VARCHAR</codeph> in the reconciled schema.
+            <ph rev="2.7.0 IMPALA-3687 CDH-43731">Prior to CDH 5.9 / Impala 2.7, the column name and comment
+            for such <codeph>CHAR</codeph> and <codeph>VARCHAR</codeph> columns were also taken from the SQL
+            column definition. In CDH 5.9 / Impala 2.7 and higher, the column name and comment from the Avro
+            schema file take precedence for such columns, and only the <codeph>CHAR</codeph> or
+            <codeph>VARCHAR</codeph> type is preserved from the SQL column definition.</ph>
+          </li>
+          <li>
+            An Impala <codeph>TIMESTAMP</codeph> column definition maps to an Avro <codeph>STRING</codeph> and
+            is presented as a <codeph>STRING</codeph> in the reconciled schema, because Avro has no binary
+            <codeph>TIMESTAMP</codeph> representation. As a result, no Avro table can have a
+            <codeph>TIMESTAMP</codeph> column; this restriction is the same as in earlier CDH and Impala
+            releases.
+          </li>
+        </ul>
+      </p>
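+      <p>
+        The following is a minimal sketch of these reconciliation rules; the <codeph>reconcile_demo</codeph>
+        table name is hypothetical:
+      </p>
+
+<codeblock>-- The SQL definition declares ID as STRING, but the Avro schema declares it as "int".
+CREATE TABLE reconcile_demo (id STRING)
+  STORED AS AVRO
+  TBLPROPERTIES ('avro.schema.literal'='{
+    "name": "demo_record",
+    "type": "record",
+    "fields": [ {"name":"id", "type":"int"} ]}');
+
+-- Impala warns about the mismatch during CREATE TABLE. Because name and type mismatches
+-- resolve in favor of the Avro schema, DESCRIBE reports the column as INT.
+DESCRIBE reconcile_demo;
+</codeblock>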
+
+      <p conref="../shared/impala_common.xml#common/complex_types_unsupported_filetype"/>
+
+    </conbody>
+  </concept>
+
+  <concept id="avro_map_table">
+
+    <title>Using a Hive-Created Avro Table in Impala</title>
+
+    <conbody>
+
+      <p>
+        If you have an Avro table created through Hive, you can use it in Impala as long as it contains only
+        Impala-compatible data types. It cannot contain:
+        <ul>
+          <li>
+            Complex types: <codeph>array</codeph>, <codeph>map</codeph>, <codeph>record</codeph>,
+            <codeph>struct</codeph>, <codeph>union</codeph> other than
+            <codeph>[<varname>supported_type</varname>,null]</codeph> or
+            <codeph>[null,<varname>supported_type</varname>]</codeph>
+          </li>
+
+          <li>
+            The Avro-specific types <codeph>enum</codeph>, <codeph>bytes</codeph>, and <codeph>fixed</codeph>
+          </li>
+
+          <li>
+            Any scalar type other than those listed in <xref href="impala_datatypes.xml#datatypes"/>
+          </li>
+        </ul>
+        Because Impala and Hive share the same metastore database, Impala can directly access the table
+        definitions and data for tables that were created in Hive.
+      </p>
+
+      <p>
+        If you create an Avro table in Hive, issue an <codeph>INVALIDATE METADATA</codeph> statement the next
+        time you connect to Impala through <cmdname>impala-shell</cmdname>. This is a one-time operation to
+        make Impala aware of the new table. You can issue the statement while connected to any Impala node,
+        and the catalog service broadcasts the change to all other Impala nodes.
+      </p>
+
+      <p>
+        If you load new data into an Avro table through Hive, either through a Hive <codeph>LOAD DATA</codeph>
+        or <codeph>INSERT</codeph> statement, or by manually copying or moving files into the data directory
+        for the table, issue a <codeph>REFRESH <varname>table_name</varname></codeph> statement the next time
+        you connect to Impala through <cmdname>impala-shell</cmdname>. You can issue the statement while
+        connected to any Impala node, and the catalog service broadcasts the change to all other Impala nodes.
+        If you issue the <codeph>LOAD DATA</codeph> statement through Impala, you do not need a
+        <codeph>REFRESH</codeph> afterward.
+      </p>
+
+      <p>
+        Impala supports only fields of type <codeph>boolean</codeph>, <codeph>int</codeph>,
+        <codeph>long</codeph>, <codeph>float</codeph>, <codeph>double</codeph>, and <codeph>string</codeph>,
+        or unions of these types with null; for example, <codeph>["string", "null"]</codeph>. Unions with
+        <codeph>null</codeph> essentially create a nullable type.
+      </p>
+    </conbody>
+  </concept>
+
+  <concept id="avro_json">
+
+    <title>Specifying the Avro Schema through JSON</title>
+
+    <conbody>
+
+      <p>
+        While you can embed a schema directly in your <codeph>CREATE TABLE</codeph> statement, as shown above,
+        column width restrictions in the Hive metastore limit the length of the schema you can specify. If you
+        encounter problems with long schema literals, try storing your schema as a JSON file in HDFS instead.
+        Specify your schema in HDFS using table properties similar to the following:
+      </p>
+
+<codeblock>tblproperties ('avro.schema.url'='hdfs://your-name-node:port/path/to/schema.json');</codeblock>
+    </conbody>
+  </concept>
+
+  <concept id="avro_load_data">
+
+    <title>Loading Data into an Avro Table</title>
+    <prolog>
+      <metadata>
+        <data name="Category" value="ETL"/>
+        <data name="Category" value="Ingest"/>
+      </metadata>
+    </prolog>
+
+    <conbody>
+
+      <p rev="DOCS-1523">
+        Currently, Impala cannot write Avro data files. Therefore, an Avro table cannot be used as the
+        destination of an Impala <codeph>INSERT</codeph> statement or <codeph>CREATE TABLE AS SELECT</codeph>.
+      </p>
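+      <p>
+        As a minimal illustration of this restriction (the <codeph>some_text_table</codeph> name is
+        hypothetical), both of the following statements fail against an Avro table:
+      </p>
+
+<codeblock>-- Both statements are rejected, because Impala cannot write Avro data files.
+INSERT INTO avro_only_sql_columns SELECT * FROM some_text_table;
+
+CREATE TABLE avro_copy STORED AS AVRO
+  AS SELECT * FROM some_text_table;
+</codeblock>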
+
+      <p>
+        To copy data from another table, issue any <codeph>INSERT</codeph> statements through Hive. For
+        information about loading data into Avro tables through Hive, see the
+        <xref href="https://cwiki.apache.org/confluence/display/Hive/AvroSerDe" scope="external" format="html">Avro
+        page on the Hive wiki</xref>.
+      </p>
+
+      <p>
+        If you already have data files in Avro format, you can also issue <codeph>LOAD DATA</codeph> in either
+        Impala or Hive. Impala can move existing Avro data files into an Avro table; it just cannot create new
+        Avro data files.
+      </p>
+
+    </conbody>
+  </concept>
+
+  <concept id="avro_compression">
+
+    <title>Enabling Compression for Avro Tables</title>
+    <prolog>
+      <metadata>
+        <data name="Category" value="Compression"/>
+        <data name="Category" value="Snappy"/>
+      </metadata>
+    </prolog>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">compression</indexterm>
+        To enable compression for Avro tables, specify settings in the Hive shell to enable compression and to
+        specify a codec, then issue a <codeph>CREATE TABLE</codeph> statement as in the preceding examples.
+        Impala supports the <codeph>snappy</codeph> and <codeph>deflate</codeph> codecs for Avro tables.
+      </p>
+
+      <p>
+        For example:
+      </p>
+
+<codeblock>hive> set hive.exec.compress.output=true;
+hive> set avro.output.codec=snappy;</codeblock>
+    </conbody>
+  </concept>
+
+  <concept rev="1.1" id="avro_schema_evolution">
+
+    <title>How Impala Handles Avro Schema Evolution</title>
+    <prolog>
+      <metadata>
+        <data name="Category" value="Concepts"/>
+      </metadata>
+    </prolog>
+
+    <conbody>
+
+      <p>
+        Starting in Impala 1.1, Impala can deal with Avro data files that employ <term>schema
+        evolution</term>, where different data files within the same table use slightly different type
+        definitions. (You would perform the schema evolution operation by issuing an <codeph>ALTER
+        TABLE</codeph> statement in the Hive shell.) The old and new types for any changed columns must be
+        compatible; for example, a column might start as an <codeph>int</codeph> and later change to a
+        <codeph>bigint</codeph> or <codeph>float</codeph>.
+      </p>
+
+      <p>
+        As with any other table whose definition is changed or whose data is added outside of the current
+        <cmdname>impalad</cmdname> node, ensure that Impala loads the latest metadata for the table if the
+        Avro schema is modified through Hive. Issue a <codeph>REFRESH <varname>table_name</varname></codeph>
+        or <codeph>INVALIDATE METADATA <varname>table_name</varname></codeph> statement.
+        <codeph>REFRESH</codeph> reloads the metadata immediately; <codeph>INVALIDATE METADATA</codeph>
+        reloads it the next time the table is accessed.
+      </p>
+
+      <p>
+        When Avro data files or columns are not consulted during a query, Impala does not check for
+        consistency. Thus, if you issue <codeph>SELECT c1, c2 FROM t1</codeph>, Impala does not return any
+        error if the column <codeph>c3</codeph> changed in an incompatible way. If a query retrieves data from
+        some partitions but not others, Impala does not check the data files for the unused partitions.
+      </p>
+
+      <p>
+        In Hive DDL statements, you can specify an <codeph>avro.schema.literal</codeph> table property (if the
+        schema definition is short) or an <codeph>avro.schema.url</codeph> property (if the schema definition
+        is long, or to allow convenient editing of the definition).
+      </p>
+
+      <p>
+        For example, running the following SQL code in the Hive shell creates a table using the Avro file
+        format and puts some sample data into it:
+      </p>
+
+<codeblock>CREATE TABLE avro_table (a string, b string)
+ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+TBLPROPERTIES (
+  'avro.schema.literal'='{
+    "type": "record",
+    "name": "my_record",
+    "fields": [
+      {"name": "a", "type": "int"},
+      {"name": "b", "type": "string"}
+    ]}');
+
+INSERT OVERWRITE TABLE avro_table SELECT 1, "avro" FROM functional.alltypes LIMIT 1;
+</codeblock>
+
+      <p>
+        Once the Avro table is created and contains data, you can query it through the
+        <cmdname>impala-shell</cmdname> command (after an <codeph>INVALIDATE METADATA</codeph> statement,
+        because the table was created through Hive):
+      </p>
+
+<codeblock>[localhost:21000] > select * from avro_table;
++---+------+
+| a | b    |
++---+------+
+| 1 | avro |
++---+------+
+</codeblock>
+
+      <p>
+        Now in the Hive shell, you change the type of a column and add a new column with a default value:
+      </p>
+
+<codeblock>-- Promote column "a" from INT to FLOAT (no need to update Avro schema)
+ALTER TABLE avro_table CHANGE A A FLOAT;
+
+-- Add column "c" with default
+ALTER TABLE avro_table ADD COLUMNS (c int);
+ALTER TABLE avro_table SET TBLPROPERTIES (
+  'avro.schema.literal'='{
+    "type": "record",
+    "name": "my_record",
+    "fields": [
+      {"name": "a", "type": "int"},
+      {"name": "b", "type": "string"},
+      {"name": "c", "type": "int", "default": 10}
+    ]}');
+</codeblock>
+
+      <p>
+        Once again in <cmdname>impala-shell</cmdname>, you can query the Avro table based on its latest schema
+        definition. Because the table metadata was changed outside of Impala, you issue a
+        <codeph>REFRESH</codeph> statement first so that Impala has up-to-date metadata for the table.
+      </p>
+
+<codeblock>[localhost:21000] > refresh avro_table;
+[localhost:21000] > select * from avro_table;
++---+------+----+
+| a | b    | c  |
++---+------+----+
+| 1 | avro | 10 |
++---+------+----+
+</codeblock>
+    </conbody>
+  </concept>
+
+  <concept id="avro_data_types">
+
+    <title>Data Type Considerations for Avro Tables</title>
+
+    <conbody>
+
+      <p>
+        The Avro format defines a set of data types whose names differ from the names of the corresponding
+        Impala data types. If you are preparing Avro files using other Hadoop components such as Pig or
+        MapReduce, you might need to work with the type names defined by Avro. The following listing shows the
+        Avro-defined types and the equivalent types in Impala:
+      </p>
+
+<codeblock><![CDATA[Primitive Types (Avro -> Impala)
+--------------------------------
+STRING -> STRING
+STRING -> CHAR
+STRING -> VARCHAR
+INT -> INT
+BOOLEAN -> BOOLEAN
+LONG -> BIGINT
+FLOAT -> FLOAT
+DOUBLE -> DOUBLE
+
+Logical Types
+-------------
+BYTES + logicalType = "decimal" -> DECIMAL
+
+Avro Types with No Impala Equivalent
+------------------------------------
+RECORD, MAP, ARRAY, UNION, ENUM, FIXED, NULL
+
+Impala Types with No Avro Equivalent
+------------------------------------
+TIMESTAMP
+]]>
+</codeblock>
+
+      <p conref="../shared/impala_common.xml#common/avro_2gb_strings"/>
+    </conbody>
+  </concept>
+
+  <concept id="avro_performance">
+
+    <title>Query Performance for Impala Avro Tables</title>
+
+    <conbody>
+
+      <p>
+        In general, expect query performance with Avro tables to be faster than with tables using text data,
+        but slower than with Parquet tables. See <xref href="impala_parquet.xml#parquet"/> for information
+        about using the Parquet file format for high-performance analytic queries.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/s3_block_splitting"/>
+
+    </conbody>
+  </concept>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_batch_size.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_batch_size.xml b/docs/topics/impala_batch_size.xml
index 13a4b18..afd3c17 100644
--- a/docs/topics/impala_batch_size.xml
+++ b/docs/topics/impala_batch_size.xml
@@ -3,10 +3,15 @@
 <concept id="batch_size">
 
   <title>BATCH_SIZE Query Option</title>
+  <titlealts audience="PDF"><navtitle>BATCH_SIZE</navtitle></titlealts>
   <prolog>
     <metadata>
       <data name="Category" value="Impala"/>
       <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="Performance"/>
     </metadata>
   </prolog>
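+
+<!-- A hedged usage sketch, not part of the original change: BATCH_SIZE is set per session
+     through the standard SET syntax in impala-shell; the value and table name below are
+     illustrative only. -->
+<codeblock>set batch_size=2048;   -- rows processed per batch for this session
+select count(*) from some_table;
+set batch_size=0;      -- 0 restores the built-in default
+</codeblock>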