http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/1fcc8cee/docs/topics/impala_config_performance.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_config_performance.xml b/docs/topics/impala_config_performance.xml new file mode 100644 index 0000000..837e63c --- /dev/null +++ b/docs/topics/impala_config_performance.xml @@ -0,0 +1,291 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="config_performance"> + + <title>Post-Installation Configuration for Impala</title> + <prolog> + <metadata> + <data name="Category" value="Performance"/> + <data name="Category" value="Impala"/> + <data name="Category" value="Configuring"/> + <data name="Category" value="Administrators"/> + </metadata> + </prolog> + + <conbody> + + <p id="p_24"> + This section describes the mandatory and recommended configuration settings for Impala. If Impala is + installed using Cloudera Manager, some of these configurations are completed automatically; you must still + configure short-circuit reads manually. If you installed Impala without Cloudera Manager, or if you want to + customize your environment, consider making the changes described in this topic. + </p> + + <p> +<!-- Could conref this paragraph from ciiu_install.xml. --> + In some cases, depending on the level of Impala, CDH, and Cloudera Manager, you might need to add particular + component configuration details in one of the free-form fields on the Impala configuration pages within + Cloudera Manager. <ph conref="../shared/impala_common.xml#common/safety_valve"/> + </p> + + <ul> + <li> + You must enable short-circuit reads, whether or not Impala was installed through Cloudera Manager. This + setting goes in the Impala configuration settings, not the Hadoop-wide settings. + </li> + + <li> + If you installed Impala in an environment that is not managed by Cloudera Manager, you must enable block + location tracking, and you can optionally enable native checksumming for optimal performance. + </li> + + <li> + If you deployed Impala using Cloudera Manager see + <xref href="impala_perf_testing.xml#performance_testing"/> to confirm proper configuration. + </li> + </ul> + + <section id="section_fhq_wyv_ls"> + <title>Mandatory: Short-Circuit Reads</title> + <p> Enabling short-circuit reads allows Impala to read local data directly + from the file system. This removes the need to communicate through the + DataNodes, improving performance. This setting also minimizes the number + of additional copies of data. Short-circuit reads requires + <codeph>libhadoop.so</codeph> + <!-- This link went stale. Not obvious how to keep it in sync with whatever Hadoop CDH is using behind the scenes. So hide the link for now. --> + <!-- (the <xref href="http://hadoop.apache.org/docs/r0.19.1/native_libraries.html" scope="external" format="html">Hadoop Native Library</xref>) --> + (the Hadoop Native Library) to be accessible to both the server and the + client. <codeph>libhadoop.so</codeph> is not available if you have + installed from a tarball. You must install from an + <codeph>.rpm</codeph>, <codeph>.deb</codeph>, or parcel to use + short-circuit local reads. <note> If you use Cloudera Manager, you can + enable short-circuit reads through a checkbox in the user interface + and that setting takes effect for Impala as well. </note> + </p> + <p> Cloudera strongly recommends using Impala with CDH 4.2 or higher, + ideally the latest 4.x release. 
Impala does support short-circuit reads + with CDH 4.1, but for best performance, upgrade to CDH 4.3 or higher. + The process of configuring short-circuit reads varies according to which + version of CDH you are using. Choose the procedure that is appropriate + for your environment. </p> + <p> + <b>To configure DataNodes for short-circuit reads with CDH 4.2 or + higher:</b> + </p> + <ol id="ol_qlq_wyv_ls"> + <li id="copy_config_files"> Copy the client + <codeph>core-site.xml</codeph> and <codeph>hdfs-site.xml</codeph> + configuration files from the Hadoop configuration directory to the + Impala configuration directory. The default Impala configuration + location is <codeph>/etc/impala/conf</codeph>. </li> + <li> + <indexterm audience="Cloudera" + >dfs.client.read.shortcircuit</indexterm> + <indexterm audience="Cloudera">dfs.domain.socket.path</indexterm> + <indexterm audience="Cloudera" + >dfs.client.file-block-storage-locations.timeout.millis</indexterm> + On all Impala nodes, configure the following properties in <!-- Exact timing is unclear, since we say farther down to copy /etc/hadoop/conf/hdfs-site.xml to /etc/impala/conf. + Which wouldn't work if we already modified the Impala version of the file here. Not to mention that this + doesn't take the CM interface into account, where these /etc files might not exist in those locations. --> + <!-- <codeph>/etc/impala/conf/hdfs-site.xml</codeph> as shown: --> + Impala's copy of <codeph>hdfs-site.xml</codeph> as shown: <codeblock><property> + <name>dfs.client.read.shortcircuit</name> + <value>true</value> +</property> + +<property> + <name>dfs.domain.socket.path</name> + <value>/var/run/hdfs-sockets/dn</value> +</property> + +<property> + <name>dfs.client.file-block-storage-locations.timeout.millis</name> + <value>10000</value> +</property></codeblock> + <!-- Former socket.path value: <value>/var/run/hadoop-hdfs/dn._PORT</value> --> + <!-- + <note> + The text <codeph>_PORT</codeph> appears just as shown; you do not need to + substitute a number. + </note> +--> + </li> + <li> + <p> If <codeph>/var/run/hadoop-hdfs/</codeph> is group-writable, make + sure its group is <codeph>root</codeph>. </p> + <note> If you are also going to enable block location tracking, you + can skip copying configuration files and restarting DataNodes and go + straight to <xref href="#config_performance/block_location_tracking" + >Optional: Block Location Tracking</xref>. + Configuring short-circuit reads and block location tracking require + the same process of copying files and restarting services, so you + can complete that process once when you have completed all + configuration changes. Whether you copy files and restart services + now or during configuring block location tracking, short-circuit + reads are not enabled until you complete those final steps. </note> + </li> + <li id="restart_all_datanodes"> After applying these changes, restart + all DataNodes. </li> + </ol> + <p> + <b>To configure DataNodes for short-circuit reads with CDH 4.1:</b> + </p> + <!-- Repeated twice, turn into a conref. --> + <note> Cloudera strongly recommends using Impala with CDH 4.2 or higher, + ideally the latest 4.x release. Impala does support short-circuit reads + with CDH 4.1, but for best performance, upgrade to CDH 4.3 or higher. + The process of configuring short-circuit reads varies according to which + version of CDH you are using. Choose the procedure that is appropriate + for your environment. 
</note> + <ol id="ol_cqq_wyv_ls"> + <li> Enable short-circuit reads by adding settings to the Impala + <codeph>core-site.xml</codeph> file. <ul id="ul_a5q_wyv_ls"> + <li> If you installed Impala using Cloudera Manager, short-circuit + reads should be properly configured, but you can review the + configuration by checking the contents of + the <codeph>core-site.xml</codeph> file, which is installed at + <codeph>/etc/impala/conf</codeph> by default. </li> + <li> If you installed using packages, instead of using Cloudera + Manager, create the <codeph>core-site.xml</codeph> file. This can + be easily done by copying + the <codeph>core-site.xml</codeph> client configuration file from + another machine that is running Hadoop services. This file must be + copied to the Impala configuration directory. The Impala + configuration directory is set by + the <codeph>IMPALA_CONF_DIR</codeph> environment variable and is + by default <codeph>/etc/impala/conf</codeph>. To confirm the + Impala configuration directory, check + the <codeph>IMPALA_CONF_DIR</codeph> environment variable value. + <note> If the Impala configuration directory does not exist, + create it and then add the <codeph>core-site.xml</codeph> file. + </note> + </li> + </ul> Add the following to the <codeph>core-site.xml</codeph> file: <codeblock><property> + <name>dfs.client.read.shortcircuit</name> +   <value>true</value> +</property></codeblock> + <note> For an installation managed by Cloudera Manager, specify these + settings in the Impala dialogs, in the options field for HDFS. <ph + conref="../shared/impala_common.xml#common/safety_valve" /> + </note> + </li> + <li> For each DataNode, enable access by adding the following to + the <codeph>hdfs-site.xml</codeph> file: <codeblock rev="1.3.0"><property> + <name>dfs.client.use.legacy.blockreader.local</name> + <value>true</value> +</property> + +<property> + <name>dfs.datanode.data.dir.perm</name> + <value>750</value> +</property> + +<property> + <name>dfs.block.local-path-access.user</name> + <value>impala</value> +</property> + +<property> + <name>dfs.client.file-block-storage-locations.timeout.millis</name> + <value>10000</value> +</property></codeblock> + <note> In the preceding example, + the <codeph>dfs.block.local-path-access.user</codeph> is the user + running the <codeph>impalad</codeph> process. By default, that + account is <codeph>impala</codeph>. </note> + </li> + <li> Use <codeph>usermod</codeph>  to add users requiring local block + access to the appropriate HDFS group. For example, if you + assigned <codeph>impala</codeph> to the + <codeph>dfs.block.local-path-access.user</codeph>  property, you + would add <codeph>impala</codeph>  to the hadoop HDFS group: <codeblock>$ usermod -a -G hadoop impala</codeblock> + <note> The default HDFS group is <codeph>hadoop</codeph>, but it is + possible to have an environment configured to use an alternate + group. 
To find the configured HDFS group name using the Cloudera + Manager Admin Console: <ol id="ol_km4_4bc_nr"> + <li>Go to the HDFS service.</li> + <li + conref="../shared/cm_common_elements.xml#cm/config_edit" /> + <li>Click <menucascade> + <uicontrol>Scope</uicontrol> + <uicontrol><varname>HDFS service name</varname> + (Service-Wide)</uicontrol> + </menucascade>.</li> + <li>Click <menucascade> + <uicontrol>Category</uicontrol> + <uicontrol>Advanced</uicontrol> + </menucascade>.</li> + <li>The <uicontrol>Shared Hadoop Group Name</uicontrol> property + contains the group name.</li> + </ol></note> + <note> If you are going to enable block location tracking, you can + skip copying configuration files and restarting DataNodes and go + straight to <xref href="#config_performance/block_location_tracking"/>. + Configuring short-circuit reads and block + location tracking require the same process of copying files and + restarting services, so you can complete that process once when you + have completed all configuration changes. Whether you copy files and + restart services now or during configuring block location tracking, + short-circuit reads are not enabled until you complete those final + steps. </note> + </li> + <li conref="#config_performance/copy_config_files" /> + <li conref="#config_performance/restart_all_datanodes" /> + </ol> + </section> + + <section id="block_location_tracking"> + + <title>Mandatory: Block Location Tracking</title> + + <p> + Enabling block location metadata allows Impala to know which disks data blocks are located on, allowing + better utilization of the underlying disks. Impala will not start unless this setting is enabled. + </p> + + <p> + <b>To enable block location tracking:</b> + </p> + + <ol> + <li> + For each DataNode, add the following to the <codeph>hdfs-site.xml</codeph> file: +<codeblock><property> + <name>dfs.datanode.hdfs-blocks-metadata.enabled</name> + <value>true</value> +</property> </codeblock> + </li> + + <li conref="#config_performance/copy_config_files"/> + + <li conref="#config_performance/restart_all_datanodes"/> + </ol> + </section> + + <section id="native_checksumming"> + + <title>Optional: Native Checksumming</title> + + <p> + Enabling native checksumming causes Impala to use an optimized native library for computing checksums, if + that library is available. + </p> + + <p id="p_29"> + <b>To enable native checksumming:</b> + </p> + + <p> + If you installed CDH from packages, the native checksumming library is installed and set up correctly. In + such a case, no additional steps are required. However, if you installed by other means, such as with + tarballs, native checksumming may not be available due to missing shared objects. Finding the message + "<codeph>Unable to load native-hadoop library for your platform... using builtin-java classes where + applicable</codeph>" in the Impala logs indicates that native checksumming may be unavailable. To enable native + checksumming, you must build and install <codeph>libhadoop.so</codeph> (the + <!-- Another instance of stale link. --> + <!-- <xref href="http://hadoop.apache.org/docs/r0.19.1/native_libraries.html" scope="external" format="html">Hadoop Native Library</xref>). --> + Hadoop Native Library). + </p> + </section> + </conbody> +</concept>
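Before restarting the DataNodes, it can help to read the settings back from Impala's copy of the client configuration. The following is a minimal sketch, not part of the topic above, assuming the hdfs command-line tool is installed and that /etc/impala/conf is the Impala configuration directory (the default used throughout the topic):

  # Read each property back from Impala's copy of the client configuration.
  $ HADOOP_CONF_DIR=/etc/impala/conf hdfs getconf -confKey dfs.client.read.shortcircuit
  $ HADOOP_CONF_DIR=/etc/impala/conf hdfs getconf -confKey dfs.domain.socket.path
  $ HADOOP_CONF_DIR=/etc/impala/conf hdfs getconf -confKey dfs.datanode.hdfs-blocks-metadata.enabled

  # The domain socket directory must exist; if it is group-writable,
  # its group should be root, as noted in the topic above.
  $ ls -ld /var/run/hdfs-sockets

Each command should print the value configured above (true, the socket path, and true, respectively); an empty or default value usually means the copied hdfs-site.xml is not the one being picked up.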
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/1fcc8cee/docs/topics/impala_connecting.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_connecting.xml b/docs/topics/impala_connecting.xml new file mode 100644 index 0000000..354e698 --- /dev/null +++ b/docs/topics/impala_connecting.xml @@ -0,0 +1,202 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="connecting"> + + <title>Connecting to impalad through impala-shell</title> + <titlealts audience="PDF"><navtitle>Connecting to impalad</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="impala-shell"/> + <data name="Category" value="Network"/> + <data name="Category" value="DataNode"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + +<!-- +TK: This would be a good theme for a tutorial topic. +Lots of nuances to illustrate through sample code. +--> + + <p> + Within an <cmdname>impala-shell</cmdname> session, you can only issue queries while connected to an instance + of the <cmdname>impalad</cmdname> daemon. You can specify the connection information: + <ul> + <li> + Through command-line options when you run the <cmdname>impala-shell</cmdname> command. + </li> + <li> + Through a configuration file that is read when you run the <cmdname>impala-shell</cmdname> command. + </li> + <li> + During an <cmdname>impala-shell</cmdname> session, by issuing a <codeph>CONNECT</codeph> command. + </li> + </ul> + See <xref href="impala_shell_options.xml"/> for the command-line and configuration file options you can use. + </p> + + <p> + You can connect to any DataNode where an instance of <cmdname>impalad</cmdname> is running, + and that host coordinates the execution of all queries sent to it. + </p> + + <p> + For simplicity during development, you might always connect to the same host, perhaps running <cmdname>impala-shell</cmdname> on + the same host as <cmdname>impalad</cmdname> and specifying the hostname as <codeph>localhost</codeph>. + </p> + + <p> + In a production environment, you might enable load balancing, in which you connect to specific host/port combination + but queries are forwarded to arbitrary hosts. This technique spreads the overhead of acting as the coordinator + node among all the DataNodes in the cluster. See <xref href="impala_proxy.xml"/> for details. + </p> + + <p> + <b>To connect the Impala shell during shell startup:</b> + </p> + + <ol> + <li> + Locate the hostname of a DataNode within the cluster that is running an instance of the + <cmdname>impalad</cmdname> daemon. If that DataNode uses a non-default port (something + other than port 21000) for <cmdname>impala-shell</cmdname> connections, find out the + port number also. + </li> + + <li> + Use the <codeph>-i</codeph> option to the + <cmdname>impala-shell</cmdname> interpreter to specify the connection information for + that instance of <cmdname>impalad</cmdname>: +<codeblock> +# When you are logged into the same machine running impalad. +# The prompt will reflect the current hostname. +$ impala-shell + +# When you are logged into the same machine running impalad. +# The host will reflect the hostname 'localhost'. +$ impala-shell -i localhost + +# When you are logged onto a different host, perhaps a client machine +# outside the Hadoop cluster. 
+$ impala-shell -i <varname>some.other.hostname</varname> + +# When you are logged onto a different host, and impalad is listening +# on a non-default port. Perhaps a load balancer is forwarding requests +# to a different host/port combination behind the scenes. +$ impala-shell -i <varname>some.other.hostname</varname>:<varname>port_number</varname> +</codeblock> + </li> + </ol> + + <p> + <b>To connect the Impala shell after shell startup:</b> + </p> + + <ol> + <li> + Start the Impala shell with no connection: +<codeblock>$ impala-shell</codeblock> + <p> + You should see a prompt like the following: + </p> +<codeblock>Welcome to the Impala shell. Press TAB twice to see a list of available commands. + +Copyright (c) <varname>year</varname> Cloudera, Inc. All rights reserved. + +<ph conref="../shared/ImpalaVariables.xml#impala_vars/ShellBanner"/> +[Not connected] > </codeblock> + </li> + + <li> + Locate the hostname of a DataNode within the cluster that is running an instance of the + <cmdname>impalad</cmdname> daemon. If that DataNode uses a non-default port (something + other than port 21000) for <cmdname>impala-shell</cmdname> connections, find out the + port number also. + </li> + + <li> + Use the <codeph>connect</codeph> command to connect to an Impala instance. Enter a command of the form: +<codeblock>[Not connected] > connect <varname>impalad-host</varname> +[<varname>impalad-host</varname>:21000] ></codeblock> + <note> + Replace <varname>impalad-host</varname> with the hostname you have configured for any DataNode running + Impala in your environment. The changed prompt indicates a successful connection. + </note> + </li> + </ol> + + <p> + <b>To start <cmdname>impala-shell</cmdname> in a specific database:</b> + </p> + + <p> + You can use all the same connection options as in previous examples. + For simplicity, these examples assume that you are logged into one of + the DataNodes that is running the <cmdname>impalad</cmdname> daemon. + </p> + + <ol> + <li> + Find the name of the database containing the relevant tables, views, and so + on that you want to operate on. + </li> + + <li> + Use the <codeph>-d</codeph> option to the + <cmdname>impala-shell</cmdname> interpreter to connect and immediately + switch to the specified database, without the need for a <codeph>USE</codeph> + statement or fully qualified names: +<codeblock> +# Subsequent queries with unqualified names operate on +# tables, views, and so on inside the database named 'staging'. +$ impala-shell -i localhost -d staging + +# It is common during development, ETL, benchmarking, and so on +# to have different databases containing the same table names +# but with different contents or layouts. +$ impala-shell -i localhost -d parquet_snappy_compression +$ impala-shell -i localhost -d parquet_gzip_compression +</codeblock> + </li> + </ol> + + <p> + <b>To run one or several statements in non-interactive mode:</b> + </p> + + <p> + You can use all the same connection options as in previous examples. + For simplicity, these examples assume that you are logged into one of + the DataNodes that is running the <cmdname>impalad</cmdname> daemon. + </p> + + <ol> + <li> + Construct a statement, or a file containing a sequence of statements, + that you want to run in an automated way, without typing or copying + and pasting each time. + </li> + + <li> + Invoke <cmdname>impala-shell</cmdname> with the <codeph>-q</codeph> option to run a single statement, or + the <codeph>-f</codeph> option to run a sequence of statements from a file. 
+ The <cmdname>impala-shell</cmdname> command returns immediately, without going into + the interactive interpreter. +<codeblock> +# A utility command that you might run while developing shell scripts +# to manipulate HDFS files. +$ impala-shell -i localhost -d database_of_interest -q 'show tables' + +# A sequence of CREATE TABLE, CREATE VIEW, and similar DDL statements +# can go into a file to make the setup process repeatable. +$ impala-shell -i localhost -d database_of_interest -f recreate_tables.sql +</codeblock> + </li> + </ol> + + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/1fcc8cee/docs/topics/impala_delegation.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_delegation.xml b/docs/topics/impala_delegation.xml new file mode 100644 index 0000000..0d59761 --- /dev/null +++ b/docs/topics/impala_delegation.xml @@ -0,0 +1,88 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="1.2" id="delegation"> + + <title>Configuring Impala Delegation for Hue and BI Tools</title> + + <prolog> + <metadata> + <data name="Category" value="Security"/> + <data name="Category" value="Impala"/> + <data name="Category" value="Authentication"/> + <data name="Category" value="Delegation"/> + <data name="Category" value="Hue"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> +<!-- + When users connect to Impala directly through the <cmdname>impala-shell</cmdname> interpreter, the Sentry + authorization framework determines what actions they can take and what data they can see. +--> + When users submit Impala queries through a separate application, such as Hue or a business intelligence tool, + typically all requests are treated as coming from the same user. In Impala 1.2 and higher, authentication is + extended by a new feature that allows applications to pass along credentials for the users that connect to + them (known as <q>delegation</q>), and issue Impala queries with the privileges for those users. Currently, + the delegation feature is available only for Impala queries submitted through application interfaces such as + Hue and BI tools; for example, Impala cannot issue queries using the privileges of the HDFS user. + </p> + + <p> + The delegation feature is enabled by a startup option for <cmdname>impalad</cmdname>: + <codeph>--authorized_proxy_user_config</codeph>. When you specify this option, users whose names you specify + (such as <codeph>hue</codeph>) can delegate the execution of a query to another user. The query runs with the + privileges of the delegated user, not the original user such as <codeph>hue</codeph>. The name of the + delegated user is passed using the HiveServer2 configuration property <codeph>impala.doas.user</codeph>. + </p> + + <p> + You can specify a list of users that the application user can delegate to, or <codeph>*</codeph> to allow a + superuser to delegate to any other user. For example: + </p> + +<codeblock>impalad --authorized_proxy_user_config 'hue=user1,user2;admin=*' ...</codeblock> + + <note> + Make sure to use single quotes or escape characters to ensure that any <codeph>*</codeph> characters do not + undergo wildcard expansion when specified in command-line arguments. 
+ </note> + + <p> + See <xref href="impala_config_options.xml#config_options"/> for details about adding or changing + <cmdname>impalad</cmdname> startup options. See + <xref href="http://blog.cloudera.com/blog/2013/07/how-hiveserver2-brings-security-and-concurrency-to-apache-hive/" scope="external" format="html">this + Cloudera blog post</xref> for background information about the delegation capability in HiveServer2. + </p> + + <p> + To set up authentication for the delegated users: + </p> + + <ul> + <li> + <p> + On the server side, configure either user/password authentication through LDAP, or Kerberos + authentication, for all the delegated users. See <xref href="impala_ldap.xml#ldap"/> or + <xref href="impala_kerberos.xml#kerberos"/> for details. + </p> + </li> + + <li> + <p> + On the client side, follow the instructions in the <q>Using User Name and Password</q> section in the + <xref href="http://www.cloudera.com/content/cloudera-content/cloudera-docs/Connectors/PDF/Cloudera-ODBC-Driver-for-Impala-Install-Guide.pdf" scope="external" format="pdf">ODBC + driver installation guide</xref>. Then search for <q>delegation</q> in that same installation guide to + learn about the <uicontrol>Delegation UID</uicontrol> field and <codeph>DelegationUID</codeph> configuration keyword to enable the delegation feature for + ODBC-based BI tools. + </p> + </li> + </ul> + + </conbody> + +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/1fcc8cee/docs/topics/impala_development.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_development.xml b/docs/topics/impala_development.xml new file mode 100644 index 0000000..a2eef16 --- /dev/null +++ b/docs/topics/impala_development.xml @@ -0,0 +1,229 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="intro_dev"> + + <title>Developing Impala Applications</title> + <titlealts audience="PDF"><navtitle>Developing Applications</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Concepts"/> + </metadata> + </prolog> + + <conbody> + + <p> + The core development language with Impala is SQL. You can also use Java or other languages to interact with + Impala through the standard JDBC and ODBC interfaces used by many business intelligence tools. For + specialized kinds of analysis, you can supplement the SQL built-in functions by writing + <xref href="impala_udf.xml#udfs">user-defined functions (UDFs)</xref> in C++ or Java. + </p> + + <p outputclass="toc inpage"/> + </conbody> + + <concept id="intro_sql"> + + <title>Overview of the Impala SQL Dialect</title> + <prolog> + <metadata> + <data name="Category" value="SQL"/> + <data name="Category" value="Concepts"/> + </metadata> + </prolog> + + <conbody> + + <p> + The Impala SQL dialect is highly compatible with the SQL syntax used in the Apache Hive component (HiveQL). As + such, it is familiar to users who are already familiar with running SQL queries on the Hadoop + infrastructure. Currently, Impala SQL supports a subset of HiveQL statements, data types, and built-in + functions. Impala also includes additional built-in functions for common industry features, to simplify + porting SQL from non-Hadoop systems. 
+ </p> + + <p> + For users coming to Impala from traditional database or data warehousing backgrounds, the following aspects of the SQL dialect + might seem familiar: + </p> + + <ul> + <li> + <p> + The <codeph>SELECT</codeph> statement includes familiar clauses such as <codeph>WHERE</codeph>, + <codeph>GROUP BY</codeph>, <codeph>ORDER BY</codeph>, and <codeph>WITH</codeph>. + You will find familiar notions such as + <xref href="impala_joins.xml#joins">joins</xref>, <xref href="impala_functions.xml#builtins">built-in + functions</xref> for processing strings, numbers, and dates, + <xref href="impala_aggregate_functions.xml#aggregate_functions">aggregate functions</xref>, + <xref href="impala_subqueries.xml#subqueries">subqueries</xref>, and + <xref href="impala_operators.xml#comparison_operators">comparison operators</xref> + such as <codeph>IN()</codeph> and <codeph>BETWEEN</codeph>. + The <codeph>SELECT</codeph> statement is the place where SQL standards compliance is most important. + </p> + </li> + + <li> + <p> + From the data warehousing world, you will recognize the notion of + <xref href="impala_partitioning.xml#partitioning">partitioned tables</xref>. + One or more columns serve as partition keys, and the data is physically arranged so that + queries that refer to the partition key columns in the <codeph>WHERE</codeph> clause + can skip partitions that do not match the filter conditions. For example, if you have 10 + years worth of data and use a clause such as <codeph>WHERE year = 2015</codeph>, + <codeph>WHERE year > 2010</codeph>, or <codeph>WHERE year IN (2014, 2015)</codeph>, + Impala skips all the data for non-matching years, greatly reducing the amount of I/O + for the query. + </p> + </li> + + <li rev="1.2"> + <p> + In Impala 1.2 and higher, <xref href="impala_udf.xml#udfs">UDFs</xref> let you perform custom comparisons + and transformation logic during <codeph>SELECT</codeph> and <codeph>INSERT...SELECT</codeph> statements. + </p> + </li> + </ul> + + <p> + For users coming to Impala from traditional database or data warehousing backgrounds, the following aspects of the SQL dialect + might require some learning and practice for you to become proficient in the Hadoop environment: + </p> + + <ul> + <li> + <p> + Impala SQL is focused on queries and includes relatively little DML. There is no <codeph>UPDATE</codeph> + or <codeph>DELETE</codeph> statement. Stale data is typically discarded (by <codeph>DROP TABLE</codeph> + or <codeph>ALTER TABLE ... DROP PARTITION</codeph> statements) or replaced (by <codeph>INSERT + OVERWRITE</codeph> statements). + </p> + </li> + + <li> + <p> + All data creation is done by <codeph>INSERT</codeph> statements, which typically insert data in bulk by + querying from other tables. There are two variations, <codeph>INSERT INTO</codeph> which appends to the + existing data, and <codeph>INSERT OVERWRITE</codeph> which replaces the entire contents of a table or + partition (similar to <codeph>TRUNCATE TABLE</codeph> followed by a new <codeph>INSERT</codeph>). There + is no <codeph>INSERT ... VALUES</codeph> syntax to insert a single row. + </p> + </li> + + <li> + <p> + You often construct Impala table definitions and data files in some other environment, and then attach + Impala so that it can run real-time queries. The same data files and table metadata are shared with other + components of the Hadoop ecosystem. In particular, Impala can access tables created by Hive or data + inserted by Hive, and Hive can access tables and data produced by Impala. 
Many other Hadoop components + can write files in formats such as Parquet and Avro, that can then be queried by Impala. + </p> + </li> + + <li> + <p> + Because Hadoop and Impala are focused on data warehouse-style operations on large data sets, Impala SQL + includes some idioms that you might find in the import utilities for traditional database systems. For + example, you can create a table that reads comma-separated or tab-separated text files, specifying the + separator in the <codeph>CREATE TABLE</codeph> statement. You can create <b>external tables</b> that read + existing data files but do not move or transform them. + </p> + </li> + + <li> + <p> + Because Impala reads large quantities of data that might not be perfectly tidy and predictable, it does + not impose length constraints on string data types. For example, you can define a database column as + <codeph>STRING</codeph> with unlimited length, rather than <codeph>CHAR(1)</codeph> or + <codeph>VARCHAR(64)</codeph>. <ph rev="2.0.0">(Although in Impala 2.0 and later, you can also use + length-constrained <codeph>CHAR</codeph> and <codeph>VARCHAR</codeph> types.)</ph> + </p> + </li> + + </ul> + + <p> + <b>Related information:</b> <xref href="impala_langref.xml#langref"/>, especially + <xref href="impala_langref_sql.xml#langref_sql"/> and <xref href="impala_functions.xml#builtins"/> + </p> + </conbody> + </concept> + +<!-- Bunch of potential concept topics for future consideration. Major areas of Impala modelled on areas of discussion for Oracle Database, and distributed databases in general. --> + + <concept id="intro_datatypes" audience="Cloudera"> + + <title>Overview of Impala SQL Data Types</title> + + <conbody/> + </concept> + + <concept id="intro_network" audience="Cloudera"> + + <title>Overview of Impala Network Topology</title> + + <conbody/> + </concept> + + <concept id="intro_cluster" audience="Cloudera"> + + <title>Overview of Impala Cluster Topology</title> + + <conbody/> + </concept> + + <concept id="intro_apis"> + + <title>Overview of Impala Programming Interfaces</title> + <prolog> + <metadata> + <data name="Category" value="JDBC"/> + <data name="Category" value="ODBC"/> + <data name="Category" value="Hue"/> + </metadata> + </prolog> + + <conbody> + + <p> + You can connect and submit requests to the Impala daemons through: + </p> + + <ul> + <li> + The <codeph><xref href="impala_impala_shell.xml#impala_shell">impala-shell</xref></codeph> interactive + command interpreter. + </li> + + <li> + The <xref href="http://gethue.com/" scope="external" format="html">Hue</xref> web-based user interface. + </li> + + <li> + <xref href="impala_jdbc.xml#impala_jdbc">JDBC</xref>. + </li> + + <li> + <xref href="impala_odbc.xml#impala_odbc">ODBC</xref>. + </li> + </ul> + + <p> + With these options, you can use Impala in heterogeneous environments, with JDBC or ODBC applications + running on non-Linux platforms. You can also use Impala on combination with various Business Intelligence + tools that use the JDBC and ODBC interfaces. + </p> + + <p> + Each <codeph>impalad</codeph> daemon process, running on separate nodes in a cluster, listens to + <xref href="impala_ports.xml#ports">several ports</xref> for incoming requests. Requests from + <codeph>impala-shell</codeph> and Hue are routed to the <codeph>impalad</codeph> daemons through the same + port. The <codeph>impalad</codeph> daemons listen on separate ports for JDBC and ODBC requests. + </p> + </conbody> + </concept> +</concept>