http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_config_options.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_config_options.xml b/docs/topics/impala_config_options.xml new file mode 100644 index 0000000..a08aafa --- /dev/null +++ b/docs/topics/impala_config_options.xml @@ -0,0 +1,593 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="config_options"> + + <title>Modifying Impala Startup Options</title> + + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Configuring"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">defaults file</indexterm> + + <indexterm audience="Cloudera">configuration file</indexterm> + + <indexterm audience="Cloudera">options</indexterm> + + <indexterm audience="Cloudera">IMPALA_STATE_STORE_PORT</indexterm> + + <indexterm audience="Cloudera">IMPALA_BACKEND_PORT</indexterm> + + <indexterm audience="Cloudera">IMPALA_LOG_DIR</indexterm> + + <indexterm audience="Cloudera">IMPALA_STATE_STORE_ARGS</indexterm> + + <indexterm audience="Cloudera">IMPALA_SERVER_ARGS</indexterm> + + <indexterm audience="Cloudera">ENABLE_CORE_DUMPS</indexterm> + + <indexterm audience="Cloudera">core dumps</indexterm> + + <indexterm audience="Cloudera">restarting services</indexterm> + + <indexterm audience="Cloudera">services</indexterm> + The configuration options for the Impala-related daemons let you choose which hosts and + ports to use for the services that run on a single host, specify directories for logging, + control resource usage and security, and specify other aspects of the Impala software. 
+ </p> + + <p outputclass="toc inpage"/> + + </conbody> + + <concept id="config_options_cm"> + + <title>Configuring Impala Startup Options through Cloudera Manager</title> + + <conbody> + + <p> + If you manage your cluster through Cloudera Manager, configure the settings for all the + Impala-related daemons by navigating to this page: + <menucascade><uicontrol>Clusters</uicontrol><uicontrol>Impala</uicontrol><uicontrol>Configuration</uicontrol><uicontrol>View + and Edit</uicontrol></menucascade>. See the Cloudera Manager documentation for + <xref href="http://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_impala_service.html" scope="external" format="html">instructions + about how to configure Impala through Cloudera Manager</xref>. + </p> + + <p> + If the Cloudera Manager interface does not yet have a form field for a newly added + option, or if you need to use special options for debugging and troubleshooting, the + <uicontrol>Advanced</uicontrol> option page for each daemon includes one or more fields + where you can enter option names directly. + <ph conref="../shared/impala_common.xml#common/safety_valve"/> There is also a free-form + field for query options, on the top-level <uicontrol>Impala Daemon</uicontrol> options + page. + </p> + + </conbody> + + </concept> + + <concept id="config_options_noncm"> + + <title>Configuring Impala Startup Options through the Command Line</title> + + <conbody> + + <p> + When you run Impala in a non-Cloudera Manager environment, the Impala server, + statestore, and catalog services start up using values provided in a defaults file, + <filepath>/etc/default/impala</filepath>. + </p> + + <p> + This file includes information about many resources used by Impala. The defaults + in this file are appropriate for most deployments. For example, typically you + would not change the definition of the <codeph>CLASSPATH</codeph> variable, but you + would always set the address used by the statestore server. 
Some of the content you + might modify includes: + </p> + +<!-- Note: Update the following example for each release with the associated lines from /etc/default/impala + from a non-CM-managed system. --> + +<codeblock rev="ver">IMPALA_STATE_STORE_HOST=127.0.0.1 +IMPALA_STATE_STORE_PORT=24000 +IMPALA_BACKEND_PORT=22000 +IMPALA_LOG_DIR=/var/log/impala +IMPALA_CATALOG_SERVICE_HOST=... +IMPALA_STATE_STORE_HOST=... + +export IMPALA_STATE_STORE_ARGS=${IMPALA_STATE_STORE_ARGS:- \ + -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}} +IMPALA_SERVER_ARGS=" \ +-log_dir=${IMPALA_LOG_DIR} \ +-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \ +-state_store_port=${IMPALA_STATE_STORE_PORT} \ +-use_statestore \ +-state_store_host=${IMPALA_STATE_STORE_HOST} \ +-be_port=${IMPALA_BACKEND_PORT}" +export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</codeblock> + + <p> + To use alternate values, edit the defaults file, then restart all the Impala-related + services so that the changes take effect. Restart the Impala server using the following + commands: + </p> + +<codeblock>$ sudo service impala-server restart +Stopping Impala Server: [ OK ] +Starting Impala Server: [ OK ]</codeblock> + + <p> + Restart the Impala statestore using the following commands: + </p> + +<codeblock>$ sudo service impala-state-store restart +Stopping Impala State Store Server: [ OK ] +Starting Impala State Store Server: [ OK ]</codeblock> + + <p> + Restart the Impala catalog service using the following commands: + </p> + +<codeblock>$ sudo service impala-catalog restart +Stopping Impala Catalog Server: [ OK ] +Starting Impala Catalog Server: [ OK ]</codeblock> + + <p> + Some common settings to change include: + </p> + + <ul> + <li> + <p> + Statestore address. Where practical, put the statestore on a separate host not + running the <cmdname>impalad</cmdname> daemon. 
In that recommended configuration, + the <cmdname>impalad</cmdname> daemon cannot refer to the statestore server using + the loopback address. If the statestore is hosted on a machine with an IP address of + 192.168.0.27, change: + </p> +<codeblock>IMPALA_STATE_STORE_HOST=127.0.0.1</codeblock> + <p> + to: + </p> +<codeblock>IMPALA_STATE_STORE_HOST=192.168.0.27</codeblock> + </li> + + <li rev="1.2"> + <p> + Catalog server address (including both the hostname and the port number). Update the + value of the <codeph>IMPALA_CATALOG_SERVICE_HOST</codeph> variable. Cloudera + recommends the catalog server be on the same host as the statestore. In that + recommended configuration, the <cmdname>impalad</cmdname> daemon cannot refer to the + catalog server using the loopback address. If the catalog service is hosted on a + machine with an IP address of 192.168.0.27, add the following line: + </p> +<codeblock>IMPALA_CATALOG_SERVICE_HOST=192.168.0.27:26000</codeblock> + <p> + The <filepath>/etc/default/impala</filepath> defaults file currently does not define + an <codeph>IMPALA_CATALOG_ARGS</codeph> environment variable, but if you add one it + will be recognized by the service startup/shutdown script. Add a definition for this + variable to <filepath>/etc/default/impala</filepath> and add the option + <codeph>-catalog_service_host=<varname>hostname</varname></codeph>. If the port is + different than the default 26000, also add the option + <codeph>-catalog_service_port=<varname>port</varname></codeph>. + </p> + </li> + + <li id="mem_limit"> + <p> + Memory limits. You can limit the amount of memory available to Impala. For example, + to allow Impala to use no more than 70% of system memory, change: + </p> +<!-- Note: also needs to be updated for each release to reflect latest /etc/default/impala. 
--> +<codeblock>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \ + -log_dir=${IMPALA_LOG_DIR} \ + -state_store_port=${IMPALA_STATE_STORE_PORT} \ + -use_statestore -state_store_host=${IMPALA_STATE_STORE_HOST} \ + -be_port=${IMPALA_BACKEND_PORT}}</codeblock> + <p> + to: + </p> +<codeblock>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \ + -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT} \ + -use_statestore -state_store_host=${IMPALA_STATE_STORE_HOST} \ + -be_port=${IMPALA_BACKEND_PORT} -mem_limit=70%}</codeblock> + <p> + You can specify the memory limit using absolute notation such as + <codeph>500m</codeph> or <codeph>2G</codeph>, or as a percentage of physical memory + such as <codeph>60%</codeph>. + </p> + + <note> + Queries that exceed the specified memory limit are aborted. Percentage limits are + based on the physical memory of the machine and do not consider cgroups. + </note> + </li> + + <li> + <p> + Core dump enablement. To enable core dumps on systems not managed by Cloudera + Manager, change: + </p> +<codeblock>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</codeblock> + <p> + to: + </p> +<codeblock>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-true}</codeblock> + <p> + On systems managed by Cloudera Manager, enable the <uicontrol>Enable Core + Dump</uicontrol> setting for the Impala service. + </p> + + <note conref="../shared/impala_common.xml#common/core_dump_considerations"/> + </li> + + <li> + <p> + Authorization using the open source Sentry plugin. Specify the + <codeph>-server_name</codeph> and <codeph>-authorization_policy_file</codeph> + options as part of the <codeph>IMPALA_SERVER_ARGS</codeph> and + <codeph>IMPALA_STATE_STORE_ARGS</codeph> settings to enable the core Impala support + for authentication. See <xref href="impala_authorization.xml#secure_startup"/> for + details. + </p> + </li> + + <li> + <p> + Auditing for successful or blocked Impala queries, another aspect of security. 
+ Specify the <codeph>-audit_event_log_dir=<varname>directory_path</varname></codeph> + option and optionally the + <codeph>-max_audit_event_log_file_size=<varname>number_of_queries</varname></codeph> + and <codeph>-abort_on_failed_audit_event</codeph> options as part of the + <codeph>IMPALA_SERVER_ARGS</codeph> settings, for each Impala node, to enable and + customize auditing. See <xref href="impala_auditing.xml#auditing"/> for details. + </p> + </li> + + <li> + <p> + Password protection for the Impala web UI, which listens on port 25000 by default. + This feature involves adding some or all of the + <codeph>--webserver_password_file</codeph>, + <codeph>--webserver_authentication_domain</codeph>, and + <codeph>--webserver_certificate_file</codeph> options to the + <codeph>IMPALA_SERVER_ARGS</codeph> and <codeph>IMPALA_STATE_STORE_ARGS</codeph> + settings. See <xref href="impala_security_guidelines.xml#security_guidelines"/> for + details. + </p> + </li> + + <li id="default_query_options"> + <p rev="DOCS-677"> + Another setting you might add to <codeph>IMPALA_SERVER_ARGS</codeph> is a + comma-separated list of query options and values: +<codeblock>-default_query_options='<varname>option</varname>=<varname>value</varname>,<varname>option</varname>=<varname>value</varname>,...' +</codeblock> + These options control the behavior of queries performed by this + <cmdname>impalad</cmdname> instance. The option values you specify here override the + default values for <xref href="impala_query_options.xml#query_options">Impala query + options</xref>, as shown by the <codeph>SET</codeph> statement in + <cmdname>impala-shell</cmdname>. + </p> + </li> + +<!-- Removing this reference now that the options are de-emphasized / desupported in CDH 5.5 / Impala 2.3 and up. + <li rev="1.2"> + <p> + Options for resource management, in conjunction with the YARN component. These options include + <codeph>-enable_rm</codeph> and <codeph>-cgroup_hierarchy_path</codeph>. 
+ <ph rev="1.4.0">Additional options to help fine-tune the resource estimates are + <codeph>-rm_always_use_defaults</codeph>, + <codeph>-rm_default_memory=<varname>size</varname></codeph>, and + <codeph>-rm_default_cpu_cores</codeph>.</ph> For details about these options, see + <xref href="impala_resource_management.xml#rm_options"/>. See + <xref href="impala_resource_management.xml#resource_management"/> for information about resource + management in general. + </p> + </li> +--> + + <li> + <p> + During troubleshooting, <keyword keyref="support_org"/> might direct you to change other values, + particularly for <codeph>IMPALA_SERVER_ARGS</codeph>, to work around issues or + gather debugging information. + </p> + </li> + </ul> + +<!-- Removing this reference now that the options are de-emphasized / desupported in CDH 5.5 / Impala 2.3 and up. + <p conref="impala_resource_management.xml#rm_options/resource_management_impalad_options"/> +--> + + <note> + <p> + These startup options for the <cmdname>impalad</cmdname> daemon are different from the + command-line options for the <cmdname>impala-shell</cmdname> command. For the + <cmdname>impala-shell</cmdname> options, see + <xref href="impala_shell_options.xml#shell_options"/>. + </p> + </note> + + <p audience="Cloudera" outputclass="toc inpage"/> + + </conbody> + + <concept audience="Cloudera" id="config_options_impalad_details"> + + <title>Configuration Options for impalad Daemon</title> + + <conbody> + + <p> + Some common settings to change include: + </p> + + <ul> + <li> + <p> + Statestore address. Where practical, put the statestore on a separate host not + running the <cmdname>impalad</cmdname> daemon. In that recommended configuration, + the <cmdname>impalad</cmdname> daemon cannot refer to the statestore server using + the loopback address. 
If the statestore is hosted on a machine with an IP address + of 192.168.0.27, change: + </p> +<codeblock>IMPALA_STATE_STORE_HOST=127.0.0.1</codeblock> + <p> + to: + </p> +<codeblock>IMPALA_STATE_STORE_HOST=192.168.0.27</codeblock> + </li> + + <li rev="1.2"> + <p> + Catalog server address. Update the <codeph>IMPALA_CATALOG_SERVICE_HOST</codeph> + variable, including both the hostname and the port number in the value. Cloudera + recommends the catalog server be on the same host as the statestore. In that + recommended configuration, the <cmdname>impalad</cmdname> daemon cannot refer to + the catalog server using the loopback address. If the catalog service is hosted on + a machine with an IP address of 192.168.0.27, add the following line: + </p> +<codeblock>IMPALA_CATALOG_SERVICE_HOST=192.168.0.27:26000</codeblock> + <p> + The <filepath>/etc/default/impala</filepath> defaults file currently does not + define an <codeph>IMPALA_CATALOG_ARGS</codeph> environment variable, but if you + add one it will be recognized by the service startup/shutdown script. Add a + definition for this variable to <filepath>/etc/default/impala</filepath> and add + the option <codeph>-catalog_service_host=<varname>hostname</varname></codeph>. If + the port is different than the default 26000, also add the option + <codeph>-catalog_service_port=<varname>port</varname></codeph>. + </p> + </li> + + <li id="mem_limit"> + Memory limits. You can limit the amount of memory available to Impala. For example, + to allow Impala to use no more than 70% of system memory, change: +<!-- Note: also needs to be updated for each release to reflect latest /etc/default/impala. 
--> +<codeblock>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \ + -log_dir=${IMPALA_LOG_DIR} \ + -state_store_port=${IMPALA_STATE_STORE_PORT} \ + -use_statestore -state_store_host=${IMPALA_STATE_STORE_HOST} \ + -be_port=${IMPALA_BACKEND_PORT}}</codeblock> + <p> + to: + </p> +<codeblock>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \ + -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT} \ + -use_statestore -state_store_host=${IMPALA_STATE_STORE_HOST} \ + -be_port=${IMPALA_BACKEND_PORT} -mem_limit=70%}</codeblock> + <p> + You can specify the memory limit using absolute notation such as + <codeph>500m</codeph> or <codeph>2G</codeph>, or as a percentage of physical + memory such as <codeph>60%</codeph>. + </p> + + <note> + Queries that exceed the specified memory limit are aborted. Percentage limits are + based on the physical memory of the machine and do not consider cgroups. + </note> + </li> + + <li> + Core dump enablement. To enable core dumps, change: +<codeblock>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</codeblock> + <p> + to: + </p> +<codeblock>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-true}</codeblock> + <note> + The location of core dump files may vary according to your operating system + configuration. Other security settings may prevent Impala from writing core dumps + even when this option is enabled. + </note> + </li> + + <li> + Authorization using the open source Sentry plugin. Specify the + <codeph>-server_name</codeph> and <codeph>-authorization_policy_file</codeph> + options as part of the <codeph>IMPALA_SERVER_ARGS</codeph> and + <codeph>IMPALA_STATE_STORE_ARGS</codeph> settings to enable the core Impala support + for authentication. See <xref href="impala_authorization.xml#secure_startup"/> for + details. + </li> + + <li> + Auditing for successful or blocked Impala queries, another aspect of security. 
+ Specify the <codeph>-audit_event_log_dir=<varname>directory_path</varname></codeph> + option and optionally the + <codeph>-max_audit_event_log_file_size=<varname>number_of_queries</varname></codeph> + and <codeph>-abort_on_failed_audit_event</codeph> options as part of the + <codeph>IMPALA_SERVER_ARGS</codeph> settings, for each Impala node, to enable and + customize auditing. See <xref href="impala_auditing.xml#auditing"/> for details. + </li> + + <li> + Password protection for the Impala web UI, which listens on port 25000 by default. + This feature involves adding some or all of the + <codeph>--webserver_password_file</codeph>, + <codeph>--webserver_authentication_domain</codeph>, and + <codeph>--webserver_certificate_file</codeph> options to the + <codeph>IMPALA_SERVER_ARGS</codeph> and <codeph>IMPALA_STATE_STORE_ARGS</codeph> + settings. See <xref href="impala_security_webui.xml"/> for details. + </li> + + <li id="default_query_options"> + Another setting you might add to <codeph>IMPALA_SERVER_ARGS</codeph> is: +<codeblock>-default_query_options='<varname>option</varname>=<varname>value</varname>,<varname>option</varname>=<varname>value</varname>,...' +</codeblock> + These options control the behavior of queries performed by this + <cmdname>impalad</cmdname> instance. The option values you specify here override the + default values for <xref href="impala_query_options.xml#query_options">Impala query + options</xref>, as shown by the <codeph>SET</codeph> statement in + <cmdname>impala-shell</cmdname>. + </li> + +<!-- Removing this reference now that the options are de-emphasized / desupported in CDH 5.5 / Impala 2.3 and up. + <li rev="1.2"> + Options for resource management, in conjunction with the YARN component. These options + include <codeph>-enable_rm</codeph> and <codeph>-cgroup_hierarchy_path</codeph>. 
+ <ph rev="1.4.0">Additional options to help fine-tune the resource estimates are + <codeph>-rm_always_use_defaults</codeph>, + <codeph>-rm_default_memory=<varname>size</varname></codeph>, and + <codeph>-rm_default_cpu_cores</codeph>.</ph> For details about these options, see + <xref href="impala_resource_management.xml#rm_options"/>. See + <xref href="impala_resource_management.xml#resource_management"/> for information about resource + management in general. + </li> +--> + + <li> + During troubleshooting, <keyword keyref="support_org"/> might direct you to change other values, + particularly for <codeph>IMPALA_SERVER_ARGS</codeph>, to work around issues or + gather debugging information. + </li> + </ul> + +<!-- Removing this reference now that the options are de-emphasized / desupported in CDH 5.5 / Impala 2.3 and up. + <p conref="impala_resource_management.xml#rm_options/resource_management_impalad_options"/> +--> + + <note> + <p> + These startup options for the <cmdname>impalad</cmdname> daemon are different from + the command-line options for the <cmdname>impala-shell</cmdname> command. For the + <cmdname>impala-shell</cmdname> options, see + <xref href="impala_shell_options.xml#shell_options"/>. 
+ </p> + </note> + + </conbody> + + </concept> + + <concept audience="Cloudera" id="config_options_statestored_details"> + + <title>Configuration Options for statestored Daemon</title> + + <conbody> + + <p></p> + + </conbody> + + </concept> + + <concept audience="Cloudera" id="config_options_catalogd_details"> + + <title>Configuration Options for catalogd Daemon</title> + + <conbody> + + <p></p> + + </conbody> + + </concept> + + </concept> + + <concept id="config_options_checking"> + + <title>Checking the Values of Impala Configuration Options</title> + + <conbody> + + <p> + You can check the current runtime value of all these settings through the Impala web + interface, available by default at + <codeph>http://<varname>impala_hostname</varname>:25000/varz</codeph> for the + <cmdname>impalad</cmdname> daemon, + <codeph>http://<varname>impala_hostname</varname>:25010/varz</codeph> for the + <cmdname>statestored</cmdname> daemon, or + <codeph>http://<varname>impala_hostname</varname>:25020/varz</codeph> for the + <cmdname>catalogd</cmdname> daemon. In the Cloudera Manager interface, you can see the + link to the appropriate <uicontrol><varname>service_name</varname> Web UI</uicontrol> + page when you look at the status page for a specific daemon on a specific host. + </p> + + </conbody> + + </concept> + + <concept id="config_options_impalad"> + + <title>Startup Options for impalad Daemon</title> + + <conbody> + + <p> + The <codeph>impalad</codeph> daemon implements the main Impala service, which performs + query processing and reads and writes the data files. + </p> + + </conbody> + + </concept> + + <concept id="config_options_statestored"> + + <title>Startup Options for statestored Daemon</title> + + <conbody> + + <p> + The <cmdname>statestored</cmdname> daemon implements the Impala statestore service, + which monitors the availability of Impala services across the cluster, and handles + situations such as nodes becoming unavailable or becoming available again. 
+ </p> + + </conbody> + + </concept> + + <concept rev="1.2" id="config_options_catalogd"> + + <title>Startup Options for catalogd Daemon</title> + + <conbody> + + <p> + The <cmdname>catalogd</cmdname> daemon implements the Impala catalog service, which + broadcasts metadata changes to all the Impala nodes when Impala creates a table, inserts + data, or performs other kinds of DDL and DML operations. + </p> + + <p conref="../shared/impala_common.xml#common/load_catalog_in_background"/> + + </conbody> + + </concept> + +</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_config_performance.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_config_performance.xml b/docs/topics/impala_config_performance.xml new file mode 100644 index 0000000..0a44a33 --- /dev/null +++ b/docs/topics/impala_config_performance.xml @@ -0,0 +1,179 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="config_performance"> + + <title>Post-Installation Configuration for Impala</title> + <prolog> + <metadata> + <data name="Category" value="Performance"/> + <data name="Category" value="Impala"/> + <data name="Category" value="Configuring"/> + <data name="Category" value="Administrators"/> + </metadata> + </prolog> + + <conbody> + + <p id="p_24"> + This section describes the mandatory and recommended configuration settings for Impala. If Impala is + installed using Cloudera Manager, some of these configurations are completed automatically; you must still + configure short-circuit reads manually. If you installed Impala without Cloudera Manager, or if you want to + customize your environment, consider making the changes described in this topic. + </p> + + <p> +<!-- Could conref this paragraph from ciiu_install.xml. --> + In some cases, depending on the level of Impala, CDH, and Cloudera Manager, you might need to add particular + component configuration details in one of the free-form fields on the Impala configuration pages within + Cloudera Manager. <ph conref="../shared/impala_common.xml#common/safety_valve"/> + </p> + + <ul> + <li> + You must enable short-circuit reads, whether or not Impala was installed through Cloudera Manager. This + setting goes in the Impala configuration settings, not the Hadoop-wide settings. 
+ </li> + + <li> + If you installed Impala in an environment that is not managed by Cloudera Manager, you must enable block + location tracking, and you can optionally enable native checksumming for optimal performance. + </li> + + <li> + If you deployed Impala using Cloudera Manager, see + <xref href="impala_perf_testing.xml#performance_testing"/> to confirm proper configuration. + </li> + </ul> + + <section id="section_fhq_wyv_ls"> + <title>Mandatory: Short-Circuit Reads</title> + <p> Enabling short-circuit reads allows Impala to read local data directly + from the file system. This removes the need to communicate through the + DataNodes, improving performance. This setting also minimizes the number + of additional copies of data. Short-circuit reads require + <codeph>libhadoop.so</codeph> + <!-- This link went stale. Not obvious how to keep it in sync with whatever Hadoop CDH is using behind the scenes. So hide the link for now. --> + <!-- (the <xref href="http://hadoop.apache.org/docs/r0.19.1/native_libraries.html" scope="external" format="html">Hadoop Native Library</xref>) --> + (the Hadoop Native Library) to be accessible to both the server and the + client. <codeph>libhadoop.so</codeph> is not available if you have + installed from a tarball. You must install from an + <codeph>.rpm</codeph>, <codeph>.deb</codeph>, or parcel to use + short-circuit local reads. <note> If you use Cloudera Manager, you can + enable short-circuit reads through a checkbox in the user interface + and that setting takes effect for Impala as well. </note> + </p> + <p> + <b>To configure DataNodes for short-circuit reads:</b> + </p> + <ol id="ol_qlq_wyv_ls"> + <li id="copy_config_files"> Copy the client + <codeph>core-site.xml</codeph> and <codeph>hdfs-site.xml</codeph> + configuration files from the Hadoop configuration directory to the + Impala configuration directory. The default Impala configuration + location is <codeph>/etc/impala/conf</codeph>. 
</li> + <li> + <indexterm audience="Cloudera" + >dfs.client.read.shortcircuit</indexterm> + <indexterm audience="Cloudera">dfs.domain.socket.path</indexterm> + <indexterm audience="Cloudera" + >dfs.client.file-block-storage-locations.timeout.millis</indexterm> + On all Impala nodes, configure the following properties in <!-- Exact timing is unclear, since we say farther down to copy /etc/hadoop/conf/hdfs-site.xml to /etc/impala/conf. + Which wouldn't work if we already modified the Impala version of the file here. Not to mention that this + doesn't take the CM interface into account, where these /etc files might not exist in those locations. --> + <!-- <codeph>/etc/impala/conf/hdfs-site.xml</codeph> as shown: --> + Impala's copy of <codeph>hdfs-site.xml</codeph> as shown: <codeblock><property> + <name>dfs.client.read.shortcircuit</name> + <value>true</value> +</property> + +<property> + <name>dfs.domain.socket.path</name> + <value>/var/run/hdfs-sockets/dn</value> +</property> + +<property> + <name>dfs.client.file-block-storage-locations.timeout.millis</name> + <value>10000</value> +</property></codeblock> + <!-- Former socket.path value: <value>/var/run/hadoop-hdfs/dn._PORT</value> --> + <!-- + <note> + The text <codeph>_PORT</codeph> appears just as shown; you do not need to + substitute a number. + </note> +--> + </li> + <li> + <p> If <codeph>/var/run/hadoop-hdfs/</codeph> is group-writable, make + sure its group is <codeph>root</codeph>. </p> + <note> If you are also going to enable block location tracking, you + can skip copying configuration files and restarting DataNodes and go + straight to <xref href="#config_performance/block_location_tracking" + >Mandatory: Block Location Tracking</xref>. + Configuring short-circuit reads and block location tracking requires + the same process of copying files and restarting services, so you + can complete that process once when you have completed all + configuration changes. 
Whether you copy files and restart services + now or while configuring block location tracking, short-circuit + reads are not enabled until you complete those final steps. </note> + </li> + <li id="restart_all_datanodes"> After applying these changes, restart + all DataNodes. </li> + </ol> + </section> + + <section id="block_location_tracking"> + + <title>Mandatory: Block Location Tracking</title> + + <p> + Enabling block location metadata allows Impala to know which disks data blocks are located on, allowing + better utilization of the underlying disks. Impala will not start unless this setting is enabled. + </p> + + <p> + <b>To enable block location tracking:</b> + </p> + + <ol> + <li> + For each DataNode, add the following to the <codeph>hdfs-site.xml</codeph> file: +<codeblock><property> + <name>dfs.datanode.hdfs-blocks-metadata.enabled</name> + <value>true</value> +</property> </codeblock> + </li> + + <li conref="#config_performance/copy_config_files"/> + + <li conref="#config_performance/restart_all_datanodes"/> + </ol> + </section> + + <section id="native_checksumming"> + + <title>Optional: Native Checksumming</title> + + <p> + Enabling native checksumming causes Impala to use an optimized native library for computing checksums, if + that library is available. + </p> + + <p id="p_29"> + <b>To enable native checksumming:</b> + </p> + + <p> + If you installed CDH from packages, the native checksumming library is installed and set up correctly. In + such a case, no additional steps are required. Conversely, if you installed by other means, such as with + tarballs, native checksumming may not be available due to missing shared objects. Finding the message + "<codeph>Unable to load native-hadoop library for your platform... using builtin-java classes where + applicable</codeph>" in the Impala logs indicates that native checksumming may be unavailable. 
To enable native + checksumming, you must build and install <codeph>libhadoop.so</codeph> (the + <!-- Another instance of stale link. --> + <!-- <xref href="http://hadoop.apache.org/docs/r0.19.1/native_libraries.html" scope="external" format="html">Hadoop Native Library</xref>). --> + Hadoop Native Library). + </p> + </section> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_connecting.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_connecting.xml b/docs/topics/impala_connecting.xml new file mode 100644 index 0000000..354e698 --- /dev/null +++ b/docs/topics/impala_connecting.xml @@ -0,0 +1,202 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="connecting"> + + <title>Connecting to impalad through impala-shell</title> + <titlealts audience="PDF"><navtitle>Connecting to impalad</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="impala-shell"/> + <data name="Category" value="Network"/> + <data name="Category" value="DataNode"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + +<!-- +TK: This would be a good theme for a tutorial topic. +Lots of nuances to illustrate through sample code. +--> + + <p> + Within an <cmdname>impala-shell</cmdname> session, you can only issue queries while connected to an instance + of the <cmdname>impalad</cmdname> daemon. You can specify the connection information: + <ul> + <li> + Through command-line options when you run the <cmdname>impala-shell</cmdname> command. + </li> + <li> + Through a configuration file that is read when you run the <cmdname>impala-shell</cmdname> command. 
+ </li> + <li> + During an <cmdname>impala-shell</cmdname> session, by issuing a <codeph>CONNECT</codeph> command. + </li> + </ul> + See <xref href="impala_shell_options.xml"/> for the command-line and configuration file options you can use. + </p> + + <p> + You can connect to any DataNode where an instance of <cmdname>impalad</cmdname> is running, + and that host coordinates the execution of all queries sent to it. + </p> + + <p> + For simplicity during development, you might always connect to the same host, perhaps running <cmdname>impala-shell</cmdname> on + the same host as <cmdname>impalad</cmdname> and specifying the hostname as <codeph>localhost</codeph>. + </p> + + <p> + In a production environment, you might enable load balancing, in which you connect to a specific host/port combination + but queries are forwarded to arbitrary hosts. This technique spreads the overhead of acting as the coordinator + node among all the DataNodes in the cluster. See <xref href="impala_proxy.xml"/> for details. + </p> + + <p> + <b>To connect the Impala shell during shell startup:</b> + </p> + + <ol> + <li> + Locate the hostname of a DataNode within the cluster that is running an instance of the + <cmdname>impalad</cmdname> daemon. If that DataNode uses a non-default port (something + other than port 21000) for <cmdname>impala-shell</cmdname> connections, find out the + port number also. + </li> + + <li> + Use the <codeph>-i</codeph> option to the + <cmdname>impala-shell</cmdname> interpreter to specify the connection information for + that instance of <cmdname>impalad</cmdname>: +<codeblock> +# When you are logged into the same machine running impalad. +# The prompt will reflect the current hostname. +$ impala-shell + +# When you are logged into the same machine running impalad. +# The prompt will reflect the hostname 'localhost'. +$ impala-shell -i localhost + +# When you are logged onto a different host, perhaps a client machine +# outside the Hadoop cluster. 
+$ impala-shell -i <varname>some.other.hostname</varname> + +# When you are logged onto a different host, and impalad is listening +# on a non-default port. Perhaps a load balancer is forwarding requests +# to a different host/port combination behind the scenes. +$ impala-shell -i <varname>some.other.hostname</varname>:<varname>port_number</varname> +</codeblock> + </li> + </ol> + + <p> + <b>To connect the Impala shell after shell startup:</b> + </p> + + <ol> + <li> + Start the Impala shell with no connection: +<codeblock>$ impala-shell</codeblock> + <p> + You should see a prompt like the following: + </p> +<codeblock>Welcome to the Impala shell. Press TAB twice to see a list of available commands. + +Copyright (c) <varname>year</varname> Cloudera, Inc. All rights reserved. + +<ph conref="../shared/ImpalaVariables.xml#impala_vars/ShellBanner"/> +[Not connected] > </codeblock> + </li> + + <li> + Locate the hostname of a DataNode within the cluster that is running an instance of the + <cmdname>impalad</cmdname> daemon. If that DataNode uses a non-default port (something + other than port 21000) for <cmdname>impala-shell</cmdname> connections, find out the + port number also. + </li> + + <li> + Use the <codeph>connect</codeph> command to connect to an Impala instance. Enter a command of the form: +<codeblock>[Not connected] > connect <varname>impalad-host</varname> +[<varname>impalad-host</varname>:21000] ></codeblock> + <note> + Replace <varname>impalad-host</varname> with the hostname you have configured for any DataNode running + Impala in your environment. The changed prompt indicates a successful connection. + </note> + </li> + </ol> + + <p> + <b>To start <cmdname>impala-shell</cmdname> in a specific database:</b> + </p> + + <p> + You can use all the same connection options as in previous examples. + For simplicity, these examples assume that you are logged into one of + the DataNodes that is running the <cmdname>impalad</cmdname> daemon. 
+ </p> + + <ol> + <li> + Find the name of the database containing the relevant tables, views, and so + on that you want to operate on. + </li> + + <li> + Use the <codeph>-d</codeph> option to the + <cmdname>impala-shell</cmdname> interpreter to connect and immediately + switch to the specified database, without the need for a <codeph>USE</codeph> + statement or fully qualified names: +<codeblock> +# Subsequent queries with unqualified names operate on +# tables, views, and so on inside the database named 'staging'. +$ impala-shell -i localhost -d staging + +# It is common during development, ETL, benchmarking, and so on +# to have different databases containing the same table names +# but with different contents or layouts. +$ impala-shell -i localhost -d parquet_snappy_compression +$ impala-shell -i localhost -d parquet_gzip_compression +</codeblock> + </li> + </ol> + + <p> + <b>To run one or several statements in non-interactive mode:</b> + </p> + + <p> + You can use all the same connection options as in previous examples. + For simplicity, these examples assume that you are logged into one of + the DataNodes that is running the <cmdname>impalad</cmdname> daemon. + </p> + + <ol> + <li> + Construct a statement, or a file containing a sequence of statements, + that you want to run in an automated way, without typing or copying + and pasting each time. + </li> + + <li> + Invoke <cmdname>impala-shell</cmdname> with the <codeph>-q</codeph> option to run a single statement, or + the <codeph>-f</codeph> option to run a sequence of statements from a file. + The <cmdname>impala-shell</cmdname> command returns immediately, without going into + the interactive interpreter. +<codeblock> +# A utility command that you might run while developing shell scripts +# to manipulate HDFS files. 
+$ impala-shell -i localhost -d database_of_interest -q 'show tables' + +# A sequence of CREATE TABLE, CREATE VIEW, and similar DDL statements +# can go into a file to make the setup process repeatable. +$ impala-shell -i localhost -d database_of_interest -f recreate_tables.sql +</codeblock> + </li> + </ol> + + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_conversion_functions.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_conversion_functions.xml b/docs/topics/impala_conversion_functions.xml new file mode 100644 index 0000000..36f5ec4 --- /dev/null +++ b/docs/topics/impala_conversion_functions.xml @@ -0,0 +1,758 @@ +<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="conversion_functions"> + + <title>Impala Type Conversion Functions</title> + <titlealts audience="PDF"><navtitle>Type Conversion Functions</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Functions"/> + <data name="Category" value="Impala Data Types"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Querying"/> + </metadata> + </prolog> + + <conbody> + + <p> + Conversion functions are usually used in combination with other functions, to explicitly pass the expected + data types. Impala has strict rules regarding data types for function parameters. For example, Impala does + not automatically convert a <codeph>DOUBLE</codeph> value to <codeph>FLOAT</codeph>, a + <codeph>BIGINT</codeph> value to <codeph>INT</codeph>, or other conversion where precision could be lost or + overflow could occur. 
Also, for reporting or dealing with loosely defined schemas in big data contexts, + you might frequently need to convert values to or from the <codeph>STRING</codeph> type. + </p> + + <note> + Although in CDH 5.5.0 the <codeph>SHOW FUNCTIONS</codeph> output for + database <codeph>_IMPALA_BUILTINS</codeph> contains some function signatures + matching the pattern <codeph>castto*</codeph>, these functions are not intended + for public use and are expected to be hidden in the future. + </note> + + <p> + <b>Function reference:</b> + </p> + + <p> + Impala supports the following type conversion functions: + </p> + +<dl> + +<dlentry id="cast"> +<dt> +<codeph>cast(<varname>expr</varname> AS <varname>type</varname>)</codeph> +</dt> + +<dd> +<indexterm audience="Cloudera">cast() function</indexterm> +<b>Purpose:</b> Converts the value of an expression to any other type. +If the expression value is of a type that cannot be converted to the target type, the result is <codeph>NULL</codeph>. +<p><b>Usage notes:</b> +Use <codeph>CAST</codeph> when passing a column value or literal to a function that +expects a parameter with a different type. +Frequently used in SQL operations such as <codeph>CREATE TABLE AS SELECT</codeph> +and <codeph>INSERT ... VALUES</codeph> to ensure that values from various sources +are of the appropriate type for the destination columns. +Where practical, do a one-time <codeph>CAST()</codeph> operation during the ingestion process +to convert each column to the appropriate type, rather than using many <codeph>CAST()</codeph> +operations in each query; doing type conversions for each row during each query can be expensive +for tables with millions or billions of rows. 
+</p> + <p conref="../shared/impala_common.xml#common/timezone_conversion_caveat"/> + +<p conref="../shared/impala_common.xml#common/example_blurb"/> +<codeblock>select concat('Here are the first ',10,' results.'); -- Fails +select concat('Here are the first ',cast(10 as string),' results.'); -- Succeeds +</codeblock> +<p> +The following example starts with a text table where every column has a type of <codeph>STRING</codeph>, +which might be how you ingest data of unknown schema until you can verify the cleanliness of the underlying values. +Then it uses <codeph>CAST()</codeph> to create a new Parquet table with the same data, but using specific +numeric data types for the columns with numeric data. Using numeric types of appropriate sizes can result in +substantial space savings on disk and in memory, and better query performance, +compared to using strings or larger-than-necessary numeric types. +</p> +<codeblock>create table t1 (name string, x string, y string, z string); + +create table t2 stored as parquet +as select + name, + cast(x as bigint) x, + cast(y as timestamp) y, + cast(z as smallint) z +from t1; + +describe t2; ++------+-----------+---------+ +| name | type | comment | ++------+-----------+---------+ +| name | string | | +| x | bigint | | +| y | timestamp | | +| z | smallint | | ++------+-----------+---------+ +</codeblock> +<p conref="../shared/impala_common.xml#common/related_info"/> +<p> +<!-- TK: Can you cast to or from MAP, ARRAY, STRUCT? 
--> + For details of casts from each kind of data type, see the description of + the appropriate type: + <xref href="impala_tinyint.xml#tinyint"/>, + <xref href="impala_smallint.xml#smallint"/>, + <xref href="impala_int.xml#int"/>, + <xref href="impala_bigint.xml#bigint"/>, + <xref href="impala_float.xml#float"/>, + <xref href="impala_double.xml#double"/>, + <xref href="impala_decimal.xml#decimal"/>, + <xref href="impala_string.xml#string"/>, + <xref href="impala_char.xml#char"/>, + <xref href="impala_varchar.xml#varchar"/>, + <xref href="impala_timestamp.xml#timestamp"/>, + <xref href="impala_boolean.xml#boolean"/> +</p> +</dd> +</dlentry> + +<dlentry rev="2.3.0" id="casttobigint" audience="Cloudera"> +<dt> +<codeph>casttobigint(type value)</codeph> +</dt> +<dd> +<indexterm audience="Cloudera">casttobigint() function</indexterm> +<b>Purpose:</b> Converts the value of an expression to <codeph>BIGINT</codeph>. If the expression value is of a type that cannot be converted to the target type, the result is <codeph>NULL</codeph>. 
+<p><b>Return type:</b> <codeph>bigint</codeph></p> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_usage"/> +<p conref="../shared/impala_common.xml#common/added_in_230"/> +<p conref="../shared/impala_common.xml#common/example_blurb"/> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_example"/> +<codeblock>create table small_types (x tinyint, y smallint, z int); + +create table big_types as + select casttobigint(x) as x, casttobigint(y) as y, casttobigint(z) as z + from small_types; + +describe big_types; ++------+--------+---------+ +| name | type | comment | ++------+--------+---------+ +| x | bigint | | +| y | bigint | | +| z | bigint | | ++------+--------+---------+ +</codeblock> +</dd> +</dlentry> + +<dlentry rev="2.3.0" id="casttoboolean" audience="Cloudera"> +<dt> +<codeph>casttoboolean(type value)</codeph> +</dt> +<dd> +<indexterm audience="Cloudera">casttoboolean() function</indexterm> +<b>Purpose:</b> Converts the value of an expression to <codeph>BOOLEAN</codeph>. +Numeric values of 0 evaluate to <codeph>false</codeph>, and non-zero values evaluate to <codeph>true</codeph>. +If the expression value is of a type that cannot be converted to the target type, the result is <codeph>NULL</codeph>. +In particular, <codeph>STRING</codeph> values (even <codeph>'1'</codeph>, <codeph>'0'</codeph>, <codeph>'true'</codeph> +or <codeph>'false'</codeph>) always return <codeph>NULL</codeph> when converted to <codeph>BOOLEAN</codeph>. 
+<p><b>Return type:</b> <codeph>boolean</codeph></p> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_usage"/> +<p conref="../shared/impala_common.xml#common/added_in_230"/> +<p conref="../shared/impala_common.xml#common/example_blurb"/> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_example"/> +<codeblock>select casttoboolean(0); ++------------------+ +| casttoboolean(0) | ++------------------+ +| false | ++------------------+ + +select casttoboolean(1); ++------------------+ +| casttoboolean(1) | ++------------------+ +| true | ++------------------+ + +select casttoboolean(99); ++-------------------+ +| casttoboolean(99) | ++-------------------+ +| true | ++-------------------+ + +select casttoboolean(0.0); ++--------------------+ +| casttoboolean(0.0) | ++--------------------+ +| false | ++--------------------+ + +select casttoboolean(0.5); ++--------------------+ +| casttoboolean(0.5) | ++--------------------+ +| true | ++--------------------+ + +select casttoboolean(''); ++-------------------+ +| casttoboolean('') | ++-------------------+ +| NULL | ++-------------------+ + +select casttoboolean('yes'); ++----------------------+ +| casttoboolean('yes') | ++----------------------+ +| NULL | ++----------------------+ + +select casttoboolean('0'); ++--------------------+ +| casttoboolean('0') | ++--------------------+ +| NULL | ++--------------------+ + +select casttoboolean('true'); ++-----------------------+ +| casttoboolean('true') | ++-----------------------+ +| NULL | ++-----------------------+ + +select casttoboolean('false'); ++------------------------+ +| casttoboolean('false') | ++------------------------+ +| NULL | ++------------------------+ +</codeblock> +</dd> +</dlentry> + +<dlentry rev="2.3.0" id="casttochar" audience="Cloudera"> +<dt> +<codeph>casttochar(type value)</codeph> +</dt> +<dd> +<indexterm audience="Cloudera">casttochar() function</indexterm> +<b>Purpose:</b> Converts the value of an expression to 
<codeph>CHAR</codeph>. If the expression value is of a type that cannot be converted to the target type, the result is <codeph>NULL</codeph>. +<p><b>Return type:</b> <codeph>char</codeph></p> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_usage"/> +<p conref="../shared/impala_common.xml#common/added_in_230"/> +<p conref="../shared/impala_common.xml#common/example_blurb"/> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_example"/> +<codeblock>create table char_types as select casttochar('hello world') as c1, casttochar('xyz') as c2, casttochar('x') as c3; ++-------------------+ +| summary | ++-------------------+ +| Inserted 1 row(s) | ++-------------------+ + +describe char_types; ++------+--------+---------+ +| name | type | comment | ++------+--------+---------+ +| c1 | string | | +| c2 | string | | +| c3 | string | | ++------+--------+---------+ +</codeblock> +</dd> +</dlentry> + +<dlentry rev="2.3.0" id="casttodecimal" audience="Cloudera"> +<dt> +<codeph>casttodecimal(type value)</codeph> +</dt> +<dd> +<indexterm audience="Cloudera">casttodecimal() function</indexterm> +<b>Purpose:</b> Converts the value of an expression to <codeph>DECIMAL</codeph>. If the expression value is of a type that cannot be converted to the target type, the result is <codeph>NULL</codeph>. 
+<p><b>Return type:</b> <codeph>decimal</codeph></p> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_usage"/> +<p conref="../shared/impala_common.xml#common/added_in_230"/> +<p conref="../shared/impala_common.xml#common/example_blurb"/> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_example"/> +<codeblock>select casttodecimal(5.4); ++--------------------+ +| casttodecimal(5.4) | ++--------------------+ +| 5.4 | ++--------------------+ +</codeblock> +</dd> +</dlentry> + +<dlentry rev="2.3.0" id="casttodouble" audience="Cloudera"> +<dt> +<codeph>casttodouble(type value)</codeph> +</dt> +<dd> +<indexterm audience="Cloudera">casttodouble() function</indexterm> +<b>Purpose:</b> Converts the value of an expression to <codeph>DOUBLE</codeph>. If the expression value is of a type that cannot be converted to the target type, the result is <codeph>NULL</codeph>. +<p><b>Return type:</b> <codeph>double</codeph></p> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_usage"/> +<p conref="../shared/impala_common.xml#common/added_in_230"/> +<p conref="../shared/impala_common.xml#common/example_blurb"/> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_example"/> +<codeblock>select casttodouble(5); ++-----------------+ +| casttodouble(5) | ++-----------------+ +| 5 | ++-----------------+ + +select casttodouble('3.141'); ++-----------------------+ +| casttodouble('3.141') | ++-----------------------+ +| 3.141 | ++-----------------------+ + +select casttodouble(1e6); ++--------------------+ +| casttodouble(1e+6) | ++--------------------+ +| 1000000 | ++--------------------+ + +select casttodouble(true); ++--------------------+ +| casttodouble(true) | ++--------------------+ +| 1 | ++--------------------+ + +select casttodouble(now()); ++---------------------+ +| casttodouble(now()) | ++---------------------+ +| 1447622306.031178 | ++---------------------+ +</codeblock> +</dd> +</dlentry> + +<dlentry rev="2.3.0" 
id="casttofloat" audience="Cloudera"> +<dt> +<codeph>casttofloat(type value)</codeph> +</dt> +<dd> +<indexterm audience="Cloudera">casttofloat() function</indexterm> +<b>Purpose:</b> Converts the value of an expression to <codeph>FLOAT</codeph>. If the expression value is of a type that cannot be converted to the target type, the result is <codeph>NULL</codeph>. +<p><b>Return type:</b> <codeph>float</codeph></p> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_usage"/> +<p conref="../shared/impala_common.xml#common/added_in_230"/> +<p conref="../shared/impala_common.xml#common/example_blurb"/> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_example"/> +<codeblock>select casttofloat(5); ++----------------+ +| casttofloat(5) | ++----------------+ +| 5 | ++----------------+ + +select casttofloat('3.141'); ++----------------------+ +| casttofloat('3.141') | ++----------------------+ +| 3.141000032424927 | ++----------------------+ + +select casttofloat(1e6); ++-------------------+ +| casttofloat(1e+6) | ++-------------------+ +| 1000000 | ++-------------------+ + +select casttofloat(true); ++-------------------+ +| casttofloat(true) | ++-------------------+ +| 1 | ++-------------------+ + +select casttofloat(now()); ++--------------------+ +| casttofloat(now()) | ++--------------------+ +| 1447622400 | ++--------------------+ +</codeblock> +</dd> +</dlentry> + +<dlentry rev="2.3.0" id="casttoint" audience="Cloudera"> +<dt> +<codeph>casttoint(type value)</codeph> +</dt> +<dd> +<indexterm audience="Cloudera">casttoint() function</indexterm> +<b>Purpose:</b> Converts the value of an expression to <codeph>INT</codeph>. If the expression value is of a type that cannot be converted to the target type, the result is <codeph>NULL</codeph>. 
+<p><b>Return type:</b> <codeph>int</codeph></p> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_usage"/> +<p conref="../shared/impala_common.xml#common/added_in_230"/> +<p conref="../shared/impala_common.xml#common/example_blurb"/> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_example"/> +<codeblock>select casttoint(5.4); ++----------------+ +| casttoint(5.4) | ++----------------+ +| 5 | ++----------------+ + +select casttoint(true); ++-----------------+ +| casttoint(true) | ++-----------------+ +| 1 | ++-----------------+ + +select casttoint(now()); ++------------------+ +| casttoint(now()) | ++------------------+ +| 1447622487 | ++------------------+ + +select casttoint('3.141'); ++--------------------+ +| casttoint('3.141') | ++--------------------+ +| NULL | ++--------------------+ + +select casttoint('3'); ++----------------+ +| casttoint('3') | ++----------------+ +| 3 | ++----------------+ +</codeblock> +</dd> +</dlentry> + +<dlentry rev="2.3.0" id="casttosmallint" audience="Cloudera"> +<dt> +<codeph>casttosmallint(type value)</codeph> +</dt> +<dd> +<indexterm audience="Cloudera">casttosmallint() function</indexterm> +<b>Purpose:</b> Converts the value of an expression to <codeph>SMALLINT</codeph>. If the expression value is of a type that cannot be converted to the target type, the result is <codeph>NULL</codeph>. 
+<p><b>Return type:</b> <codeph>smallint</codeph></p> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_usage"/> +<p conref="../shared/impala_common.xml#common/added_in_230"/> +<p conref="../shared/impala_common.xml#common/example_blurb"/> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_example"/> +<codeblock>create table big_types (x bigint, y int, z smallint); + +create table small_types as + select casttosmallint(x) as x, casttosmallint(y) as y, casttosmallint(z) as z + from big_types; + +describe small_types; ++------+----------+---------+ +| name | type | comment | ++------+----------+---------+ +| x | smallint | | +| y | smallint | | +| z | smallint | | ++------+----------+---------+ +</codeblock> +</dd> +</dlentry> + +<dlentry rev="2.3.0" id="casttostring" audience="Cloudera"> +<dt> +<codeph>casttostring(type value)</codeph> +</dt> +<dd> +<indexterm audience="Cloudera">casttostring() function</indexterm> +<b>Purpose:</b> Converts the value of an expression to <codeph>STRING</codeph>. If the expression value is of a type that cannot be converted to the target type, the result is <codeph>NULL</codeph>. 
+<p><b>Return type:</b> <codeph>string</codeph></p> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_usage"/> +<p conref="../shared/impala_common.xml#common/added_in_230"/> +<p conref="../shared/impala_common.xml#common/example_blurb"/> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_example"/> +<codeblock>create table numeric_types (x int, y bigint, z tinyint); + +create table string_types as + select casttostring(x) as x, casttostring(y) as y, casttostring(z) as z + from numeric_types; + +describe string_types; ++------+--------+---------+ +| name | type | comment | ++------+--------+---------+ +| x | string | | +| y | string | | +| z | string | | ++------+--------+---------+ +</codeblock> +</dd> +</dlentry> + +<dlentry rev="2.3.0" id="casttotimestamp" audience="Cloudera"> +<dt> +<codeph>casttotimestamp(type value)</codeph> +</dt> +<dd> +<indexterm audience="Cloudera">casttotimestamp() function</indexterm> +<b>Purpose:</b> Converts the value of an expression to <codeph>TIMESTAMP</codeph>. If the expression value is of a type that cannot be converted to the target type, the result is <codeph>NULL</codeph>. 
+<p><b>Return type:</b> <codeph>timestamp</codeph></p> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_usage"/> +<p conref="../shared/impala_common.xml#common/added_in_230"/> +<p conref="../shared/impala_common.xml#common/example_blurb"/> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_example"/> +<codeblock>select casttotimestamp(1000); ++-----------------------+ +| casttotimestamp(1000) | ++-----------------------+ +| 1970-01-01 00:16:40 | ++-----------------------+ + +select casttotimestamp(1000.0); ++-------------------------+ +| casttotimestamp(1000.0) | ++-------------------------+ +| 1970-01-01 00:16:40 | ++-------------------------+ + +select casttotimestamp('1000'); ++-------------------------+ +| casttotimestamp('1000') | ++-------------------------+ +| NULL | ++-------------------------+ +</codeblock> +</dd> +</dlentry> + +<dlentry rev="2.3.0" id="casttotinyint" audience="Cloudera"> +<dt> +<codeph>casttotinyint(type value)</codeph> +</dt> +<dd> +<indexterm audience="Cloudera">casttotinyint() function</indexterm> +<b>Purpose:</b> Converts the value of an expression to <codeph>TINYINT</codeph>. If the expression value is of a type that cannot be converted to the target type, the result is <codeph>NULL</codeph>. 
+<p><b>Return type:</b> <codeph>tinyint</codeph></p> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_usage"/> +<p conref="../shared/impala_common.xml#common/added_in_230"/> +<p conref="../shared/impala_common.xml#common/example_blurb"/> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_example"/> +<codeblock>create table big_types (x bigint, y int, z smallint); + +create table tiny_types as + select casttotinyint(x) as x, casttotinyint(y) as y, casttotinyint(z) as z + from big_types; + +describe tiny_types; ++------+---------+---------+ +| name | type | comment | ++------+---------+---------+ +| x | tinyint | | +| y | tinyint | | +| z | tinyint | | ++------+---------+---------+ +</codeblock> +</dd> +</dlentry> + +<dlentry rev="2.3.0" id="casttovarchar" audience="Cloudera"> +<dt> +<codeph>casttovarchar(type value)</codeph> +</dt> +<dd> +<indexterm audience="Cloudera">casttovarchar() function</indexterm> +<b>Purpose:</b> Converts the value of an expression to <codeph>VARCHAR</codeph>. If the expression value is of a type that cannot be converted to the target type, the result is <codeph>NULL</codeph>. 
+<p><b>Return type:</b> <codeph>varchar</codeph></p> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_usage"/> +<p conref="../shared/impala_common.xml#common/added_in_230"/> +<p conref="../shared/impala_common.xml#common/example_blurb"/> +<p conref="../shared/impala_common.xml#common/cast_convenience_fn_example"/> +<codeblock>select casttovarchar('abcd'); ++-----------------------+ +| casttovarchar('abcd') | ++-----------------------+ +| abcd | ++-----------------------+ + +select casttovarchar(999); ++--------------------+ +| casttovarchar(999) | ++--------------------+ +| 999 | ++--------------------+ + +select casttovarchar(999.5); ++----------------------+ +| casttovarchar(999.5) | ++----------------------+ +| 999.5 | ++----------------------+ + +select casttovarchar(now()); ++-------------------------------+ +| casttovarchar(now()) | ++-------------------------------+ +| 2015-11-15 21:26:13.528073000 | ++-------------------------------+ + +select casttovarchar(true); ++---------------------+ +| casttovarchar(true) | ++---------------------+ +| 1 | ++---------------------+ +</codeblock> +</dd> +</dlentry> + +<dlentry rev="2.3.0" id="typeof"> +<dt> +<codeph>typeof(type value)</codeph> +</dt> +<dd> +<indexterm audience="Cloudera">typeof() function</indexterm> +<b>Purpose:</b> Returns the name of the data type corresponding to an expression. For types with +extra attributes, such as length for <codeph>CHAR</codeph> and <codeph>VARCHAR</codeph>, +or precision and scale for <codeph>DECIMAL</codeph>, includes the full specification of the type. +<!-- To do: How about for columns of complex types? Or fields within complex types? --> +<p><b>Return type:</b> <codeph>string</codeph></p> +<p><b>Usage notes:</b> Typically used in interactive exploration of a schema, or in application code that programmatically generates schema definitions such as <codeph>CREATE TABLE</codeph> statements. 
+For example, previously, to understand the type of an expression such as +<codeph>col1 / col2</codeph> or <codeph>concat(col1, col2, col3)</codeph>, +you might have created a dummy table with a single row, using syntax such as <codeph>CREATE TABLE foo AS SELECT 5 / 3.0</codeph>, +and then done a <codeph>DESCRIBE</codeph> to see the type of the resulting column. +Or you might have done a <codeph>CREATE TABLE AS SELECT</codeph> operation to create a table and +copy data into it, only learning the types of the columns by doing a <codeph>DESCRIBE</codeph> afterward. +This technique is especially useful for arithmetic expressions involving <codeph>DECIMAL</codeph> types, +because the precision and scale of the result are typically different from those of the operands. +</p> +<p conref="../shared/impala_common.xml#common/added_in_230"/> +<p conref="../shared/impala_common.xml#common/example_blurb"/> +<p> +These examples show how to check the type of a simple literal or function value. +Notice how adding even tiny integers together changes the data type of the result to +avoid overflow, and how the results of arithmetic operations on <codeph>DECIMAL</codeph> values +have specific precision and scale attributes. 
+</p> +<codeblock>select typeof(2); ++-----------+ +| typeof(2) | ++-----------+ +| TINYINT | ++-----------+ + +select typeof(2+2); ++---------------+ +| typeof(2 + 2) | ++---------------+ +| SMALLINT | ++---------------+ + +select typeof('xyz'); ++---------------+ +| typeof('xyz') | ++---------------+ +| STRING | ++---------------+ + +select typeof(now()); ++---------------+ +| typeof(now()) | ++---------------+ +| TIMESTAMP | ++---------------+ + +select typeof(5.3 / 2.1); ++-------------------+ +| typeof(5.3 / 2.1) | ++-------------------+ +| DECIMAL(6,4) | ++-------------------+ + +select typeof(5.30001 / 2342.1); ++--------------------------+ +| typeof(5.30001 / 2342.1) | ++--------------------------+ +| DECIMAL(13,11) | ++--------------------------+ + +select typeof(typeof(2+2)); ++-----------------------+ +| typeof(typeof(2 + 2)) | ++-----------------------+ +| STRING | ++-----------------------+ +</codeblock> + +<p> +This example shows how even if you do not have a record of the type of a column, +for example because the type was changed by <codeph>ALTER TABLE</codeph> after the +original <codeph>CREATE TABLE</codeph>, you can still find out the type in a +more compact form than examining the full <codeph>DESCRIBE</codeph> output. +Remember to use <codeph>LIMIT 1</codeph> in such cases, to avoid an identical +result value for every row in the table. +</p> +<codeblock>create table typeof_example (a int, b tinyint, c smallint, d bigint); + +/* Empty result set if there is no data in the table. */ +select typeof(a) from typeof_example; + +/* OK, now we have some data but the type of column A is being changed. */ +insert into typeof_example values (1, 2, 3, 4); +alter table typeof_example change a a bigint; + +/* We can always find out the current type of that column without doing a full DESCRIBE. 
*/ +select typeof(a) from typeof_example limit 1; ++-----------+ +| typeof(a) | ++-----------+ +| BIGINT | ++-----------+ +</codeblock> +<p> +This example shows how you might programmatically generate a <codeph>CREATE TABLE</codeph> statement +with the appropriate column definitions to hold the result values of arbitrary expressions. +The <codeph>typeof()</codeph> function lets you construct a detailed <codeph>CREATE TABLE</codeph> statement +without actually creating the table, as opposed to <codeph>CREATE TABLE AS SELECT</codeph> operations +where you create the destination table but only learn the column data types afterward through <codeph>DESCRIBE</codeph>. +</p> +<codeblock>describe typeof_example; ++------+----------+---------+ +| name | type | comment | ++------+----------+---------+ +| a | bigint | | +| b | tinyint | | +| c | smallint | | +| d | bigint | | ++------+----------+---------+ + +/* An ETL or business intelligence tool might create variations on a table with different file formats, + different sets of columns, and so on. TYPEOF() lets an application introspect the types of the original columns. 
*/ +select concat('create table derived_table (a ', typeof(a), ', b ', typeof(b), ', c ', + typeof(c), ', d ', typeof(d), ') stored as parquet;') + as 'create table statement' +from typeof_example limit 1; ++-------------------------------------------------------------------------------------------+ +| create table statement | ++-------------------------------------------------------------------------------------------+ +| create table derived_table (a BIGINT, b TINYINT, c SMALLINT, d BIGINT) stored as parquet; | ++-------------------------------------------------------------------------------------------+ +</codeblock> +</dd> +</dlentry> + +</dl> + + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_count.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_count.xml b/docs/topics/impala_count.xml new file mode 100644 index 0000000..4f9a697 --- /dev/null +++ b/docs/topics/impala_count.xml @@ -0,0 +1,236 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="count"> + + <title>COUNT Function</title> + <titlealts audience="PDF"><navtitle>COUNT</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Impala Functions"/> + <data name="Category" value="Analytic Functions"/> + <data name="Category" value="Aggregate Functions"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">count() function</indexterm> + An aggregate function that returns the number of rows, or the number of non-<codeph>NULL</codeph> rows. 
+ </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>COUNT([DISTINCT | ALL] <varname>expression</varname>) [OVER (<varname>analytic_clause</varname>)]</codeblock> + + <p> + Depending on the argument, <codeph>COUNT()</codeph> considers rows that meet certain conditions: + </p> + + <ul> + <li> + The notation <codeph>COUNT(*)</codeph> includes <codeph>NULL</codeph> values in the total. + </li> + + <li> + The notation <codeph>COUNT(<varname>column_name</varname>)</codeph> only considers rows where the column + contains a non-<codeph>NULL</codeph> value. + </li> + + <li> + You can also combine <codeph>COUNT</codeph> with the <codeph>DISTINCT</codeph> operator to eliminate + duplicates before counting, and to count the combinations of values across multiple columns. + </li> + </ul> + + <p> + When the query contains a <codeph>GROUP BY</codeph> clause, returns one value for each combination of + grouping values. + </p> + + <p> + <b>Return type:</b> <codeph>BIGINT</codeph> + </p> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p conref="../shared/impala_common.xml#common/partition_key_optimization"/> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_aggregation_explanation"/> + + <p conref="../shared/impala_common.xml#common/complex_types_aggregation_example"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + +<codeblock>-- How many rows total are in the table, regardless of NULL values? +select count(*) from t1; +-- How many rows are in the table with non-NULL values for a column? +select count(c1) from t1; +-- Count the rows that meet certain conditions. +-- Again, * includes NULLs, so COUNT(*) might be greater than COUNT(col). +select count(*) from t1 where x > 10; +select count(c1) from t1 where x > 10; +-- Can also be used in combination with DISTINCT and/or GROUP BY. 
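+-- Because COUNT(*) includes NULLs and COUNT(col) does not, subtracting the two
+-- yields the number of NULL values in a column.
+-- (Illustrative sketch; assumes c1 contains some NULL values.)
+select count(*) - count(c1) as c1_nulls from t1;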
+-- Combine COUNT and DISTINCT to find the number of unique values. +-- Must use column names rather than * with COUNT(DISTINCT ...) syntax. +-- Rows with NULL values are not counted. +select count(distinct c1) from t1; +-- Rows with a NULL value in _either_ column are not counted. +select count(distinct c1, c2) from t1; +-- Return more than one result. +select month, year, count(distinct visitor_id) from web_stats group by month, year; +</codeblock> + + <p rev="2.0.0"> + The following examples show how to use <codeph>COUNT()</codeph> in an analytic context. They use a table + containing integers from 1 to 10. Notice how the <codeph>COUNT()</codeph> is reported for each input value, as + opposed to the <codeph>GROUP BY</codeph> clause which condenses the result set. +<codeblock>select x, property, count(x) over (partition by property) as count from int_t where property in ('odd','even'); ++----+----------+-------+ +| x | property | count | ++----+----------+-------+ +| 2 | even | 5 | +| 4 | even | 5 | +| 6 | even | 5 | +| 8 | even | 5 | +| 10 | even | 5 | +| 1 | odd | 5 | +| 3 | odd | 5 | +| 5 | odd | 5 | +| 7 | odd | 5 | +| 9 | odd | 5 | ++----+----------+-------+ +</codeblock> + +Adding an <codeph>ORDER BY</codeph> clause lets you experiment with results that are cumulative or apply to a moving +set of rows (the <q>window</q>). The following examples use <codeph>COUNT()</codeph> in an analytic context +(that is, with an <codeph>OVER()</codeph> clause) to produce a running count of all the even values, +then a running count of all the odd values. 
The basic <codeph>ORDER BY x</codeph> clause implicitly
+activates a window clause of <codeph>RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</codeph>,
+which is effectively the same as <codeph>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</codeph>,
+therefore all of these examples produce the same results:
+<codeblock>select x, property,
+  count(x) over (partition by property <b>order by x</b>) as 'cumulative count'
+  from int_t where property in ('odd','even');
++----+----------+------------------+
+| x  | property | cumulative count |
++----+----------+------------------+
+| 2  | even     | 1                |
+| 4  | even     | 2                |
+| 6  | even     | 3                |
+| 8  | even     | 4                |
+| 10 | even     | 5                |
+| 1  | odd      | 1                |
+| 3  | odd      | 2                |
+| 5  | odd      | 3                |
+| 7  | odd      | 4                |
+| 9  | odd      | 5                |
++----+----------+------------------+
+
+select x, property,
+  count(x) over
+    (
+      partition by property
+      <b>order by x</b>
+      <b>range between unbounded preceding and current row</b>
+    ) as 'cumulative count'
+from int_t where property in ('odd','even');
++----+----------+------------------+
+| x  | property | cumulative count |
++----+----------+------------------+
+| 2  | even     | 1                |
+| 4  | even     | 2                |
+| 6  | even     | 3                |
+| 8  | even     | 4                |
+| 10 | even     | 5                |
+| 1  | odd      | 1                |
+| 3  | odd      | 2                |
+| 5  | odd      | 3                |
+| 7  | odd      | 4                |
+| 9  | odd      | 5                |
++----+----------+------------------+
+
+select x, property,
+  count(x) over
+    (
+      partition by property
+      <b>order by x</b>
+      <b>rows between unbounded preceding and current row</b>
+    ) as 'cumulative count'
+  from int_t where property in ('odd','even');
++----+----------+------------------+
+| x  | property | cumulative count |
++----+----------+------------------+
+| 2  | even     | 1                |
+| 4  | even     | 2                |
+| 6  | even     | 3                |
+| 8  | even     | 4                |
+| 10 | even     | 5                |
+| 1  | odd      | 1                |
+| 3  | odd      | 2                |
+| 5  | odd      | 3                |
+| 7  | odd      | 4                |
+| 9  | odd      | 5                |
++----+----------+------------------+
+</codeblock>
+
+The following examples show how to construct a moving window, with a running count taking
into account 1 row before
+and 1 row after the current row, within the same partition (all the even values or all the odd values).
+Therefore, the count is consistently 3 for rows in the middle of each partition, and 2 for
+rows at either end of a partition, where the window has no preceding or no following row.
+Because of a restriction in the Impala <codeph>RANGE</codeph> syntax, this type of
+moving window is possible with the <codeph>ROWS BETWEEN</codeph> clause but not the <codeph>RANGE BETWEEN</codeph>
+clause:
+<codeblock>select x, property,
+  count(x) over
+    (
+      partition by property
+      <b>order by x</b>
+      <b>rows between 1 preceding and 1 following</b>
+    ) as 'moving total'
+  from int_t where property in ('odd','even');
++----+----------+--------------+
+| x  | property | moving total |
++----+----------+--------------+
+| 2  | even     | 2            |
+| 4  | even     | 3            |
+| 6  | even     | 3            |
+| 8  | even     | 3            |
+| 10 | even     | 2            |
+| 1  | odd      | 2            |
+| 3  | odd      | 3            |
+| 5  | odd      | 3            |
+| 7  | odd      | 3            |
+| 9  | odd      | 2            |
++----+----------+--------------+
+
+-- Doesn't work because of the syntax restriction on the RANGE clause.
+select x, property,
+  count(x) over
+    (
+      partition by property
+      <b>order by x</b>
+      <b>range between 1 preceding and 1 following</b>
+    ) as 'moving total'
+from int_t where property in ('odd','even');
+ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
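+
+-- Sketch of a RANGE window that the restriction does allow: both bounds UNBOUNDED,
+-- or one bound UNBOUNDED and the other CURRENT ROW, as in the earlier cumulative examples.
+select x, property,
+  count(x) over
+    (
+      partition by property
+      order by x
+      range between unbounded preceding and current row
+    ) as 'running count'
+from int_t where property in ('odd','even');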
+</codeblock> + </p> + + <note conref="../shared/impala_common.xml#common/multiple_count_distinct"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_analytic_functions.xml#analytic_functions"/> + </p> + + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_create_data_source.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_create_data_source.xml b/docs/topics/impala_create_data_source.xml new file mode 100644 index 0000000..a93d6bc --- /dev/null +++ b/docs/topics/impala_create_data_source.xml @@ -0,0 +1,35 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept audience="Cloudera" rev="1.4.0" id="create_data_source"> + + <title>CREATE DATA SOURCE Statement</title> + <titlealts audience="PDF"><navtitle>CREATE DATA SOURCE</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="DDL"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">CREATE DATA SOURCE statement</indexterm> + </p> + + <p conref="../shared/impala_common.xml#common/ddl_blurb"/> + + <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_create_database.xml ---------------------------------------------------------------------- diff --git 
a/docs/topics/impala_create_database.xml b/docs/topics/impala_create_database.xml new file mode 100644 index 0000000..de556bc --- /dev/null +++ b/docs/topics/impala_create_database.xml @@ -0,0 +1,137 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="create_database"> + + <title>CREATE DATABASE Statement</title> + <titlealts audience="PDF"><navtitle>CREATE DATABASE</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Databases"/> + <data name="Category" value="Schemas"/> + <data name="Category" value="DDL"/> + <data name="Category" value="S3"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">CREATE DATABASE statement</indexterm> + Creates a new database. + </p> + + <p> + In Impala, a database is both: + </p> + + <ul> + <li> + A logical construct for grouping together related tables, views, and functions within their own namespace. + You might use a separate database for each application, set of related tables, or round of experimentation. + </li> + + <li> + A physical construct represented by a directory tree in HDFS. Tables (internal tables), partitions, and + data files are all located under this directory. You can perform HDFS-level operations such as backing it up and measuring space usage, + or remove it with a <codeph>DROP DATABASE</codeph> statement. 
+ </li>
+ </ul>
+
+ <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] <varname>database_name</varname>
+  [COMMENT '<varname>database_comment</varname>']
+  [LOCATION <varname>hdfs_path</varname>];</codeblock>
+
+ <p conref="../shared/impala_common.xml#common/ddl_blurb"/>
+
+ <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+ <p>
+ A database is physically represented as a directory in HDFS, with a filename extension <codeph>.db</codeph>,
+ under the main Impala data directory. If the associated HDFS directory does not exist, it is created for you.
+ All databases and their associated directories are top-level objects, with no physical or logical nesting.
+ </p>
+
+ <p>
+ After creating a database, to make it the current database within an <cmdname>impala-shell</cmdname> session,
+ use the <codeph>USE</codeph> statement. You can refer to tables in the current database without prepending
+ any qualifier to their names.
+ </p>
+
+ <p>
+ When you first connect to Impala through <cmdname>impala-shell</cmdname>, the database you start in (before
+ issuing any <codeph>CREATE DATABASE</codeph> or <codeph>USE</codeph> statements) is named
+ <codeph>default</codeph>.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/builtins_db"/>
+
+ <p>
+ After creating a database, your <cmdname>impala-shell</cmdname> session or another
+ <cmdname>impala-shell</cmdname> session connected to the same node can immediately access that database. To access
+ the database through the Impala daemon on a different node, issue the <codeph>INVALIDATE METADATA</codeph>
+ statement first while connected to that other node.
+ </p>
+
+ <p>
+ Setting the <codeph>LOCATION</codeph> attribute for a new database is a way to work with sets of files in an
+ HDFS directory structure outside the default Impala data directory, as opposed to setting the
+ <codeph>LOCATION</codeph> attribute for each individual table.
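+ For example, the following sketch (the database name and HDFS path are hypothetical)
+ keeps a database, and therefore all of its tables, under a directory of your choosing:
+ </p>
+
+<codeblock>CREATE DATABASE external_data LOCATION '/user/shared/external_data_db';
+USE external_data;
+-- Data files for this table go under the LOCATION directory,
+-- not the default Impala data directory.
+CREATE TABLE t1 (x INT);</codeblock>
+
+ <p>
+ Adjust the path to match your own HDFS directory structure.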
+ </p> + + <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/> + + <p conref="../shared/impala_common.xml#common/hive_blurb"/> + + <p> + When you create a database in Impala, the database can also be used by Hive. + When you create a database in Hive, issue an <codeph>INVALIDATE METADATA</codeph> + statement in Impala to make Impala permanently aware of the new database. + </p> + + <p> + The <codeph>SHOW DATABASES</codeph> statement lists all databases, or the databases whose name + matches a wildcard pattern. <ph rev="2.5.0">In <keyword keyref="impala25_full"/> and higher, the + <codeph>SHOW DATABASES</codeph> output includes a second column that displays the associated + comment, if any, for each database.</ph> + </p> + + <p conref="../shared/impala_common.xml#common/s3_blurb"/> + + <p rev="2.6.0 CDH-39913 IMPALA-1878"> + To specify that any tables created within a database reside on the Amazon S3 system, + you can include an <codeph>s3a://</codeph> prefix on the <codeph>LOCATION</codeph> + attribute. In <keyword keyref="impala26_full"/> and higher, Impala automatically creates any + required folders as the databases, tables, and partitions are created, and removes + them when they are dropped. + </p> + + <p conref="../shared/impala_common.xml#common/s3_ddl"/> + + <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/> + + <p conref="../shared/impala_common.xml#common/permissions_blurb"/> + <p rev="CDH-19187"> + The user ID that the <cmdname>impalad</cmdname> daemon runs under, + typically the <codeph>impala</codeph> user, must have write + permission for the parent HDFS directory under which the database + is located. 
+ </p> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <codeblock conref="../shared/impala_common.xml#common/create_drop_db_example"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_databases.xml#databases"/>, <xref href="impala_drop_database.xml#drop_database"/>, + <xref href="impala_use.xml#use"/>, <xref href="impala_show.xml#show_databases"/>, + <xref href="impala_tables.xml#tables"/> + </p> + </conbody> +</concept>
