[17/51] [partial] incubator-impala git commit: IMPALA-3398: Add docs to main Impala branch.

jbapple Thu, 17 Nov 2016 15:12:39 -0800

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_noncm_installation.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_noncm_installation.xml 
b/docs/topics/impala_noncm_installation.xml
new file mode 100644
index 0000000..d9c1f1a
--- /dev/null
+++ b/docs/topics/impala_noncm_installation.xml
@@ -0,0 +1,171 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="noncm_installation">
+
+  <title>Installing Impala without Cloudera Manager</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Installing"/>
+      <data name="Category" value="Administrators"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      Before installing Impala manually, make sure all applicable nodes have 
the appropriate hardware
+      configuration, levels of operating system and CDH, and any other 
software prerequisites. See
+      <xref href="impala_prereqs.xml#prereqs"/> for details.
+    </p>
+
+    <p>
+      You can install Impala across many hosts or on one host:
+    </p>
+
+    <ul>
+      <li>
+        Installing Impala across multiple machines creates a distributed 
configuration. For best performance,
+        install Impala on <b>all</b> DataNodes.
+      </li>
+
+      <li>
+        Installing Impala on a single machine produces a pseudo-distributed 
cluster.
+      </li>
+    </ul>
+
+    <p>
+      <b>To install Impala on a host:</b>
+    </p>
+
+    <ol>
+      <li>
+        Install CDH as described in the Installation section of the
+<!-- Original URL: 
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/CDH5-Installation-Guide.html
 -->
+        <xref 
href="http://www.cloudera.com/documentation/enterprise/latest/topics/installation.html";
 scope="external" format="html">CDH
+        5 Installation Guide</xref>.
+      </li>
+
+      <li>
+        <p>
+          Install the Hive metastore somewhere in your cluster, as described 
in the Hive Installation topic in the
+<!-- Original URL: 
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh_ig_hive_installation.html
 -->
+          <xref 
href="http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_hive_installation.html";
 scope="external" format="html">CDH
+          5 Installation Guide</xref>. As part of this process, you configure 
the Hive metastore to use an external
+          database as a metastore. Impala uses this same database for its own 
table metadata. You can choose either
+          a MySQL or PostgreSQL database as the metastore. The process for 
configuring each type of database is
+          described in the CDH Installation Guide).
+        </p>
+        <p>
+          <ph rev="upstream">Cloudera</ph> recommends setting up a Hive 
metastore service rather than connecting directly to the metastore
+          database; this configuration is required when running Impala under 
CDH 4.1. Make sure the
+          <filepath>/etc/impala/conf/hive-site.xml</filepath> file contains 
the following setting, substituting the
+          appropriate hostname for <varname>metastore_server_host</varname>:
+        </p>
+<codeblock>&lt;property&gt;
+&lt;name&gt;hive.metastore.uris&lt;/name&gt;
+&lt;value&gt;thrift://<varname>metastore_server_host</varname>:9083&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+&lt;name&gt;hive.metastore.client.socket.timeout&lt;/name&gt;
+&lt;value&gt;3600&lt;/value&gt;
+&lt;description&gt;MetaStore Client socket timeout in 
seconds&lt;/description&gt;
+&lt;/property&gt;</codeblock>
+      </li>
+
+      <li>
+        (Optional) If you installed the full Hive component on any host, you 
can verify that the metastore is
+        configured properly by starting the Hive console and querying for the 
list of available tables. Once you
+        confirm that the console starts, exit the console to continue the 
installation:
+<codeblock>$ hive
+Hive history file=/tmp/root/hive_job_log_root_201207272011_678722950.txt
+hive&gt; show tables;
+table1
+table2
+hive&gt; quit;
+$</codeblock>
+      </li>
+
+      <li>
+        Confirm that your package management command is aware of the Impala 
repository settings, as described in
+        <xref href="impala_prereqs.xml#prereqs"/>. (For CDH 4, this is a 
different repository than for CDH.) You
+        might need to download a repo or list file into a system directory 
underneath <filepath>/etc</filepath>.
+      </li>
+
+      <li>
+        Use <b>one</b> of the following sets of commands to install the Impala 
package:
+        <p>
+          <b>For RHEL, Oracle Linux, or CentOS systems:</b>
+        </p>
+<codeblock rev="1.2">$ sudo yum install impala             # Binaries for 
daemons
+$ sudo yum install impala-server      # Service start/stop script
+$ sudo yum install impala-state-store # Service start/stop script
+$ sudo yum install impala-catalog     # Service start/stop script
+</codeblock>
+        <p>
+          <b>For SUSE systems:</b>
+        </p>
+<codeblock rev="1.2">$ sudo zypper install impala             # Binaries for 
daemons
+$ sudo zypper install impala-server      # Service start/stop script
+$ sudo zypper install impala-state-store # Service start/stop script
+$ sudo zypper install impala-catalog     # Service start/stop script
+</codeblock>
+        <p>
+          <b>For Debian or Ubuntu systems:</b>
+        </p>
+<codeblock rev="1.2">$ sudo apt-get install impala             # Binaries for 
daemons
+$ sudo apt-get install impala-server      # Service start/stop script
+$ sudo apt-get install impala-state-store # Service start/stop script
+$ sudo apt-get install impala-catalog     # Service start/stop script
+</codeblock>
+        <note>
+          <ph rev="upstream">Cloudera</ph> recommends that you not install 
Impala on any HDFS NameNode. Installing Impala on NameNodes
+          provides no additional data locality, and executing queries with 
such a configuration might cause memory
+          contention and negatively impact the HDFS NameNode.
+        </note>
+      </li>
+
+      <li>
+        Copy the client <codeph>hive-site.xml</codeph>, 
<codeph>core-site.xml</codeph>,
+        <codeph>hdfs-site.xml</codeph>, and <codeph>hbase-site.xml</codeph> 
configuration files to the Impala
+        configuration directory, which defaults to 
<codeph>/etc/impala/conf</codeph>. Create this directory if it
+        does not already exist.
+      </li>
+
+      <li>
+        Use <b>one</b> of the following commands to install 
<codeph>impala-shell</codeph> on the machines from
+        which you want to issue queries. You can install 
<codeph>impala-shell</codeph> on any supported machine
+        that can connect to DataNodes that are running 
<codeph>impalad</codeph>.
+        <p>
+          <b>For RHEL/CentOS systems:</b>
+        </p>
+<codeblock>$ sudo yum install impala-shell</codeblock>
+        <p>
+          <b>For SUSE systems:</b>
+        </p>
+<codeblock>$ sudo zypper install impala-shell</codeblock>
+        <p>
+          <b>For Debian/Ubuntu systems:</b>
+        </p>
+<codeblock>$ sudo apt-get install impala-shell</codeblock>
+      </li>
+
+      <li>
+        Complete any required or recommended configuration, as described in
+        <xref href="impala_config_performance.xml#config_performance"/>. Some 
of these configuration changes are
+        mandatory. (They are applied automatically when you install using 
Cloudera Manager.)
+      </li>
+    </ol>
+
+    <p>
+      Once installation and configuration are complete, see <xref 
href="impala_processes.xml#processes"/> for how
+      to activate the software on the appropriate nodes in your cluster.
+    </p>
+
+    <p>
+      If this is your first time setting up and using Impala in this cluster, 
run through some of the exercises in
+      <xref href="impala_tutorial.xml#tutorial"/> to verify that you can do 
basic operations such as creating
+      tables and querying them.
+    </p>
+  </conbody>
+</concept>


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_num_nodes.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_num_nodes.xml b/docs/topics/impala_num_nodes.xml
new file mode 100644
index 0000000..8a8fe88
--- /dev/null
+++ b/docs/topics/impala_num_nodes.xml
@@ -0,0 +1,60 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="num_nodes">
+
+  <title>NUM_NODES Query Option</title>
+  <titlealts audience="PDF"><navtitle>NUM_NODES</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Troubleshooting"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">NUM_NODES query option</indexterm>
+      Limit the number of nodes that process a query, typically during 
debugging.
+    </p>
+
+    <p>
+      <b>Type:</b> numeric
+    </p>
+
+<p>
+      <b>Allowed values:</b> Only accepts the values 0
+      (meaning all nodes) or 1 (meaning all work is done on the coordinator 
node).
+</p>
+
+    <p>
+      <b>Default:</b> 0
+    </p>
+
+     <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+     <p>
+       If you are diagnosing a problem that you suspect is due to a timing 
issue due to
+       distributed query processing, you can set <codeph>NUM_NODES=1</codeph> 
to verify
+       if the problem still occurs when all the work is done on a single node.
+     </p>
+
+    <p conref="../shared/impala_common.xml#common/num_nodes_tip"/>
+
+    <note type="warning" rev="DOCS-1161">
+    <p>
+      Because this option results in increased resource utilization on a 
single host,
+      it could cause problems due to contention with other Impala statements or
+      high resource usage. Symptoms could include queries running slowly, 
exceeding the memory limit,
+      or appearing to hang. Use it only in a single-user development/test 
environment;
+      <b>do not</b> use it in a production environment or in a cluster with a 
high-concurrency
+      or high-volume or performance-critical workload.
+    </p>
+    </note>
+
+
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_num_scanner_threads.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_num_scanner_threads.xml 
b/docs/topics/impala_num_scanner_threads.xml
new file mode 100644
index 0000000..ec609d2
--- /dev/null
+++ b/docs/topics/impala_num_scanner_threads.xml
@@ -0,0 +1,35 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="num_scanner_threads">
+
+  <title>NUM_SCANNER_THREADS Query Option</title>
+  <titlealts 
audience="PDF"><navtitle>NUM_SCANNER_THREADS</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Performance"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">NUM_SCANNER_THREADS query 
option</indexterm>
+      Maximum number of scanner threads (on each node) used for each query. By 
default, Impala uses as many cores
+      as are available (one thread per core). You might lower this value if 
queries are using excessive resources
+      on a busy cluster. Impala imposes a maximum value automatically, so a 
high value has no practical effect.
+    </p>
+
+    <p>
+      <b>Type:</b> numeric
+    </p>
+
+    <p>
+      <b>Default:</b> 0
+    </p>
+
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_odbc.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_odbc.xml b/docs/topics/impala_odbc.xml
new file mode 100644
index 0000000..c73915e
--- /dev/null
+++ b/docs/topics/impala_odbc.xml
@@ -0,0 +1,203 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="impala_odbc">
+
+  <title id="odbc">Configuring Impala to Work with ODBC</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="ODBC"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Configuring"/>
+      <data name="Category" value="Developers"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">ODBC</indexterm>
+      Third-party products can be designed to integrate with Impala using 
ODBC. For the best experience, ensure any
+      third-party product you intend to use is supported. Verifying support 
includes checking that the versions of
+      Impala, ODBC, the operating system, and the third-party product have all 
been approved for use together.
+      Before configuring your systems to use ODBC, download a connector.
+      You may need to sign in and accept license agreements before accessing 
the pages required for downloading
+      ODBC connectors.
+    </p>
+
+    <p outputclass="toc inpage"/>
+
+  </conbody>
+
+  <concept id="odbc_driver">
+    <title>Downloading the ODBC Driver</title>
+      <conbody>
+
+      <note type="important">
+        As of late 2015, most business intelligence applications are certified 
with the 2.x ODBC drivers.
+        Although the instructions on this page cover both the 2.x and 1.x 
drivers, expect to use the 2.x drivers
+        exclusively for most ODBC applications connecting to Impala.
+      </note>
+
+        <p>
+          See the
+          <xref 
href="http://www.cloudera.com/content/support/en/downloads/download-components/download-products.html?productID=3mDesWddUS";
 scope="external" format="html">downloads
+          page</xref> for a matrix of the certified driver version for 
different products. See the
+          <xref 
href="http://www.cloudera.com/content/support/en/documentation/cloudera-connectors-documentation/connector-documentation-latest.html";
 scope="external" format="html">documentation
+          page</xref> for installation instructions.
+        </p>
+
+      </conbody>
+    </concept>
+
+    <concept id="odbc_port">
+      <title>Configuring the ODBC Port</title>
+      <conbody>
+        <p>
+          Versions 2.5 and 2.0 of the Cloudera ODBC Connector, currently 
certified for some but not all BI applications,
+          use the HiveServer2 protocol, corresponding to Impala port 21050. 
Impala supports Kerberos authentication with
+          all the supported versions of the driver, and requires ODBC 2.05.13 
for Impala or higher for LDAP username/password
+          authentication.
+        </p>
+
+        <p>
+          Version 1.x of the Cloudera ODBC Connector uses the original 
HiveServer1 protocol, corresponding to Impala
+          port 21000.
+        </p>
+
+      </conbody>
+    </concept>
+
+    <concept id="odbc_demo">
+      <title>Example of Setting Up an ODBC Application for Impala</title>
+      <conbody>
+
+        <p>
+          To illustrate the outline of the setup process, here is a transcript 
of a session to set up
+          all required drivers and a business intelligence application
+          that uses the ODBC driver, under Mac OS X. Each 
<codeph>.dmg</codeph> file runs a GUI-based
+          installer, first for the <xref 
href="http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/Downloads"; 
scope="external" format="html">underlying IODBC driver</xref> needed for 
non-Windows systems, then for
+          the Cloudera ODBC Connector, and finally for the BI tool itself.
+        </p>
+
+<codeblock>$ ls -1
+Cloudera-ODBC-Driver-for-Impala-Install-Guide.pdf
+BI_Tool_Installer.dmg
+iodbc-sdk-3.52.7-macosx-10.5.dmg
+ClouderaImpalaODBC.dmg
+$ open iodbc-sdk-3.52.7-macosx-10.dmg
+<varname>Install the IODBC driver using its installer</varname>
+$ open ClouderaImpalaODBC.dmg
+<varname>Install the Cloudera ODBC Connector using its installer</varname>
+$ installer_dir=$(pwd)
+$ cd /opt/cloudera/impalaodbc
+$ ls -1
+Cloudera ODBC Driver for Impala Install Guide.pdf
+Readme.txt
+Setup
+lib
+ErrorMessages
+Release Notes.txt
+Tools
+$ cd Setup
+$ ls
+odbc.ini    odbcinst.ini
+$ cp odbc.ini ~/.odbc.ini
+$ vi ~/.odbc.ini
+$ cat ~/.odbc.ini
+[ODBC]
+# Specify any global ODBC configuration here such as ODBC tracing.
+
+[ODBC Data Sources]
+Sample Cloudera Impala DSN=Cloudera ODBC Driver for Impala
+
+[Sample Cloudera Impala DSN]
+
+# Description: DSN Description.
+# This key is not necessary and is only to give a description of the data 
source.
+Description=Cloudera ODBC Driver for Impala DSN
+
+# Driver: The location where the ODBC driver is installed to.
+Driver=/opt/cloudera/impalaodbc/lib/universal/libclouderaimpalaodbc.dylib
+
+# The DriverUnicodeEncoding setting is only used for SimbaDM
+# When set to 1, SimbaDM runs in UTF-16 mode.
+# When set to 2, SimbaDM runs in UTF-8 mode.
+#DriverUnicodeEncoding=2
+
+# Values for HOST, PORT, KrbFQDN, and KrbServiceName should be set here.
+# They can also be specified on the connection string.
+HOST=hostname.sample.example.com
+PORT=21050
+Schema=default
+
+# The authentication mechanism.
+# 0 - No authentication (NOSASL)
+# 1 - Kerberos authentication (SASL)
+# 2 - Username authentication (SASL)
+# 3 - Username/password authentication (SASL)
+# 4 - Username/password authentication with SSL (SASL)
+# 5 - No authentication with SSL (NOSASL)
+# 6 - Username/password authentication (NOSASL)
+AuthMech=0
+
+# Kerberos related settings.
+KrbFQDN=
+KrbRealm=
+KrbServiceName=
+
+# Username/password authentication with SSL settings.
+UID=
+PWD
+CAIssuedCertNamesMismatch=1
+TrustedCerts=/opt/cloudera/impalaodbc/lib/universal/cacerts.pem
+
+# Specify the proxy user ID to use.
+#DelegationUID=
+
+# General settings
+TSaslTransportBufSize=1000
+RowsFetchedPerBlock=10000
+SocketTimeout=0
+StringColumnLength=32767
+UseNativeQuery=0
+$ pwd
+/opt/cloudera/impalaodbc/Setup
+$ cd $installer_dir
+$ open BI_Tool_Installer.dmg
+<varname>Install the BI tool using its installer</varname>
+$ ls /Applications | grep BI_Tool
+BI_Tool.app
+$ open -a BI_Tool.app
+<varname>In the BI tool, connect to a data source using port 21050</varname>
+</codeblock>
+
+    </conbody>
+  </concept>
+
+  <concept rev="2.3.0" id="jdbc_odbc_notes">
+    <title>Notes about JDBC and ODBC Interaction with Impala SQL 
Features</title>
+    <conbody>
+
+      <p>
+        Most Impala SQL features work equivalently through the 
<cmdname>impala-shell</cmdname> interpreter
+        of the JDBC or ODBC APIs. The following are some exceptions to keep in 
mind when switching between
+        the interactive shell and applications using the APIs:
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/proxy_jdbc_caveat"/>
+
+      <ul>
+        <li>
+          <p 
conref="../shared/impala_common.xml#common/jdbc_odbc_complex_types"/>
+        </li>
+        <li>
+          <p 
conref="../shared/impala_common.xml#common/jdbc_odbc_complex_types_views"/>
+        </li>
+      </ul>
+
+    </conbody>
+  </concept>
+
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_offset.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_offset.xml b/docs/topics/impala_offset.xml
new file mode 100644
index 0000000..534f6b6
--- /dev/null
+++ b/docs/topics/impala_offset.xml
@@ -0,0 +1,66 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.2.1" id="offset">
+
+  <title>OFFSET Clause</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="Reports"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      The <codeph>OFFSET</codeph> clause in a <codeph>SELECT</codeph> query 
causes the result set to start some
+      number of rows after the logical first item. The result set is numbered 
starting from zero, so <codeph>OFFSET
+      0</codeph> produces the same result as leaving out the 
<codeph>OFFSET</codeph> clause. Always use this clause
+      in combination with <codeph>ORDER BY</codeph> (so that it is clear which 
item should be first, second, and so
+      on) and <codeph>LIMIT</codeph> (so that the result set covers a bounded 
range, such as items 0-9, 100-199,
+      and so on).
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/limit_and_offset"/>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p>
+      The following example shows how you could run a <q>paging</q> query 
originally written for a traditional
+      database application. Because typical Impala queries process megabytes 
or gigabytes of data and read large
+      data files from disk each time, it is inefficient to run a separate 
query to retrieve each small group of
+      items. Use this technique only for compatibility while porting older 
applications, then rewrite the
+      application code to use a single query with a large result set, and 
display pages of results from the cached
+      result set.
+    </p>
+
+<codeblock>[localhost:21000] &gt; create table numbers (x int);
+[localhost:21000] &gt; insert into numbers select x from very_long_sequence;
+Inserted 1000000 rows in 1.34s
+[localhost:21000] &gt; select x from numbers order by x limit 5 offset 0;
++----+
+| x  |
++----+
+| 1  |
+| 2  |
+| 3  |
+| 4  |
+| 5  |
++----+
+[localhost:21000] &gt; select x from numbers order by x limit 5 offset 5;
++----+
+| x  |
++----+
+| 6  |
+| 7  |
+| 8  |
+| 9  |
+| 10 |
++----+
+</codeblock>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_operators.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_operators.xml b/docs/topics/impala_operators.xml
new file mode 100644
index 0000000..acd9297
--- /dev/null
+++ b/docs/topics/impala_operators.xml
@@ -0,0 +1,1592 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="operators">
+
+  <title>SQL Operators</title>
+
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">operators</indexterm>
+      SQL operators are a class of comparison functions that are widely used 
within the <codeph>WHERE</codeph> clauses of
+      <codeph>SELECT</codeph> statements.
+    </p>
+
+    <p outputclass="toc inpage"/>
+
+  </conbody>
+
+  <concept rev="1.4.0" id="arithmetic_operators">
+
+    <title>Arithmetic Operators</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">arithmetic operators</indexterm>
+        The arithmetic operators use expressions with a left-hand argument, 
the operator, and then (in most cases) a right-hand argument.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock><varname>left_hand_arg</varname> <varname>binary_operator</varname> 
<varname>right_hand_arg</varname>
+<varname>unary_operator</varname> <varname>single_arg</varname>
+</codeblock>
+
+      <ul>
+        <li>
+          <codeph>+</codeph> and <codeph>-</codeph>: Can be used either as 
unary or binary operators.
+          <ul>
+            <li>
+              <p>
+                With unary notation, such as <codeph>+5</codeph>, 
<codeph>-2.5</codeph>, or <codeph>-<varname>col_name</varname></codeph>,
+                they multiply their single numeric argument by 
<codeph>+1</codeph> or <codeph>-1</codeph>. Therefore, unary
+                <codeph>+</codeph> returns its argument unchanged, while unary 
<codeph>-</codeph> flips the sign of its argument. Although
+                you can double up these operators in expressions such as 
<codeph>++5</codeph> (always positive) or <codeph>-+2</codeph> or
+                <codeph>+-2</codeph> (both always negative), you cannot double 
the unary minus operator because <codeph>--</codeph> is
+                interpreted as the start of a comment. (You can use a double 
unary minus operator if you separate the <codeph>-</codeph>
+                characters, for example with a space or parentheses.)
+              </p>
+            </li>
+
+            <li>
+              <p>
+                With binary notation, such as <codeph>2+2</codeph>, 
<codeph>5-2.5</codeph>, or <codeph><varname>col1</varname> +
+                <varname>col2</varname></codeph>, they add or subtract 
respectively the right-hand argument to (or from) the left-hand
+                argument. Both arguments must be of numeric types.
+              </p>
+            </li>
+          </ul>
+        </li>
+
+        <li>
+          <p>
+            <codeph>*</codeph> and <codeph>/</codeph>: Multiplication and 
division respectively. Both arguments must be of numeric types.
+          </p>
+
+          <p>
+            When multiplying, the shorter argument is promoted if necessary 
(such as <codeph>SMALLINT</codeph> to <codeph>INT</codeph> or
+            <codeph>BIGINT</codeph>, or <codeph>FLOAT</codeph> to 
<codeph>DOUBLE</codeph>), and then the result is promoted again to the
+            next larger type. Thus, multiplying a <codeph>TINYINT</codeph> and 
an <codeph>INT</codeph> produces a <codeph>BIGINT</codeph>
+            result. Multiplying a <codeph>FLOAT</codeph> and a 
<codeph>FLOAT</codeph> produces a <codeph>DOUBLE</codeph> result. Multiplying
+            a <codeph>FLOAT</codeph> and a <codeph>DOUBLE</codeph> or a 
<codeph>DOUBLE</codeph> and a <codeph>DOUBLE</codeph> produces a
+            <codeph>DECIMAL(38,17)</codeph>, because <codeph>DECIMAL</codeph> 
values can represent much larger and more precise values than
+            <codeph>DOUBLE</codeph>.
+          </p>
+
+          <p>
+            When dividing, Impala always treats the arguments and result as 
<codeph>DOUBLE</codeph> values to avoid losing precision. If you
+            need to insert the results of a division operation into a 
<codeph>FLOAT</codeph> column, use the <codeph>CAST()</codeph>
+            function to convert the result to the correct type.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            <codeph>%</codeph>: Modulo operator. Returns the remainder of the 
left-hand argument divided by the right-hand argument. Both
+            arguments must be of one of the integer types.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            <codeph>&amp;</codeph>, <codeph>|</codeph>, <codeph>~</codeph>, 
and <codeph>^</codeph>: Bitwise operators that return the
+            logical AND, logical OR, <codeph>NOT</codeph>, or logical XOR 
(exclusive OR) of their argument values. Both arguments must be of
+            one of the integer types. If the arguments are of different type, 
the argument with the smaller type is implicitly extended to
+            match the argument with the longer type.
+          </p>
+        </li>
+      </ul>
+
+      <p>
+        You can chain a sequence of arithmetic expressions, optionally 
grouping them with parentheses.
+      </p>
+
+      <p>
+        The arithmetic operators generally do not have equivalent calling 
conventions using functional notation. For example, prior to
+        Impala 2.2.0 / CDH 5.4.0, there is no <codeph>MOD()</codeph> function 
equivalent to the <codeph>%</codeph> modulo operator.
+        Conversely, there are some arithmetic functions that do not have a 
corresponding operator. For example, for exponentiation you use
+        the <codeph>POW()</codeph> function, but there is no 
<codeph>**</codeph> exponentiation operator. See
+        <xref href="impala_math_functions.xml#math_functions"/> for the 
arithmetic functions you can use.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+      <p 
conref="../shared/impala_common.xml#common/complex_types_aggregation_explanation"/>
+
+      <p 
conref="../shared/impala_common.xml#common/complex_types_aggregation_example"/>
+
+      <p 
conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/>
+
+      <p rev="2.3.0">
+        The following example shows how to do an arithmetic operation using a 
numeric field of a <codeph>STRUCT</codeph> type that is an
+        item within an <codeph>ARRAY</codeph> column. Once the scalar numeric 
value <codeph>R_NATIONKEY</codeph> is extracted, it can be
+        used in an arithmetic expression, such as multiplying by 10:
+      </p>
+
+<codeblock rev="2.3.0">
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey * 10
+  from region, region.r_nations as nation
+where nation.item.n_nationkey &lt; 5;
++-------------+-------------+------------------------------+
+| r_name      | item.n_name | nation.item.n_nationkey * 10 |
++-------------+-------------+------------------------------+
+| AMERICA     | CANADA      | 30                           |
+| AMERICA     | BRAZIL      | 20                           |
+| AMERICA     | ARGENTINA   | 10                           |
+| MIDDLE EAST | EGYPT       | 40                           |
+| AFRICA      | ALGERIA     | 0                            |
++-------------+-------------+------------------------------+
+</codeblock>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="between">
+
+    <title>BETWEEN Operator</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">BETWEEN operator</indexterm>
+        In a <codeph>WHERE</codeph> clause, compares an expression to both a 
lower and upper bound. The comparison is successful is the
+        expression is greater than or equal to the lower bound, and less than 
or equal to the upper bound. If the bound values are switched,
+        so the lower bound is greater than the upper bound, does not match any 
values.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock><varname>expression</varname> BETWEEN 
<varname>lower_bound</varname> AND <varname>upper_bound</varname></codeblock>
+
+      <p>
+        <b>Data types:</b> Typically used with numeric data types. Works with 
any data type, although not very practical for
+        <codeph>BOOLEAN</codeph> values. (<codeph>BETWEEN false AND 
true</codeph> will match all <codeph>BOOLEAN</codeph> values.) Use
+        <codeph>CAST()</codeph> if necessary to ensure the lower and upper 
bound values are compatible types. Call string or date/time
+        functions if necessary to extract or transform the relevant portion to 
compare, especially if the value can be transformed into a
+        number.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+      <p>
+        Be careful when using short string operands. A longer string that 
starts with the upper bound value will not be included, because it
+        is considered greater than the upper bound. For example, 
<codeph>BETWEEN 'A' and 'M'</codeph> would not match the string value
+        <codeph>'Midway'</codeph>. Use functions such as 
<codeph>upper()</codeph>, <codeph>lower()</codeph>, <codeph>substr()</codeph>,
+        <codeph>trim()</codeph>, and so on if necessary to ensure the 
comparison works as expected.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+      <p 
conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/>
+
+      <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+<codeblock>-- Retrieve data for January through June, inclusive.
+select c1 from t1 where month <b>between 1 and 6</b>;
+
+-- Retrieve data for names beginning with 'A' through 'M' inclusive.
+-- Only test the first letter to ensure all the values starting with 'M' are 
matched.
+-- Do a case-insensitive comparison to match names with various capitalization 
conventions.
+select last_name from customers where upper(substr(last_name,1,1)) <b>between 
'A' and 'M'</b>;
+
+-- Retrieve data for only the first week of each month.
+select count(distinct visitor_id)) from web_traffic where 
dayofmonth(when_viewed) <b>between 1 and 7</b>;</codeblock>
+
+      <p rev="2.3.0">
+        The following example shows how to do a <codeph>BETWEEN</codeph> 
comparison using a numeric field of a <codeph>STRUCT</codeph> type
+        that is an item within an <codeph>ARRAY</codeph> column. Once the 
scalar numeric value <codeph>R_NATIONKEY</codeph> is extracted, it
+        can be used in a comparison operator:
+      </p>
+
+<codeblock rev="2.3.0">
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey
+from region, region.r_nations as nation
+where nation.item.n_nationkey between 3 and 5
++-------------+-------------+------------------+
+| r_name      | item.n_name | item.n_nationkey |
++-------------+-------------+------------------+
+| AMERICA     | CANADA      | 3                |
+| MIDDLE EAST | EGYPT       | 4                |
+| AFRICA      | ETHIOPIA    | 5                |
++-------------+-------------+------------------+
+</codeblock>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="comparison_operators">
+
+    <title>Comparison Operators</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">comparison operators</indexterm>
+        Impala supports the familiar comparison operators for checking 
equality and sort order for the column data types:
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock><varname>left_hand_expression</varname> 
<varname>comparison_operator</varname> 
<varname>right_hand_expression</varname></codeblock>
+
+      <ul>
+        <li>
+          <codeph>=</codeph>, <codeph>!=</codeph>, <codeph>&lt;&gt;</codeph>: 
apply to all types.
+        </li>
+
+        <li>
+          <codeph>&lt;</codeph>, <codeph>&lt;=</codeph>, 
<codeph>&gt;</codeph>, <codeph>&gt;=</codeph>: apply to all types; for
+          <codeph>BOOLEAN</codeph>, <codeph>TRUE</codeph> is considered 
greater than <codeph>FALSE</codeph>.
+        </li>
+      </ul>
+
+      <p>
+        <b>Alternatives:</b>
+      </p>
+
+      <p>
+        The <codeph>IN</codeph> and <codeph>BETWEEN</codeph> operators provide 
shorthand notation for expressing combinations of equality,
+        less than, and greater than comparisons with a single operator.
+      </p>
+
+      <p>
+        Because comparing any value to <codeph>NULL</codeph> produces 
<codeph>NULL</codeph> rather than <codeph>TRUE</codeph> or
+        <codeph>FALSE</codeph>, use the <codeph>IS NULL</codeph> and 
<codeph>IS NOT NULL</codeph> operators to check if a value is
+        <codeph>NULL</codeph> or not.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+      <p 
conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/>
+
+      <p rev="2.3.0">
+        The following example shows how to do an arithmetic operation using a 
numeric field of a <codeph>STRUCT</codeph> type that is an
+        item within an <codeph>ARRAY</codeph> column. Once the scalar numeric 
value <codeph>R_NATIONKEY</codeph> is extracted, it can be
+        used with a comparison operator such as <codeph>&lt;</codeph>:
+      </p>
+
+<codeblock rev="2.3.0">
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey
+from region, region.r_nations as nation
+where nation.item.n_nationkey &lt; 5
++-------------+-------------+------------------+
+| r_name      | item.n_name | item.n_nationkey |
++-------------+-------------+------------------+
+| AMERICA     | CANADA      | 3                |
+| AMERICA     | BRAZIL      | 2                |
+| AMERICA     | ARGENTINA   | 1                |
+| MIDDLE EAST | EGYPT       | 4                |
+| AFRICA      | ALGERIA     | 0                |
++-------------+-------------+------------------+
+</codeblock>
+
+    </conbody>
+
+  </concept>
+
+  <concept audience="Cloudera" rev="2.1.0" id="except">
+
+    <title>EXCEPT Operator</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">EXCEPT operator</indexterm>
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="2.0.0" id="exists">
+
+    <title>EXISTS Operator</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">EXISTS operator</indexterm>
+
+        <indexterm audience="Cloudera">NOT EXISTS operator</indexterm>
+        The <codeph>EXISTS</codeph> operator tests whether a subquery returns 
any results. You typically use it to find values from one
+        table that have corresponding values in another table.
+      </p>
+
+      <p>
+        The converse, <codeph>NOT EXISTS</codeph>, helps to find all the 
values from one table that do not have any corresponding values in
+        another table.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>EXISTS (<varname>subquery</varname>)
+NOT EXISTS (<varname>subquery</varname>)
+</codeblock>
+
+      <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+      <p>
+        The subquery can refer to a different table than the outer query 
block, or the same table. For example, you might use
+        <codeph>EXISTS</codeph> or <codeph>NOT EXISTS</codeph> to check the 
existence of parent/child relationships between two columns of
+        the same table.
+      </p>
+
+      <p>
+        You can also use operators and function calls within the subquery to 
test for other kinds of relationships other than strict
+        equality. For example, you might use a call to 
<codeph>COUNT()</codeph> in the subquery to check whether the number of matching
+        values is higher or lower than some limit. You might call a UDF in the 
subquery to check whether values in one table matches a
+        hashed representation of those same values in a different table.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/null_blurb"/>
+
+      <p>
+        If the subquery returns any value at all (even <codeph>NULL</codeph>), 
<codeph>EXISTS</codeph> returns <codeph>TRUE</codeph> and
+        <codeph>NOT EXISTS</codeph> returns false.
+      </p>
+
+      <p>
+        The following example shows how even when the subquery returns only 
<codeph>NULL</codeph> values, <codeph>EXISTS</codeph> still
+        returns <codeph>TRUE</codeph> and thus matches all the rows from the 
table in the outer query block.
+      </p>
+
+<codeblock>[localhost:21000] &gt; create table all_nulls (x int);
+[localhost:21000] &gt; insert into all_nulls values (null), (null), (null);
+[localhost:21000] &gt; select y from t2 where exists (select x from all_nulls);
++---+
+| y |
++---+
+| 2 |
+| 4 |
+| 6 |
++---+
+</codeblock>
+
+      <p>
+        However, if the table in the subquery is empty and so the subquery 
returns an empty result set, <codeph>EXISTS</codeph> returns
+        <codeph>FALSE</codeph>:
+      </p>
+
+<codeblock>[localhost:21000] &gt; create table empty (x int);
+[localhost:21000] &gt; select y from t2 where exists (select x from empty);
+[localhost:21000] &gt;
+</codeblock>
+
+      <p conref="../shared/impala_common.xml#common/added_in_20"/>
+
+      <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
+
+      <p conref="../shared/impala_common.xml#common/subquery_no_limit"/>
+
+      <p rev="IMPALA-3232">
+        Prior to <keyword keyref="impala26_full"/>,
+        the <codeph>NOT EXISTS</codeph> operator required a correlated 
subquery.
+        In <keyword keyref="impala26_full"/> and higher, <codeph>NOT 
EXISTS</codeph> works with
+        uncorrelated queries also.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+      <p 
conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/>
+
+<!-- To do: construct an EXISTS / NOT EXISTS example for complex types. -->
+
+      <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+      <p>
+<!-- Maybe turn this into a conref if the same set of tables gets used for 
subqueries, EXISTS, other places. -->
+<!-- Yes, the material was reused under Subqueries for anti-joins. -->
+        The following examples refer to these simple tables containing small 
sets of integers or strings:
+<codeblock>[localhost:21000] &gt; create table t1 (x int);
+[localhost:21000] &gt; insert into t1 values (1), (2), (3), (4), (5), (6);
+
+[localhost:21000] &gt; create table t2 (y int);
+[localhost:21000] &gt; insert into t2 values (2), (4), (6);
+
+[localhost:21000] &gt; create table t3 (z int);
+[localhost:21000] &gt; insert into t3 values (1), (3), (5);
+
+[localhost:21000] &gt; create table month_names (m string);
+[localhost:21000] &gt; insert into month_names values
+                  &gt; ('January'), ('February'), ('March'),
+                  &gt; ('April'), ('May'), ('June'), ('July'),
+                  &gt; ('August'), ('September'), ('October'),
+                  &gt; ('November'), ('December');
+</codeblock>
+      </p>
+
+      <p>
+        The following example shows a correlated subquery that finds all the 
values in one table that exist in another table. For each value
+        <codeph>X</codeph> from <codeph>T1</codeph>, the query checks if the 
<codeph>Y</codeph> column of <codeph>T2</codeph> contains an
+        identical value, and the <codeph>EXISTS</codeph> operator returns 
<codeph>TRUE</codeph> or <codeph>FALSE</codeph> as appropriate in
+        each case.
+      </p>
+
+<codeblock>localhost:21000] &gt; select x from t1 where exists (select y from 
t2 where t1.x = y);
++---+
+| x |
++---+
+| 2 |
+| 4 |
+| 6 |
++---+
+</codeblock>
+
+      <p>
+        An uncorrelated query is less interesting in this case. Because the 
subquery always returns <codeph>TRUE</codeph>, all rows from
+        <codeph>T1</codeph> are returned. If the table contents where changed 
so that the subquery did not match any rows, none of the rows
+        from <codeph>T1</codeph> would be returned.
+      </p>
+
+<codeblock>[localhost:21000] &gt; select x from t1 where exists (select y from 
t2 where y &gt; 5);
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
+| 4 |
+| 5 |
+| 6 |
++---+
+</codeblock>
+
+      <p>
+        The following example shows how an uncorrelated subquery can test for 
the existence of some condition within a table. By using
+        <codeph>LIMIT 1</codeph> or an aggregate function, the query returns a 
single result or no result based on whether the subquery
+        matches any rows. Here, we know that <codeph>T1</codeph> and 
<codeph>T2</codeph> contain some even numbers, but <codeph>T3</codeph>
+        does not.
+      </p>
+
+<codeblock>[localhost:21000] &gt; select "contains an even number" from t1 
where exists (select x from t1 where x % 2 = 0) limit 1;
++---------------------------+
+| 'contains an even number' |
++---------------------------+
+| contains an even number   |
++---------------------------+
+[localhost:21000] &gt; select "contains an even number" as assertion from t1 
where exists (select x from t1 where x % 2 = 0) limit 1;
++-------------------------+
+| assertion               |
++-------------------------+
+| contains an even number |
++-------------------------+
+[localhost:21000] &gt; select "contains an even number" as assertion from t2 
where exists (select x from t2 where y % 2 = 0) limit 1;
+ERROR: AnalysisException: couldn't resolve column reference: 'x'
+[localhost:21000] &gt; select "contains an even number" as assertion from t2 
where exists (select y from t2 where y % 2 = 0) limit 1;
++-------------------------+
+| assertion               |
++-------------------------+
+| contains an even number |
++-------------------------+
+[localhost:21000] &gt; select "contains an even number" as assertion from t3 
where exists (select z from t3 where z % 2 = 0) limit 1;
+[localhost:21000] &gt;
+</codeblock>
+
+      <p>
+        The following example finds numbers in one table that are 1 greater 
than numbers from another table. The <codeph>EXISTS</codeph>
+        notation is simpler than an equivalent <codeph>CROSS JOIN</codeph> 
between the tables. (The example then also illustrates how the
+        same test could be performed using an <codeph>IN</codeph> operator.)
+      </p>
+
+<codeblock>[localhost:21000] &gt; select x from t1 where exists (select y from 
t2 where x = y + 1);
++---+
+| x |
++---+
+| 3 |
+| 5 |
++---+
+[localhost:21000] &gt; select x from t1 where x in (select y + 1 from t2);
++---+
+| x |
++---+
+| 3 |
+| 5 |
++---+
+</codeblock>
+
+      <p>
+        The following example finds values from one table that do not exist in 
another table.
+      </p>
+
+<codeblock>[localhost:21000] &gt; select x from t1 where not exists (select y 
from t2 where x = y);
++---+
+| x |
++---+
+| 1 |
+| 3 |
+| 5 |
++---+
+</codeblock>
+
+      <p>
+        The following example uses the <codeph>NOT EXISTS</codeph> operator to 
find all the leaf nodes in tree-structured data. This
+        simplified <q>tree of life</q> has multiple levels (class, order, 
family, and so on), with each item pointing upward through a
+        <codeph>PARENT</codeph> pointer. The example runs an outer query and a 
subquery on the same table, returning only those items whose
+        <codeph>ID</codeph> value is <i>not</i> referenced by the 
<codeph>PARENT</codeph> of any other item.
+      </p>
+
+<codeblock>[localhost:21000] &gt; create table tree (id int, parent int, name 
string);
+[localhost:21000] &gt; insert overwrite tree values
+                  &gt; (0, null, "animals"),
+                  &gt; (1, 0, "placentals"),
+                  &gt; (2, 0, "marsupials"),
+                  &gt; (3, 1, "bats"),
+                  &gt; (4, 1, "cats"),
+                  &gt; (5, 2, "kangaroos"),
+                  &gt; (6, 4, "lions"),
+                  &gt; (7, 4, "tigers"),
+                  &gt; (8, 5, "red kangaroo"),
+                  &gt; (9, 2, "wallabies");
+[localhost:21000] &gt; select name as "leaf node" from tree one
+                  &gt; where not exists (select parent from tree two where 
one.id = two.parent);
++--------------+
+| leaf node    |
++--------------+
+| bats         |
+| lions        |
+| tigers       |
+| red kangaroo |
+| wallabies    |
++--------------+
+</codeblock>
+
+      <p conref="../shared/impala_common.xml#common/related_info"/>
+
+      <p>
+        <xref href="impala_subqueries.xml#subqueries"/>
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="2.4.0" id="ilike">
+
+    <title>ILIKE Operator</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">ILIKE operator</indexterm>
+        A case-insensitive comparison operator for <codeph>STRING</codeph> 
data, with basic wildcard capability using <codeph>_</codeph> to match a single
+        character and <codeph>%</codeph> to match multiple characters. The 
argument expression must match the entire string value.
+        Typically, it is more efficient to put any <codeph>%</codeph> wildcard 
match at the end of the string.
+      </p>
+
+      <p>
+        This operator, available in <keyword keyref="impala25_full"/> and 
higher, is the equivalent of the <codeph>LIKE</codeph> operator,
+        but with case-insensitive comparisons.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock><varname>string_expression</varname> ILIKE 
<varname>wildcard_expression</varname>
+<varname>string_expression</varname> NOT ILIKE 
<varname>wildcard_expression</varname>
+</codeblock>
+
+      <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+      <p 
conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/>
+
+<!-- To do: construct a LIKE example for complex types. -->
+
+      <p conref="../shared/impala_common.xml#common/example_blurb"/>
+      <p>
+        In the following examples, strings that are the same except for 
differences in uppercase
+        and lowercase match successfully with <codeph>ILIKE</codeph>, but do 
not match
+        with <codeph>LIKE</codeph>:
+      </p>
+
+<codeblock>select 'fooBar' ilike 'FOOBAR';
++-------------------------+
+| 'foobar' ilike 'foobar' |
++-------------------------+
+| true                    |
++-------------------------+
+
+select 'fooBar' like 'FOOBAR';
++------------------------+
+| 'foobar' like 'foobar' |
++------------------------+
+| false                  |
++------------------------+
+
+select 'FOOBAR' ilike 'f%';
++---------------------+
+| 'foobar' ilike 'f%' |
++---------------------+
+| true                |
++---------------------+
+
+select 'FOOBAR' like 'f%';
++--------------------+
+| 'foobar' like 'f%' |
++--------------------+
+| false              |
++--------------------+
+
+select 'ABCXYZ' not ilike 'ab_xyz';
++-----------------------------+
+| not 'abcxyz' ilike 'ab_xyz' |
++-----------------------------+
+| false                       |
++-----------------------------+
+
+select 'ABCXYZ' not like 'ab_xyz';
++----------------------------+
+| not 'abcxyz' like 'ab_xyz' |
++----------------------------+
+| true                       |
++----------------------------+
+</codeblock>
+
+      <p conref="../shared/impala_common.xml#common/related_info"/>
+
+      <p rev="2.5.0">
+        For case-sensitive comparisons, see <xref 
href="impala_operators.xml#like"/>.
+        For a more general kind of search operator using regular expressions, 
see <xref href="impala_operators.xml#regexp"/>
+        or its case-insensitive counterpart <xref 
href="impala_operators.xml#iregexp"/>.
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="in">
+
+    <title>IN Operator</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">IN operator</indexterm>
+
+        <indexterm audience="Cloudera">NOT IN operator</indexterm>
+        The <codeph>IN</codeph> operator compares an argument value to a set 
of values, and returns <codeph>TRUE</codeph> if the argument
+        matches any value in the set. The <codeph>NOT IN</codeph> operator 
reverses the comparison, and checks if the argument value is not
+        part of a set of values.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock rev="2.0.0"><varname>expression</varname> IN 
(<varname>expression</varname> [, <varname>expression</varname>])
+<varname>expression</varname> IN (<varname>subquery</varname>)
+
+<varname>expression</varname> NOT IN (<varname>expression</varname> [, 
<varname>expression</varname>])
+<varname>expression</varname> NOT IN (<varname>subquery</varname>)
+</codeblock>
+
+      <p>
+        The left-hand expression and the set of comparison values must be of 
compatible types.
+      </p>
+
+      <p>
+        The left-hand expression must consist only of a single value, not a 
tuple. Although the left-hand expression is typically a column
+        name, it could also be some other value. For example, the 
<codeph>WHERE</codeph> clauses <codeph>WHERE id IN (5)</codeph> and
+        <codeph>WHERE 5 IN (id)</codeph> produce the same results.
+      </p>
+
+      <p rev="CDH-40407">
+        The set of values to check against can be specified as constants, 
function calls, column names, or other expressions in the query
+        text. The maximum number of expressions in the <codeph>IN</codeph> 
list is 9999. (The maximum number of elements of
+        a single expression is 10,000 items, and the <codeph>IN</codeph> 
operator itself counts as one.)
+      </p>
+
+      <p rev="2.0.0">
+        In Impala 2.0 and higher, the set of values can also be generated by a 
subquery. <codeph>IN</codeph> can evaluate an unlimited
+        number of results using a subquery.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+      <p>
+        Any expression using the <codeph>IN</codeph> operator could be 
rewritten as a series of equality tests connected with
+        <codeph>OR</codeph>, but the <codeph>IN</codeph> syntax is often 
clearer, more concise, and easier for Impala to optimize. For
+        example, with partitioned tables, queries frequently use 
<codeph>IN</codeph> clauses to filter data by comparing the partition key
+        columns to specific values.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/null_blurb"/>
+
+      <p>
+        If there really is a matching non-null value, <codeph>IN</codeph> 
returns <codeph>TRUE</codeph>:
+      </p>
+
+<codeblock>[localhost:21000] &gt; select 1 in (1,null,2,3);
++----------------------+
+| 1 in (1, null, 2, 3) |
++----------------------+
+| true                 |
++----------------------+
+[localhost:21000] &gt; select 1 not in (1,null,2,3);
++--------------------------+
+| 1 not in (1, null, 2, 3) |
++--------------------------+
+| false                    |
++--------------------------+
+</codeblock>
+
+      <p>
+        If the searched value is not found in the comparison values, and the 
comparison values include <codeph>NULL</codeph>, the result is
+        <codeph>NULL</codeph>:
+      </p>
+
+<codeblock>[localhost:21000] &gt; select 5 in (1,null,2,3);
++----------------------+
+| 5 in (1, null, 2, 3) |
++----------------------+
+| NULL                 |
++----------------------+
+[localhost:21000] &gt; select 5 not in (1,null,2,3);
++--------------------------+
+| 5 not in (1, null, 2, 3) |
++--------------------------+
+| NULL                     |
++--------------------------+
+[localhost:21000] &gt; select 1 in (null);
++-------------+
+| 1 in (null) |
++-------------+
+| NULL        |
++-------------+
+[localhost:21000] &gt; select 1 not in (null);
++-----------------+
+| 1 not in (null) |
++-----------------+
+| NULL            |
++-----------------+
+</codeblock>
+
+      <p>
+        If the left-hand argument is <codeph>NULL</codeph>, 
<codeph>IN</codeph> always returns <codeph>NULL</codeph>. This rule applies even
+        if the comparison values include <codeph>NULL</codeph>.
+      </p>
+
+<codeblock>[localhost:21000] &gt; select null in (1,2,3);
++-------------------+
+| null in (1, 2, 3) |
++-------------------+
+| NULL              |
++-------------------+
+[localhost:21000] &gt; select null not in (1,2,3);
++-----------------------+
+| null not in (1, 2, 3) |
++-----------------------+
+| NULL                  |
++-----------------------+
+[localhost:21000] &gt; select null in (null);
++----------------+
+| null in (null) |
++----------------+
+| NULL           |
++----------------+
+[localhost:21000] &gt; select null not in (null);
++--------------------+
+| null not in (null) |
++--------------------+
+| NULL               |
++--------------------+
+</codeblock>
+
+      <p conref="../shared/impala_common.xml#common/enhanced_in_20"/>
+
+      <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+      <p 
conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/>
+
+      <p rev="2.3.0">
+        The following example shows how to do an arithmetic operation using a 
numeric field of a <codeph>STRUCT</codeph> type that is an
+        item within an <codeph>ARRAY</codeph> column. Once the scalar numeric 
value <codeph>R_NATIONKEY</codeph> is extracted, it can be
+        used in an arithmetic expression, such as multiplying by 10:
+      </p>
+
+<codeblock rev="2.3.0">
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey
+from region, region.r_nations as nation
+where nation.item.n_nationkey in (1,3,5)
++---------+-------------+------------------+
+| r_name  | item.n_name | item.n_nationkey |
++---------+-------------+------------------+
+| AMERICA | CANADA      | 3                |
+| AMERICA | ARGENTINA   | 1                |
+| AFRICA  | ETHIOPIA    | 5                |
++---------+-------------+------------------+
+</codeblock>
+
+      <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
+
+      <p conref="../shared/impala_common.xml#common/subquery_no_limit"/>
+
+      <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+<codeblock>-- Using IN is concise and self-documenting.
+SELECT * FROM t1 WHERE c1 IN (1,2,10);
+-- Equivalent to series of = comparisons ORed together.
+SELECT * FROM t1 WHERE c1 = 1 OR c1 = 2 OR c1 = 10;
+
+SELECT c1 AS "starts with vowel" FROM t2 WHERE upper(substr(c1,1,1)) IN 
('A','E','I','O','U');
+
+SELECT COUNT(DISTINCT(visitor_id)) FROM web_traffic WHERE month IN 
('January','June','July');</codeblock>
+
+      <p conref="../shared/impala_common.xml#common/related_info"/>
+
+      <p>
+        <xref href="impala_subqueries.xml#subqueries"/>
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept audience="Cloudera" rev="2.1.0" id="intersect">
+
+    <title>INTERSECT Operator</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">INTERSECT operator</indexterm>
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="2.5.0" id="iregexp">
+
+    <title>IREGEXP Operator</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">IREGEXP operator</indexterm>
+        Tests whether a value matches a regular expression, using 
case-insensitive string comparisons.
+        Uses the POSIX regular expression syntax where <codeph>^</codeph> and
+        <codeph>$</codeph> match the beginning and end of the string, 
<codeph>.</codeph> represents any single character, <codeph>*</codeph>
+        represents a sequence of zero or more items, <codeph>+</codeph> 
represents a sequence of one or more items, <codeph>?</codeph>
+        produces a non-greedy match, and so on.
+      </p>
+
+      <p>
+        This operator, available in <keyword keyref="impala25_full"/> and 
higher, is the equivalent of the <codeph>REGEXP</codeph> operator,
+        but with case-insensitive comparisons.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock><varname>string_expression</varname> IREGEXP 
<varname>regular_expression</varname>
+</codeblock>
+
+      <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+      <p 
conref="../shared/impala_common.xml#common/regular_expression_whole_string"/>
+
+<!-- Currently, there isn't any IRLIKE synonym, so REGEXP and IREGEXP are 
different in that respect.
+     I pinged IMPALA-1787 to check if that's intentional.
+      <p>
+        The <codeph>IRLIKE</codeph> operator is a synonym for 
<codeph>IREGEXP</codeph>.
+      </p>
+-->
+
+      <p rev="2.5.0">
+        The <codeph>|</codeph> symbol is the alternation operator, typically 
used within <codeph>()</codeph> to match different sequences.
+        The <codeph>()</codeph> groups do not allow backreferences. To 
retrieve the part of a value matched within a <codeph>()</codeph>
+        section, use the <codeph><xref 
href="impala_string_functions.xml#string_functions/regexp_extract">regexp_extract()</xref></codeph>
+        built-in function. (Currently, there is not any case-insensitive 
equivalent for the <codeph>regexp_extract()</codeph> function.)
+      </p>
+
+      <note rev="1.3.1">
+        <p rev="1.3.1" 
conref="../shared/impala_common.xml#common/regexp_matching"/>
+      </note>
+
+      <p conref="../shared/impala_common.xml#common/regexp_re2"/>
+
+      <p conref="../shared/impala_common.xml#common/regexp_re2_warning"/>
+
+      <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+      <p 
conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/>
+
+<!-- To do: construct a REGEXP example for complex types. -->
+
+      <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+      <p>
+        The following examples demonstrate the syntax for the 
<codeph>IREGEXP</codeph> operator.
+      </p>
+
+<codeblock>select 'abcABCaabbcc' iregexp '^[a-c]+$';
++---------------------------------+
+| 'abcabcaabbcc' iregexp '[a-c]+' |
++---------------------------------+
+| true                            |
++---------------------------------+
+</codeblock>
+
+      <p conref="../shared/impala_common.xml#common/related_info"/>
+
+      <p>
+        <xref href="impala_operators.xml#regexp"/>
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="2.5.0 IMPALA-2147" id="is_distinct_from">
+
+    <title id="is_distinct">IS DISTINCT FROM Operator</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">IS DISTINCT FROM operator</indexterm>
+
+        <indexterm audience="Cloudera">IS NOT DISTINCT FROM 
operator</indexterm>
+        The <codeph>IS DISTINCT FROM</codeph> operator, and its converse the 
<codeph>IS NOT DISTINCT FROM</codeph> operator, test whether or
+        not values are identical. <codeph>IS NOT DISTINCT FROM</codeph> is 
similar to the <codeph>=</codeph> operator, and <codeph>IS
+        DISTINCT FROM</codeph> is similar to the <codeph>!=</codeph> operator, 
except that <codeph>NULL</codeph> values are treated as
+        identical. Therefore, <codeph>IS NOT DISTINCT FROM</codeph> returns 
<codeph>true</codeph> rather than <codeph>NULL</codeph>, and
+        <codeph>IS DISTINCT FROM</codeph> returns <codeph>false</codeph> 
rather than <codeph>NULL</codeph>, when comparing two
+        <codeph>NULL</codeph> values. If one of the values being compared is 
<codeph>NULL</codeph> and the other is not, <codeph>IS DISTINCT
+        FROM</codeph> returns <codeph>true</codeph> and <codeph>IS NOT 
DISTINCT FROM</codeph> returns <codeph>false</codeph>, again instead
+        of returning <codeph>NULL</codeph> in both cases.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock><varname>expression1</varname> IS DISTINCT FROM 
<varname>expression2</varname>
+
+<varname>expression1</varname> IS NOT DISTINCT FROM 
<varname>expression2</varname>
+<varname>expression1</varname> &lt;=&gt; <varname>expression2</varname>
+</codeblock>
+
+      <p>
+        The operator <codeph>&lt;=&gt;</codeph> is an alias for <codeph>IS NOT 
DISTINCT FROM</codeph>.
+        It is typically used as a <codeph>NULL</codeph>-safe equality operator 
in join queries.
+        That is, <codeph>A &lt;=&gt; B</codeph> is true if <codeph>A</codeph> 
equals <codeph>B</codeph>
+        or if both <codeph>A</codeph> and <codeph>B</codeph> are 
<codeph>NULL</codeph>.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+      <p>
+        This operator provides concise notation for comparing two values and 
always producing a <codeph>true</codeph> or
+        <codeph>false</codeph> result, without treating <codeph>NULL</codeph> 
as a special case. Otherwise, to unambiguously distinguish
+        between two values requires a compound expression involving <codeph>IS 
[NOT] NULL</codeph> tests of both operands in addition to the
+        <codeph>=</codeph> or <codeph>!=</codeph> operator.
+      </p>
+
+      <p>
+        The <codeph>&lt;=&gt;</codeph> operator, used like an equality 
operator in a join query,
+        is more efficient than the equivalent clause: <codeph>A = B OR (A IS 
NULL AND B IS NULL)</codeph>.
+        The <codeph>&lt;=&gt;</codeph> operator can use a hash join, while the 
<codeph>OR</codeph> expression
+        cannot.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+      <p>
+        The following examples show how <codeph>IS DISTINCT FROM</codeph> 
gives output similar to
+        the <codeph>!=</codeph> operator, and <codeph>IS NOT DISTINCT 
FROM</codeph> gives output
+        similar to the <codeph>=</codeph> operator. The exception is when the 
expression involves
+        a <codeph>NULL</codeph> value on one side or both sides, where 
<codeph>!=</codeph> and
+        <codeph>=</codeph> return <codeph>NULL</codeph> but the <codeph>IS 
[NOT] DISTINCT FROM</codeph>
+        operators still return <codeph>true</codeph> or <codeph>false</codeph>.
+      </p>
+
+<codeblock>
+select 1 is distinct from 0, 1 != 0;
++----------------------+--------+
+| 1 is distinct from 0 | 1 != 0 |
++----------------------+--------+
+| true                 | true   |
++----------------------+--------+
+
+select 1 is distinct from 1, 1 != 1;
++----------------------+--------+
+| 1 is distinct from 1 | 1 != 1 |
++----------------------+--------+
+| false                | false  |
++----------------------+--------+
+
+select 1 is distinct from null, 1 != null;
++-------------------------+-----------+
+| 1 is distinct from null | 1 != null |
++-------------------------+-----------+
+| true                    | NULL      |
++-------------------------+-----------+
+
+select null is distinct from null, null != null;
++----------------------------+--------------+
+| null is distinct from null | null != null |
++----------------------------+--------------+
+| false                      | NULL         |
++----------------------------+--------------+
+
+select 1 is not distinct from 0, 1 = 0;
++--------------------------+-------+
+| 1 is not distinct from 0 | 1 = 0 |
++--------------------------+-------+
+| false                    | false |
++--------------------------+-------+
+
+select 1 is not distinct from 1, 1 = 1;
++--------------------------+-------+
+| 1 is not distinct from 1 | 1 = 1 |
++--------------------------+-------+
+| true                     | true  |
++--------------------------+-------+
+
+select 1 is not distinct from null, 1 = null;
++-----------------------------+----------+
+| 1 is not distinct from null | 1 = null |
++-----------------------------+----------+
+| false                       | NULL     |
++-----------------------------+----------+
+
+select null is not distinct from null, null = null;
++--------------------------------+-------------+
+| null is not distinct from null | null = null |
++--------------------------------+-------------+
+| true                           | NULL        |
++--------------------------------+-------------+
+</codeblock>
+
+      <p>
+        The following example shows how <codeph>IS DISTINCT FROM</codeph> 
considers
+        <codeph>CHAR</codeph> values to be the same (not distinct from each 
other)
+        if they only differ in the number of trailing spaces. Therefore, 
sometimes
+        the result of an <codeph>IS [NOT] DISTINCT FROM</codeph> operator 
differs
+        depending on whether the values are 
<codeph>STRING</codeph>/<codeph>VARCHAR</codeph>
+        or <codeph>CHAR</codeph>.
+      </p>
+
+<codeblock>
+select
+  'x' is distinct from 'x ' as string_with_trailing_spaces,
+  cast('x' as char(5)) is distinct from cast('x ' as char(5)) as 
char_with_trailing_spaces;
++-----------------------------+---------------------------+
+| string_with_trailing_spaces | char_with_trailing_spaces |
++-----------------------------+---------------------------+
+| true                        | false                     |
++-----------------------------+---------------------------+
+</codeblock>
+    </conbody>
+
+  </concept>
+
+  <concept id="is_null">
+
+    <title>IS NULL Operator</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">IS NULL operator</indexterm>
+
+        <indexterm audience="Cloudera">IS NOT NULL operator</indexterm>
+        The <codeph>IS NULL</codeph> operator, and its converse the <codeph>IS 
NOT NULL</codeph> operator, test whether a specified value is
+        <codeph><xref href="impala_literals.xml#null">NULL</xref></codeph>. 
Because using <codeph>NULL</codeph> with any of the other
+        comparison operators such as <codeph>=</codeph> or <codeph>!=</codeph> 
also returns <codeph>NULL</codeph> rather than
+        <codeph>TRUE</codeph> or <codeph>FALSE</codeph>, you use a 
special-purpose comparison operator to check for this special condition.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock><varname>expression</varname> IS NULL
+<varname>expression</varname> IS NOT NULL
+</codeblock>
+
+      <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+      <p>
+        In many cases, <codeph>NULL</codeph> values indicate some incorrect or 
incomplete processing during data ingestion or conversion.
+        You might check whether any values in a column are 
<codeph>NULL</codeph>, and if so take some followup action to fill them in.
+      </p>
+
+      <p>
+        With sparse data, often represented in <q>wide</q> tables, it is 
common for most values to be <codeph>NULL</codeph> with only an
+        occasional non-<codeph>NULL</codeph> value. In those cases, you can 
use the <codeph>IS NOT NULL</codeph> operator to identify the
+        rows containing any data at all for a particular column, regardless of 
the actual value.
+      </p>
+
+      <p>
+        With a well-designed database schema, effective use of 
<codeph>NULL</codeph> values and <codeph>IS NULL</codeph> and <codeph>IS NOT
+        NULL</codeph> operators can save having to design custom logic around 
special values such as 0, -1, <codeph>'N/A'</codeph>, empty
+        string, and so on. <codeph>NULL</codeph> lets you distinguish between 
a value that is known to be 0, false, or empty, and a truly
+        unknown value.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+      <p rev="2.3.0">
+        This operator is not applicable to complex type columns 
(<codeph>STRUCT</codeph>,
+        <codeph>ARRAY</codeph>, or <codeph>MAP</codeph>). Using a complex type 
column
+        with <codeph>IS NULL</codeph> or <codeph>IS NOT NULL</codeph> causes a 
query error.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+<codeblock>-- If this value is non-zero, something is wrong.
+select count(*) from employees where employee_id is null;
+
+-- With data from disparate sources, some fields might be blank.
+-- Not necessarily an error condition.
+select count(*) from census where household_income is null;
+
+-- Sometimes we expect fields to be null, and followup action
+-- is needed when they are not.
+select count(*) from web_traffic where weird_http_code is not null;</codeblock>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="like">
+
+    <title>LIKE Operator</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">LIKE operator</indexterm>
+        A comparison operator for <codeph>STRING</codeph> data, with basic 
wildcard capability using the underscore
+        (<codeph>_</codeph>) to match a single character and the percent sign 
(<codeph>%</codeph>) to match multiple
+        characters. The argument expression must match the entire string value.
+        Typically, it is more efficient to put any <codeph>%</codeph> wildcard 
match at the end of the string.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock><varname>string_expression</varname> LIKE 
<varname>wildcard_expression</varname>
+<varname>string_expression</varname> NOT LIKE 
<varname>wildcard_expression</varname>
+</codeblock>
+
+      <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+      <p 
conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/>
+
+<!-- To do: construct a LIKE example for complex types. -->
+
+      <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+<codeblock>select distinct c_last_name from customer where c_last_name like 
'Mc%' or c_last_name like 'Mac%';
+select count(c_last_name) from customer where c_last_name like 'M%';
+select c_email_address from customer where c_email_address like '%.edu';
+
+-- We can find 4-letter names beginning with 'M' by calling functions...
+select distinct c_last_name from customer where length(c_last_name) = 4 and 
substr(c_last_name,1,1) = 'M';
+-- ...or in a more readable way by matching M followed by exactly 3 characters.
+select distinct c_last_name from customer where c_last_name like 
'M___';</codeblock>
+
+      <p rev="2.5.0">
+        For case-insensitive comparisons, see <xref 
href="impala_operators.xml#ilike"/>.
+        For a more general kind of search operator using regular expressions, 
see <xref href="impala_operators.xml#regexp"/>
+        or its case-insensitive counterpart <xref 
href="impala_operators.xml#iregexp"/>.
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="logical_operators">
+
+    <title>Logical Operators</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">logical operators</indexterm>
+        Logical operators return a <codeph>BOOLEAN</codeph> value, based on a 
binary or unary logical operation between arguments that are
+        also Booleans. Typically, the argument expressions use <xref 
href="impala_operators.xml#comparison_operators">comparison
+        operators</xref>.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock><varname>boolean_expression</varname> 
<varname>binary_logical_operator</varname> <varname>boolean_expression</varname>
+<varname>unary_logical_operator</varname> <varname>boolean_expression</varname>
+</codeblock>
+
+      <p>
+        The Impala logical operators are:
+      </p>
+
+      <ul>
+        <li>
+          <codeph>AND</codeph>: A binary operator that returns 
<codeph>true</codeph> if its left-hand and right-hand arguments both evaluate
+          to <codeph>true</codeph>, <codeph>NULL</codeph> if either argument 
is <codeph>NULL</codeph>, and <codeph>false</codeph> otherwise.
+        </li>
+
+        <li>
+          <codeph>OR</codeph>: A binary operator that returns 
<codeph>true</codeph> if either of its left-hand and right-hand arguments
+          evaluate to <codeph>true</codeph>, <codeph>NULL</codeph> if one 
argument is <codeph>NULL</codeph> and the other is either
+          <codeph>NULL</codeph> or <codeph>false</codeph>, and 
<codeph>false</codeph> otherwise.
+        </li>
+
+        <li>
+          <codeph>NOT</codeph>: A unary operator that flips the state of a 
Boolean expression from <codeph>true</codeph> to
+          <codeph>false</codeph>, or <codeph>false</codeph> to 
<codeph>true</codeph>. If the argument expression is <codeph>NULL</codeph>,
+          the result remains <codeph>NULL</codeph>. (When <codeph>NOT</codeph> 
is used this way as a unary logical operator, it works
+          differently than the <codeph>IS NOT NULL</codeph> comparison 
operator, which returns <codeph>true</codeph> when applied to a
+          <codeph>NULL</codeph>.)
+        </li>
+      </ul>
+
+      <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+      <p 
conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/>
+
+      <p rev="2.3.0">
+        The following example shows how to do an arithmetic operation using a 
numeric field of a <codeph>STRUCT</codeph> type that is an
+        item within an <codeph>ARRAY</codeph> column. Once the scalar numeric 
value <codeph>R_NATIONKEY</codeph> is extracted, it can be
+        used in an arithmetic expression, such as multiplying by 10:
+      </p>
+
+<codeblock rev="2.3.0">
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey
+  from region, region.r_nations as nation
+where
+  nation.item.n_nationkey between 3 and 5
+  or nation.item.n_nationkey &lt; 15;
++-------------+----------------+------------------+
+| r_name      | item.n_name    | item.n_nationkey |
++-------------+----------------+------------------+
+| EUROPE      | UNITED KINGDOM | 23               |
+| EUROPE      | RUSSIA         | 22               |
+| EUROPE      | ROMANIA        | 19               |
+| ASIA        | VIETNAM        | 21               |
+| ASIA        | CHINA          | 18               |
+| AMERICA     | UNITED STATES  | 24               |
+| AMERICA     | PERU           | 17               |
+| AMERICA     | CANADA         | 3                |
+| MIDDLE EAST | SAUDI ARABIA   | 20               |
+| MIDDLE EAST | EGYPT          | 4                |
+| AFRICA      | MOZAMBIQUE     | 16               |
+| AFRICA      | ETHIOPIA       | 5                |
++-------------+----------------+------------------+
+</codeblock>
+
+      <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+      <p>
+        These examples demonstrate the <codeph>AND</codeph> operator:
+      </p>
+
+<codeblock>[localhost:21000] &gt; select true and true;
++---------------+
+| true and true |
++---------------+
+| true          |
++---------------+
+[localhost:21000] &gt; select true and false;
++----------------+
+| true and false |
++----------------+
+| false          |
++----------------+
+[localhost:21000] &gt; select false and false;
++-----------------+
+| false and false |
++-----------------+
+| false           |
++-----------------+
+[localhost:21000] &gt; select true and null;
++---------------+
+| true and null |
++---------------+
+| NULL          |
++---------------+
+[localhost:21000] &gt; select (10 &gt; 2) and (6 != 9);
++-----------------------+
+| (10 &gt; 2) and (6 != 9) |
++-----------------------+
+| true                  |
++-----------------------+
+</codeblock>
+
+      <p>
+        These examples demonstrate the <codeph>OR</codeph> operator:
+      </p>
+
+<codeblock>[localhost:21000] &gt; select true or true;
++--------------+
+| true or true |
++--------------+
+| true         |
++--------------+
+[localhost:21000] &gt; select true or false;
++---------------+
+| true or false |
++---------------+
+| true          |
++---------------+
+[localhost:21000] &gt; select false or false;
++----------------+
+| false or false |
++----------------+
+| false          |
++----------------+
+[localhost:21000] &gt; select true or null;
++--------------+
+| true or null |
++--------------+
+| true         |
++--------------+
+[localhost:21000] &gt; select null or true;
++--------------+
+| null or true |
++--------------+
+| true         |
++--------------+
+[localhost:21000] &gt; select false or null;
++---------------+
+| false or null |
++---------------+
+| NULL          |
++---------------+
+[localhost:21000] &gt; select (1 = 1) or ('hello' = 'world');
++--------------------------------+
+| (1 = 1) or ('hello' = 'world') |
++--------------------------------+
+| true                           |
++--------------------------------+
+[localhost:21000] &gt; select (2 + 2 != 4) or (-1 &gt; 0);
++--------------------------+
+| (2 + 2 != 4) or (-1 &gt; 0) |
++--------------------------+
+| false                    |
++--------------------------+
+</codeblock>
+
+      <p>
+        These examples demonstrate the <codeph>NOT</codeph> operator:
+      </p>
+
+<codeblock>[localhost:21000] &gt; select not true;
++----------+
+| not true |
++----------+
+| false    |
++----------+
+[localhost:21000] &gt; select not false;
++-----------+
+| not false |
++-----------+
+| true      |
++-----------+
+[localhost:21000] &gt; select not null;
++----------+
+| not null |
++----------+
+| NULL     |
++----------+
+[localhost:21000] &gt; select not (1=1);
++-------------+
+| not (1 = 1) |
++-------------+
+| false       |
++-------------+
+</codeblock>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="regexp">
+
+    <title>REGEXP Operator</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">REGEXP operator</indexterm>
+        Tests whether a value matches a regular expression. Uses the POSIX 
regular expression syntax where <codeph>^</codeph> and
+        <codeph>$</codeph> match the beginning and end of the string, 
<codeph>.</codeph> represents any single character, <codeph>*</codeph>
+        represents a sequence of zero or more items, <codeph>+</codeph> 
represents a sequence of one or more items, <codeph>?</codeph>
+        produces a non-greedy match, and so on.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock><varname>string_expression</varname> REGEXP 
<varname>regular_expression</varname>
+</codeblock>
+
+      <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+      <p 
conref="../shared/impala_common.xml#common/regular_expression_whole_string"/>
+
+      <p>
+        The <codeph>RLIKE</codeph> operator is a synonym for 
<codeph>REGEXP</codeph>.
+      </p>
+
+      <p>
+        The <codeph>|</codeph> symbol is the alternation operator, typically 
used within <codeph>()</codeph> to match different sequences.
+        The <codeph>()</codeph> groups do not allow backreferences. To 
retrieve the part of a value matched within a <codeph>()</codeph>
+        section, use the <codeph><xref 
href="impala_string_functions.xml#string_functions/regexp_extract">regexp_extract()</xref></codeph>
+        built-in function.
+      </p>
+
+      <note rev="1.3.1">
+          <p rev="1.3.1" 
conref="../shared/impala_common.xml#common/regexp_matching"/>
+      </note>
+
+      <p conref="../shared/impala_common.xml#common/regexp_re2"/>
+
+      <p conref="../shared/impala_common.xml#common/regexp_re2_warning"/>
+
+      <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+      <p 
conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/>
+
+<!-- To do: construct a REGEXP example for complex types. -->
+
+      <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+      <p>
+        The following examples demonstrate the identical syntax for the 
<codeph>REGEXP</codeph> and <codeph>RLIKE</codeph> operators.
+      </p>
+
+<!-- Same examples shown for both REGEXP and RLIKE operators. -->
+
+<codeblock conref="../shared/impala_common.xml#common/regexp_rlike_examples"/>
+
+      <p conref="../shared/impala_common.xml#common/related_info"/>
+
+      <p rev="2.5.0">
+        For regular expression matching with case-insensitive comparisons, see 
<xref href="impala_operators.xml#iregexp"/>.
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="rlike">
+
+    <title>RLIKE Operator</title>
+
+    <conbody>
+
+      <p>
+        <indexterm audience="Cloudera">RLIKE operator</indexterm>
+        Synonym for the <codeph>REGEXP</codeph> operator. See <xref 
href="impala_operators.xml#regexp"/> for details.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+      <p>
+        The following examples demonstrate the identical syntax for the 
<codeph>REGEXP</codeph> and <codeph>RLIKE</codeph> operators.
+      </p>
+
+<!-- Same examples shown for both REGEXP and RLIKE operators. -->
+
+<codeblock conref="../shared/impala_common.xml#common/regexp_rlike_examples"/>
+
+    </conbody>
+
+  </concept>
+
+</concept>

[17/51] [partial] incubator-impala git commit: IMPALA-3398: Add docs to main Impala branch.

Reply via email to