http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_noncm_installation.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_noncm_installation.xml b/docs/topics/impala_noncm_installation.xml new file mode 100644 index 0000000..d9c1f1a --- /dev/null +++ b/docs/topics/impala_noncm_installation.xml @@ -0,0 +1,171 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="noncm_installation"> + + <title>Installing Impala without Cloudera Manager</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Installing"/> + <data name="Category" value="Administrators"/> + </metadata> + </prolog> + + <conbody> + + <p> + Before installing Impala manually, make sure all applicable nodes have the appropriate hardware + configuration, levels of operating system and CDH, and any other software prerequisites. See + <xref href="impala_prereqs.xml#prereqs"/> for details. + </p> + + <p> + You can install Impala across many hosts or on one host: + </p> + + <ul> + <li> + Installing Impala across multiple machines creates a distributed configuration. For best performance, + install Impala on <b>all</b> DataNodes. + </li> + + <li> + Installing Impala on a single machine produces a pseudo-distributed cluster. + </li> + </ul> + + <p> + <b>To install Impala on a host:</b> + </p> + + <ol> + <li> + Install CDH as described in the Installation section of the +<!-- Original URL: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/CDH5-Installation-Guide.html --> + <xref href="http://www.cloudera.com/documentation/enterprise/latest/topics/installation.html" scope="external" format="html">CDH + 5 Installation Guide</xref>. + </li> + + <li> + <p> + Install the Hive metastore somewhere in your cluster, as described in the Hive Installation topic in the +<!-- Original URL: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh_ig_hive_installation.html --> + <xref href="http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_hive_installation.html" scope="external" format="html">CDH + 5 Installation Guide</xref>. As part of this process, you configure the Hive metastore to use an external + database as a metastore. Impala uses this same database for its own table metadata. You can choose either + a MySQL or PostgreSQL database as the metastore. The process for configuring each type of database is + described in the CDH Installation Guide). + </p> + <p> + <ph rev="upstream">Cloudera</ph> recommends setting up a Hive metastore service rather than connecting directly to the metastore + database; this configuration is required when running Impala under CDH 4.1. Make sure the + <filepath>/etc/impala/conf/hive-site.xml</filepath> file contains the following setting, substituting the + appropriate hostname for <varname>metastore_server_host</varname>: + </p> +<codeblock><property> +<name>hive.metastore.uris</name> +<value>thrift://<varname>metastore_server_host</varname>:9083</value> +</property> +<property> +<name>hive.metastore.client.socket.timeout</name> +<value>3600</value> +<description>MetaStore Client socket timeout in seconds</description> +</property></codeblock> + </li> + + <li> + (Optional) If you installed the full Hive component on any host, you can verify that the metastore is + configured properly by starting the Hive console and querying for the list of available tables. Once you + confirm that the console starts, exit the console to continue the installation: +<codeblock>$ hive +Hive history file=/tmp/root/hive_job_log_root_201207272011_678722950.txt +hive> show tables; +table1 +table2 +hive> quit; +$</codeblock> + </li> + + <li> + Confirm that your package management command is aware of the Impala repository settings, as described in + <xref href="impala_prereqs.xml#prereqs"/>. (For CDH 4, this is a different repository than for CDH.) You + might need to download a repo or list file into a system directory underneath <filepath>/etc</filepath>. + </li> + + <li> + Use <b>one</b> of the following sets of commands to install the Impala package: + <p> + <b>For RHEL, Oracle Linux, or CentOS systems:</b> + </p> +<codeblock rev="1.2">$ sudo yum install impala # Binaries for daemons +$ sudo yum install impala-server # Service start/stop script +$ sudo yum install impala-state-store # Service start/stop script +$ sudo yum install impala-catalog # Service start/stop script +</codeblock> + <p> + <b>For SUSE systems:</b> + </p> +<codeblock rev="1.2">$ sudo zypper install impala # Binaries for daemons +$ sudo zypper install impala-server # Service start/stop script +$ sudo zypper install impala-state-store # Service start/stop script +$ sudo zypper install impala-catalog # Service start/stop script +</codeblock> + <p> + <b>For Debian or Ubuntu systems:</b> + </p> +<codeblock rev="1.2">$ sudo apt-get install impala # Binaries for daemons +$ sudo apt-get install impala-server # Service start/stop script +$ sudo apt-get install impala-state-store # Service start/stop script +$ sudo apt-get install impala-catalog # Service start/stop script +</codeblock> + <note> + <ph rev="upstream">Cloudera</ph> recommends that you not install Impala on any HDFS NameNode. Installing Impala on NameNodes + provides no additional data locality, and executing queries with such a configuration might cause memory + contention and negatively impact the HDFS NameNode. + </note> + </li> + + <li> + Copy the client <codeph>hive-site.xml</codeph>, <codeph>core-site.xml</codeph>, + <codeph>hdfs-site.xml</codeph>, and <codeph>hbase-site.xml</codeph> configuration files to the Impala + configuration directory, which defaults to <codeph>/etc/impala/conf</codeph>. Create this directory if it + does not already exist. + </li> + + <li> + Use <b>one</b> of the following commands to install <codeph>impala-shell</codeph> on the machines from + which you want to issue queries. You can install <codeph>impala-shell</codeph> on any supported machine + that can connect to DataNodes that are running <codeph>impalad</codeph>. + <p> + <b>For RHEL/CentOS systems:</b> + </p> +<codeblock>$ sudo yum install impala-shell</codeblock> + <p> + <b>For SUSE systems:</b> + </p> +<codeblock>$ sudo zypper install impala-shell</codeblock> + <p> + <b>For Debian/Ubuntu systems:</b> + </p> +<codeblock>$ sudo apt-get install impala-shell</codeblock> + </li> + + <li> + Complete any required or recommended configuration, as described in + <xref href="impala_config_performance.xml#config_performance"/>. Some of these configuration changes are + mandatory. (They are applied automatically when you install using Cloudera Manager.) + </li> + </ol> + + <p> + Once installation and configuration are complete, see <xref href="impala_processes.xml#processes"/> for how + to activate the software on the appropriate nodes in your cluster. + </p> + + <p> + If this is your first time setting up and using Impala in this cluster, run through some of the exercises in + <xref href="impala_tutorial.xml#tutorial"/> to verify that you can do basic operations such as creating + tables and querying them. + </p> + </conbody> +</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_num_nodes.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_num_nodes.xml b/docs/topics/impala_num_nodes.xml new file mode 100644 index 0000000..8a8fe88 --- /dev/null +++ b/docs/topics/impala_num_nodes.xml @@ -0,0 +1,60 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="num_nodes"> + + <title>NUM_NODES Query Option</title> + <titlealts audience="PDF"><navtitle>NUM_NODES</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Troubleshooting"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">NUM_NODES query option</indexterm> + Limit the number of nodes that process a query, typically during debugging. + </p> + + <p> + <b>Type:</b> numeric + </p> + +<p> + <b>Allowed values:</b> Only accepts the values 0 + (meaning all nodes) or 1 (meaning all work is done on the coordinator node). +</p> + + <p> + <b>Default:</b> 0 + </p> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + If you are diagnosing a problem that you suspect is due to a timing issue due to + distributed query processing, you can set <codeph>NUM_NODES=1</codeph> to verify + if the problem still occurs when all the work is done on a single node. + </p> + + <p conref="../shared/impala_common.xml#common/num_nodes_tip"/> + + <note type="warning" rev="DOCS-1161"> + <p> + Because this option results in increased resource utilization on a single host, + it could cause problems due to contention with other Impala statements or + high resource usage. Symptoms could include queries running slowly, exceeding the memory limit, + or appearing to hang. Use it only in a single-user development/test environment; + <b>do not</b> use it in a production environment or in a cluster with a high-concurrency + or high-volume or performance-critical workload. + </p> + </note> + + + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_num_scanner_threads.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_num_scanner_threads.xml b/docs/topics/impala_num_scanner_threads.xml new file mode 100644 index 0000000..ec609d2 --- /dev/null +++ b/docs/topics/impala_num_scanner_threads.xml @@ -0,0 +1,35 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="num_scanner_threads"> + + <title>NUM_SCANNER_THREADS Query Option</title> + <titlealts audience="PDF"><navtitle>NUM_SCANNER_THREADS</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Performance"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">NUM_SCANNER_THREADS query option</indexterm> + Maximum number of scanner threads (on each node) used for each query. By default, Impala uses as many cores + as are available (one thread per core). You might lower this value if queries are using excessive resources + on a busy cluster. Impala imposes a maximum value automatically, so a high value has no practical effect. + </p> + + <p> + <b>Type:</b> numeric + </p> + + <p> + <b>Default:</b> 0 + </p> + + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_odbc.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_odbc.xml b/docs/topics/impala_odbc.xml new file mode 100644 index 0000000..c73915e --- /dev/null +++ b/docs/topics/impala_odbc.xml @@ -0,0 +1,203 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="impala_odbc"> + + <title id="odbc">Configuring Impala to Work with ODBC</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="ODBC"/> + <data name="Category" value="Querying"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Configuring"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">ODBC</indexterm> + Third-party products can be designed to integrate with Impala using ODBC. For the best experience, ensure any + third-party product you intend to use is supported. Verifying support includes checking that the versions of + Impala, ODBC, the operating system, and the third-party product have all been approved for use together. + Before configuring your systems to use ODBC, download a connector. + You may need to sign in and accept license agreements before accessing the pages required for downloading + ODBC connectors. + </p> + + <p outputclass="toc inpage"/> + + </conbody> + + <concept id="odbc_driver"> + <title>Downloading the ODBC Driver</title> + <conbody> + + <note type="important"> + As of late 2015, most business intelligence applications are certified with the 2.x ODBC drivers. + Although the instructions on this page cover both the 2.x and 1.x drivers, expect to use the 2.x drivers + exclusively for most ODBC applications connecting to Impala. + </note> + + <p> + See the + <xref href="http://www.cloudera.com/content/support/en/downloads/download-components/download-products.html?productID=3mDesWddUS" scope="external" format="html">downloads + page</xref> for a matrix of the certified driver version for different products. See the + <xref href="http://www.cloudera.com/content/support/en/documentation/cloudera-connectors-documentation/connector-documentation-latest.html" scope="external" format="html">documentation + page</xref> for installation instructions. + </p> + + </conbody> + </concept> + + <concept id="odbc_port"> + <title>Configuring the ODBC Port</title> + <conbody> + <p> + Versions 2.5 and 2.0 of the Cloudera ODBC Connector, currently certified for some but not all BI applications, + use the HiveServer2 protocol, corresponding to Impala port 21050. Impala supports Kerberos authentication with + all the supported versions of the driver, and requires ODBC 2.05.13 for Impala or higher for LDAP username/password + authentication. + </p> + + <p> + Version 1.x of the Cloudera ODBC Connector uses the original HiveServer1 protocol, corresponding to Impala + port 21000. + </p> + + </conbody> + </concept> + + <concept id="odbc_demo"> + <title>Example of Setting Up an ODBC Application for Impala</title> + <conbody> + + <p> + To illustrate the outline of the setup process, here is a transcript of a session to set up + all required drivers and a business intelligence application + that uses the ODBC driver, under Mac OS X. Each <codeph>.dmg</codeph> file runs a GUI-based + installer, first for the <xref href="http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/Downloads" scope="external" format="html">underlying IODBC driver</xref> needed for non-Windows systems, then for + the Cloudera ODBC Connector, and finally for the BI tool itself. + </p> + +<codeblock>$ ls -1 +Cloudera-ODBC-Driver-for-Impala-Install-Guide.pdf +BI_Tool_Installer.dmg +iodbc-sdk-3.52.7-macosx-10.5.dmg +ClouderaImpalaODBC.dmg +$ open iodbc-sdk-3.52.7-macosx-10.dmg +<varname>Install the IODBC driver using its installer</varname> +$ open ClouderaImpalaODBC.dmg +<varname>Install the Cloudera ODBC Connector using its installer</varname> +$ installer_dir=$(pwd) +$ cd /opt/cloudera/impalaodbc +$ ls -1 +Cloudera ODBC Driver for Impala Install Guide.pdf +Readme.txt +Setup +lib +ErrorMessages +Release Notes.txt +Tools +$ cd Setup +$ ls +odbc.ini odbcinst.ini +$ cp odbc.ini ~/.odbc.ini +$ vi ~/.odbc.ini +$ cat ~/.odbc.ini +[ODBC] +# Specify any global ODBC configuration here such as ODBC tracing. + +[ODBC Data Sources] +Sample Cloudera Impala DSN=Cloudera ODBC Driver for Impala + +[Sample Cloudera Impala DSN] + +# Description: DSN Description. +# This key is not necessary and is only to give a description of the data source. +Description=Cloudera ODBC Driver for Impala DSN + +# Driver: The location where the ODBC driver is installed to. +Driver=/opt/cloudera/impalaodbc/lib/universal/libclouderaimpalaodbc.dylib + +# The DriverUnicodeEncoding setting is only used for SimbaDM +# When set to 1, SimbaDM runs in UTF-16 mode. +# When set to 2, SimbaDM runs in UTF-8 mode. +#DriverUnicodeEncoding=2 + +# Values for HOST, PORT, KrbFQDN, and KrbServiceName should be set here. +# They can also be specified on the connection string. +HOST=hostname.sample.example.com +PORT=21050 +Schema=default + +# The authentication mechanism. +# 0 - No authentication (NOSASL) +# 1 - Kerberos authentication (SASL) +# 2 - Username authentication (SASL) +# 3 - Username/password authentication (SASL) +# 4 - Username/password authentication with SSL (SASL) +# 5 - No authentication with SSL (NOSASL) +# 6 - Username/password authentication (NOSASL) +AuthMech=0 + +# Kerberos related settings. +KrbFQDN= +KrbRealm= +KrbServiceName= + +# Username/password authentication with SSL settings. +UID= +PWD +CAIssuedCertNamesMismatch=1 +TrustedCerts=/opt/cloudera/impalaodbc/lib/universal/cacerts.pem + +# Specify the proxy user ID to use. +#DelegationUID= + +# General settings +TSaslTransportBufSize=1000 +RowsFetchedPerBlock=10000 +SocketTimeout=0 +StringColumnLength=32767 +UseNativeQuery=0 +$ pwd +/opt/cloudera/impalaodbc/Setup +$ cd $installer_dir +$ open BI_Tool_Installer.dmg +<varname>Install the BI tool using its installer</varname> +$ ls /Applications | grep BI_Tool +BI_Tool.app +$ open -a BI_Tool.app +<varname>In the BI tool, connect to a data source using port 21050</varname> +</codeblock> + + </conbody> + </concept> + + <concept rev="2.3.0" id="jdbc_odbc_notes"> + <title>Notes about JDBC and ODBC Interaction with Impala SQL Features</title> + <conbody> + + <p> + Most Impala SQL features work equivalently through the <cmdname>impala-shell</cmdname> interpreter + of the JDBC or ODBC APIs. The following are some exceptions to keep in mind when switching between + the interactive shell and applications using the APIs: + </p> + + <note conref="../shared/impala_common.xml#common/proxy_jdbc_caveat"/> + + <ul> + <li> + <p conref="../shared/impala_common.xml#common/jdbc_odbc_complex_types"/> + </li> + <li> + <p conref="../shared/impala_common.xml#common/jdbc_odbc_complex_types_views"/> + </li> + </ul> + + </conbody> + </concept> + +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_offset.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_offset.xml b/docs/topics/impala_offset.xml new file mode 100644 index 0000000..534f6b6 --- /dev/null +++ b/docs/topics/impala_offset.xml @@ -0,0 +1,66 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="1.2.1" id="offset"> + + <title>OFFSET Clause</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Reports"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + The <codeph>OFFSET</codeph> clause in a <codeph>SELECT</codeph> query causes the result set to start some + number of rows after the logical first item. The result set is numbered starting from zero, so <codeph>OFFSET + 0</codeph> produces the same result as leaving out the <codeph>OFFSET</codeph> clause. Always use this clause + in combination with <codeph>ORDER BY</codeph> (so that it is clear which item should be first, second, and so + on) and <codeph>LIMIT</codeph> (so that the result set covers a bounded range, such as items 0-9, 100-199, + and so on). + </p> + + <p conref="../shared/impala_common.xml#common/limit_and_offset"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + The following example shows how you could run a <q>paging</q> query originally written for a traditional + database application. Because typical Impala queries process megabytes or gigabytes of data and read large + data files from disk each time, it is inefficient to run a separate query to retrieve each small group of + items. Use this technique only for compatibility while porting older applications, then rewrite the + application code to use a single query with a large result set, and display pages of results from the cached + result set. + </p> + +<codeblock>[localhost:21000] > create table numbers (x int); +[localhost:21000] > insert into numbers select x from very_long_sequence; +Inserted 1000000 rows in 1.34s +[localhost:21000] > select x from numbers order by x limit 5 offset 0; ++----+ +| x | ++----+ +| 1 | +| 2 | +| 3 | +| 4 | +| 5 | ++----+ +[localhost:21000] > select x from numbers order by x limit 5 offset 5; ++----+ +| x | ++----+ +| 6 | +| 7 | +| 8 | +| 9 | +| 10 | ++----+ +</codeblock> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_operators.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_operators.xml b/docs/topics/impala_operators.xml new file mode 100644 index 0000000..acd9297 --- /dev/null +++ b/docs/topics/impala_operators.xml @@ -0,0 +1,1592 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="operators"> + + <title>SQL Operators</title> + + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">operators</indexterm> + SQL operators are a class of comparison functions that are widely used within the <codeph>WHERE</codeph> clauses of + <codeph>SELECT</codeph> statements. + </p> + + <p outputclass="toc inpage"/> + + </conbody> + + <concept rev="1.4.0" id="arithmetic_operators"> + + <title>Arithmetic Operators</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">arithmetic operators</indexterm> + The arithmetic operators use expressions with a left-hand argument, the operator, and then (in most cases) a right-hand argument. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock><varname>left_hand_arg</varname> <varname>binary_operator</varname> <varname>right_hand_arg</varname> +<varname>unary_operator</varname> <varname>single_arg</varname> +</codeblock> + + <ul> + <li> + <codeph>+</codeph> and <codeph>-</codeph>: Can be used either as unary or binary operators. + <ul> + <li> + <p> + With unary notation, such as <codeph>+5</codeph>, <codeph>-2.5</codeph>, or <codeph>-<varname>col_name</varname></codeph>, + they multiply their single numeric argument by <codeph>+1</codeph> or <codeph>-1</codeph>. Therefore, unary + <codeph>+</codeph> returns its argument unchanged, while unary <codeph>-</codeph> flips the sign of its argument. Although + you can double up these operators in expressions such as <codeph>++5</codeph> (always positive) or <codeph>-+2</codeph> or + <codeph>+-2</codeph> (both always negative), you cannot double the unary minus operator because <codeph>--</codeph> is + interpreted as the start of a comment. (You can use a double unary minus operator if you separate the <codeph>-</codeph> + characters, for example with a space or parentheses.) + </p> + </li> + + <li> + <p> + With binary notation, such as <codeph>2+2</codeph>, <codeph>5-2.5</codeph>, or <codeph><varname>col1</varname> + + <varname>col2</varname></codeph>, they add or subtract respectively the right-hand argument to (or from) the left-hand + argument. Both arguments must be of numeric types. + </p> + </li> + </ul> + </li> + + <li> + <p> + <codeph>*</codeph> and <codeph>/</codeph>: Multiplication and division respectively. Both arguments must be of numeric types. + </p> + + <p> + When multiplying, the shorter argument is promoted if necessary (such as <codeph>SMALLINT</codeph> to <codeph>INT</codeph> or + <codeph>BIGINT</codeph>, or <codeph>FLOAT</codeph> to <codeph>DOUBLE</codeph>), and then the result is promoted again to the + next larger type. Thus, multiplying a <codeph>TINYINT</codeph> and an <codeph>INT</codeph> produces a <codeph>BIGINT</codeph> + result. Multiplying a <codeph>FLOAT</codeph> and a <codeph>FLOAT</codeph> produces a <codeph>DOUBLE</codeph> result. Multiplying + a <codeph>FLOAT</codeph> and a <codeph>DOUBLE</codeph> or a <codeph>DOUBLE</codeph> and a <codeph>DOUBLE</codeph> produces a + <codeph>DECIMAL(38,17)</codeph>, because <codeph>DECIMAL</codeph> values can represent much larger and more precise values than + <codeph>DOUBLE</codeph>. + </p> + + <p> + When dividing, Impala always treats the arguments and result as <codeph>DOUBLE</codeph> values to avoid losing precision. If you + need to insert the results of a division operation into a <codeph>FLOAT</codeph> column, use the <codeph>CAST()</codeph> + function to convert the result to the correct type. + </p> + </li> + + <li> + <p> + <codeph>%</codeph>: Modulo operator. Returns the remainder of the left-hand argument divided by the right-hand argument. Both + arguments must be of one of the integer types. + </p> + </li> + + <li> + <p> + <codeph>&</codeph>, <codeph>|</codeph>, <codeph>~</codeph>, and <codeph>^</codeph>: Bitwise operators that return the + logical AND, logical OR, <codeph>NOT</codeph>, or logical XOR (exclusive OR) of their argument values. Both arguments must be of + one of the integer types. If the arguments are of different type, the argument with the smaller type is implicitly extended to + match the argument with the longer type. + </p> + </li> + </ul> + + <p> + You can chain a sequence of arithmetic expressions, optionally grouping them with parentheses. + </p> + + <p> + The arithmetic operators generally do not have equivalent calling conventions using functional notation. For example, prior to + Impala 2.2.0 / CDH 5.4.0, there is no <codeph>MOD()</codeph> function equivalent to the <codeph>%</codeph> modulo operator. + Conversely, there are some arithmetic functions that do not have a corresponding operator. For example, for exponentiation you use + the <codeph>POW()</codeph> function, but there is no <codeph>**</codeph> exponentiation operator. See + <xref href="impala_math_functions.xml#math_functions"/> for the arithmetic functions you can use. + </p> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_aggregation_explanation"/> + + <p conref="../shared/impala_common.xml#common/complex_types_aggregation_example"/> + + <p conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/> + + <p rev="2.3.0"> + The following example shows how to do an arithmetic operation using a numeric field of a <codeph>STRUCT</codeph> type that is an + item within an <codeph>ARRAY</codeph> column. Once the scalar numeric value <codeph>R_NATIONKEY</codeph> is extracted, it can be + used in an arithmetic expression, such as multiplying by 10: + </p> + +<codeblock rev="2.3.0"> +-- The SMALLINT is a field within an array of structs. +describe region; ++-------------+-------------------------+---------+ +| name | type | comment | ++-------------+-------------------------+---------+ +| r_regionkey | smallint | | +| r_name | string | | +| r_comment | string | | +| r_nations | array<struct< | | +| | n_nationkey:smallint, | | +| | n_name:string, | | +| | n_comment:string | | +| | >> | | ++-------------+-------------------------+---------+ + +-- When we refer to the scalar value using dot notation, +-- we can use arithmetic and comparison operators on it +-- like any other number. +select r_name, nation.item.n_name, nation.item.n_nationkey * 10 + from region, region.r_nations as nation +where nation.item.n_nationkey < 5; ++-------------+-------------+------------------------------+ +| r_name | item.n_name | nation.item.n_nationkey * 10 | ++-------------+-------------+------------------------------+ +| AMERICA | CANADA | 30 | +| AMERICA | BRAZIL | 20 | +| AMERICA | ARGENTINA | 10 | +| MIDDLE EAST | EGYPT | 40 | +| AFRICA | ALGERIA | 0 | ++-------------+-------------+------------------------------+ +</codeblock> + + </conbody> + + </concept> + + <concept id="between"> + + <title>BETWEEN Operator</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">BETWEEN operator</indexterm> + In a <codeph>WHERE</codeph> clause, compares an expression to both a lower and upper bound. The comparison is successful is the + expression is greater than or equal to the lower bound, and less than or equal to the upper bound. If the bound values are switched, + so the lower bound is greater than the upper bound, does not match any values. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock><varname>expression</varname> BETWEEN <varname>lower_bound</varname> AND <varname>upper_bound</varname></codeblock> + + <p> + <b>Data types:</b> Typically used with numeric data types. Works with any data type, although not very practical for + <codeph>BOOLEAN</codeph> values. (<codeph>BETWEEN false AND true</codeph> will match all <codeph>BOOLEAN</codeph> values.) Use + <codeph>CAST()</codeph> if necessary to ensure the lower and upper bound values are compatible types. Call string or date/time + functions if necessary to extract or transform the relevant portion to compare, especially if the value can be transformed into a + number. + </p> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + Be careful when using short string operands. A longer string that starts with the upper bound value will not be included, because it + is considered greater than the upper bound. For example, <codeph>BETWEEN 'A' and 'M'</codeph> would not match the string value + <codeph>'Midway'</codeph>. Use functions such as <codeph>upper()</codeph>, <codeph>lower()</codeph>, <codeph>substr()</codeph>, + <codeph>trim()</codeph>, and so on if necessary to ensure the comparison works as expected. + </p> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + +<codeblock>-- Retrieve data for January through June, inclusive. +select c1 from t1 where month <b>between 1 and 6</b>; + +-- Retrieve data for names beginning with 'A' through 'M' inclusive. +-- Only test the first letter to ensure all the values starting with 'M' are matched. +-- Do a case-insensitive comparison to match names with various capitalization conventions. +select last_name from customers where upper(substr(last_name,1,1)) <b>between 'A' and 'M'</b>; + +-- Retrieve data for only the first week of each month. +select count(distinct visitor_id)) from web_traffic where dayofmonth(when_viewed) <b>between 1 and 7</b>;</codeblock> + + <p rev="2.3.0"> + The following example shows how to do a <codeph>BETWEEN</codeph> comparison using a numeric field of a <codeph>STRUCT</codeph> type + that is an item within an <codeph>ARRAY</codeph> column. Once the scalar numeric value <codeph>R_NATIONKEY</codeph> is extracted, it + can be used in a comparison operator: + </p> + +<codeblock rev="2.3.0"> +-- The SMALLINT is a field within an array of structs. +describe region; ++-------------+-------------------------+---------+ +| name | type | comment | ++-------------+-------------------------+---------+ +| r_regionkey | smallint | | +| r_name | string | | +| r_comment | string | | +| r_nations | array<struct< | | +| | n_nationkey:smallint, | | +| | n_name:string, | | +| | n_comment:string | | +| | >> | | ++-------------+-------------------------+---------+ + +-- When we refer to the scalar value using dot notation, +-- we can use arithmetic and comparison operators on it +-- like any other number. +select r_name, nation.item.n_name, nation.item.n_nationkey +from region, region.r_nations as nation +where nation.item.n_nationkey between 3 and 5 ++-------------+-------------+------------------+ +| r_name | item.n_name | item.n_nationkey | ++-------------+-------------+------------------+ +| AMERICA | CANADA | 3 | +| MIDDLE EAST | EGYPT | 4 | +| AFRICA | ETHIOPIA | 5 | ++-------------+-------------+------------------+ +</codeblock> + + </conbody> + + </concept> + + <concept id="comparison_operators"> + + <title>Comparison Operators</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">comparison operators</indexterm> + Impala supports the familiar comparison operators for checking equality and sort order for the column data types: + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock><varname>left_hand_expression</varname> <varname>comparison_operator</varname> <varname>right_hand_expression</varname></codeblock> + + <ul> + <li> + <codeph>=</codeph>, <codeph>!=</codeph>, <codeph><></codeph>: apply to all types. + </li> + + <li> + <codeph><</codeph>, <codeph><=</codeph>, <codeph>></codeph>, <codeph>>=</codeph>: apply to all types; for + <codeph>BOOLEAN</codeph>, <codeph>TRUE</codeph> is considered greater than <codeph>FALSE</codeph>. + </li> + </ul> + + <p> + <b>Alternatives:</b> + </p> + + <p> + The <codeph>IN</codeph> and <codeph>BETWEEN</codeph> operators provide shorthand notation for expressing combinations of equality, + less than, and greater than comparisons with a single operator. + </p> + + <p> + Because comparing any value to <codeph>NULL</codeph> produces <codeph>NULL</codeph> rather than <codeph>TRUE</codeph> or + <codeph>FALSE</codeph>, use the <codeph>IS NULL</codeph> and <codeph>IS NOT NULL</codeph> operators to check if a value is + <codeph>NULL</codeph> or not. + </p> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/> + + <p rev="2.3.0"> + The following example shows how to do an arithmetic operation using a numeric field of a <codeph>STRUCT</codeph> type that is an + item within an <codeph>ARRAY</codeph> column. Once the scalar numeric value <codeph>R_NATIONKEY</codeph> is extracted, it can be + used with a comparison operator such as <codeph><</codeph>: + </p> + +<codeblock rev="2.3.0"> +-- The SMALLINT is a field within an array of structs. +describe region; ++-------------+-------------------------+---------+ +| name | type | comment | ++-------------+-------------------------+---------+ +| r_regionkey | smallint | | +| r_name | string | | +| r_comment | string | | +| r_nations | array<struct< | | +| | n_nationkey:smallint, | | +| | n_name:string, | | +| | n_comment:string | | +| | >> | | ++-------------+-------------------------+---------+ + +-- When we refer to the scalar value using dot notation, +-- we can use arithmetic and comparison operators on it +-- like any other number. +select r_name, nation.item.n_name, nation.item.n_nationkey +from region, region.r_nations as nation +where nation.item.n_nationkey < 5 ++-------------+-------------+------------------+ +| r_name | item.n_name | item.n_nationkey | ++-------------+-------------+------------------+ +| AMERICA | CANADA | 3 | +| AMERICA | BRAZIL | 2 | +| AMERICA | ARGENTINA | 1 | +| MIDDLE EAST | EGYPT | 4 | +| AFRICA | ALGERIA | 0 | ++-------------+-------------+------------------+ +</codeblock> + + </conbody> + + </concept> + + <concept audience="Cloudera" rev="2.1.0" id="except"> + + <title>EXCEPT Operator</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">EXCEPT operator</indexterm> + </p> + + </conbody> + + </concept> + + <concept rev="2.0.0" id="exists"> + + <title>EXISTS Operator</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">EXISTS operator</indexterm> + + <indexterm audience="Cloudera">NOT EXISTS operator</indexterm> + The <codeph>EXISTS</codeph> operator tests whether a subquery returns any results. You typically use it to find values from one + table that have corresponding values in another table. + </p> + + <p> + The converse, <codeph>NOT EXISTS</codeph>, helps to find all the values from one table that do not have any corresponding values in + another table. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>EXISTS (<varname>subquery</varname>) +NOT EXISTS (<varname>subquery</varname>) +</codeblock> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + The subquery can refer to a different table than the outer query block, or the same table. For example, you might use + <codeph>EXISTS</codeph> or <codeph>NOT EXISTS</codeph> to check the existence of parent/child relationships between two columns of + the same table. + </p> + + <p> + You can also use operators and function calls within the subquery to test for other kinds of relationships other than strict + equality. For example, you might use a call to <codeph>COUNT()</codeph> in the subquery to check whether the number of matching + values is higher or lower than some limit. You might call a UDF in the subquery to check whether values in one table matches a + hashed representation of those same values in a different table. + </p> + + <p conref="../shared/impala_common.xml#common/null_blurb"/> + + <p> + If the subquery returns any value at all (even <codeph>NULL</codeph>), <codeph>EXISTS</codeph> returns <codeph>TRUE</codeph> and + <codeph>NOT EXISTS</codeph> returns false. + </p> + + <p> + The following example shows how even when the subquery returns only <codeph>NULL</codeph> values, <codeph>EXISTS</codeph> still + returns <codeph>TRUE</codeph> and thus matches all the rows from the table in the outer query block. + </p> + +<codeblock>[localhost:21000] > create table all_nulls (x int); +[localhost:21000] > insert into all_nulls values (null), (null), (null); +[localhost:21000] > select y from t2 where exists (select x from all_nulls); ++---+ +| y | ++---+ +| 2 | +| 4 | +| 6 | ++---+ +</codeblock> + + <p> + However, if the table in the subquery is empty and so the subquery returns an empty result set, <codeph>EXISTS</codeph> returns + <codeph>FALSE</codeph>: + </p> + +<codeblock>[localhost:21000] > create table empty (x int); +[localhost:21000] > select y from t2 where exists (select x from empty); +[localhost:21000] > +</codeblock> + + <p conref="../shared/impala_common.xml#common/added_in_20"/> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + + <p conref="../shared/impala_common.xml#common/subquery_no_limit"/> + + <p rev="IMPALA-3232"> + Prior to <keyword keyref="impala26_full"/>, + the <codeph>NOT EXISTS</codeph> operator required a correlated subquery. + In <keyword keyref="impala26_full"/> and higher, <codeph>NOT EXISTS</codeph> works with + uncorrelated queries also. + </p> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/> + +<!-- To do: construct an EXISTS / NOT EXISTS example for complex types. --> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> +<!-- Maybe turn this into a conref if the same set of tables gets used for subqueries, EXISTS, other places. --> +<!-- Yes, the material was reused under Subqueries for anti-joins. --> + The following examples refer to these simple tables containing small sets of integers or strings: +<codeblock>[localhost:21000] > create table t1 (x int); +[localhost:21000] > insert into t1 values (1), (2), (3), (4), (5), (6); + +[localhost:21000] > create table t2 (y int); +[localhost:21000] > insert into t2 values (2), (4), (6); + +[localhost:21000] > create table t3 (z int); +[localhost:21000] > insert into t3 values (1), (3), (5); + +[localhost:21000] > create table month_names (m string); +[localhost:21000] > insert into month_names values + > ('January'), ('February'), ('March'), + > ('April'), ('May'), ('June'), ('July'), + > ('August'), ('September'), ('October'), + > ('November'), ('December'); +</codeblock> + </p> + + <p> + The following example shows a correlated subquery that finds all the values in one table that exist in another table. For each value + <codeph>X</codeph> from <codeph>T1</codeph>, the query checks if the <codeph>Y</codeph> column of <codeph>T2</codeph> contains an + identical value, and the <codeph>EXISTS</codeph> operator returns <codeph>TRUE</codeph> or <codeph>FALSE</codeph> as appropriate in + each case. + </p> + +<codeblock>localhost:21000] > select x from t1 where exists (select y from t2 where t1.x = y); ++---+ +| x | ++---+ +| 2 | +| 4 | +| 6 | ++---+ +</codeblock> + + <p> + An uncorrelated query is less interesting in this case. Because the subquery always returns <codeph>TRUE</codeph>, all rows from + <codeph>T1</codeph> are returned. If the table contents where changed so that the subquery did not match any rows, none of the rows + from <codeph>T1</codeph> would be returned. + </p> + +<codeblock>[localhost:21000] > select x from t1 where exists (select y from t2 where y > 5); ++---+ +| x | ++---+ +| 1 | +| 2 | +| 3 | +| 4 | +| 5 | +| 6 | ++---+ +</codeblock> + + <p> + The following example shows how an uncorrelated subquery can test for the existence of some condition within a table. By using + <codeph>LIMIT 1</codeph> or an aggregate function, the query returns a single result or no result based on whether the subquery + matches any rows. Here, we know that <codeph>T1</codeph> and <codeph>T2</codeph> contain some even numbers, but <codeph>T3</codeph> + does not. + </p> + +<codeblock>[localhost:21000] > select "contains an even number" from t1 where exists (select x from t1 where x % 2 = 0) limit 1; ++---------------------------+ +| 'contains an even number' | ++---------------------------+ +| contains an even number | ++---------------------------+ +[localhost:21000] > select "contains an even number" as assertion from t1 where exists (select x from t1 where x % 2 = 0) limit 1; ++-------------------------+ +| assertion | ++-------------------------+ +| contains an even number | ++-------------------------+ +[localhost:21000] > select "contains an even number" as assertion from t2 where exists (select x from t2 where y % 2 = 0) limit 1; +ERROR: AnalysisException: couldn't resolve column reference: 'x' +[localhost:21000] > select "contains an even number" as assertion from t2 where exists (select y from t2 where y % 2 = 0) limit 1; ++-------------------------+ +| assertion | ++-------------------------+ +| contains an even number | ++-------------------------+ +[localhost:21000] > select "contains an even number" as assertion from t3 where exists (select z from t3 where z % 2 = 0) limit 1; +[localhost:21000] > +</codeblock> + + <p> + The following example finds numbers in one table that are 1 greater than numbers from another table. The <codeph>EXISTS</codeph> + notation is simpler than an equivalent <codeph>CROSS JOIN</codeph> between the tables. (The example then also illustrates how the + same test could be performed using an <codeph>IN</codeph> operator.) + </p> + +<codeblock>[localhost:21000] > select x from t1 where exists (select y from t2 where x = y + 1); ++---+ +| x | ++---+ +| 3 | +| 5 | ++---+ +[localhost:21000] > select x from t1 where x in (select y + 1 from t2); ++---+ +| x | ++---+ +| 3 | +| 5 | ++---+ +</codeblock> + + <p> + The following example finds values from one table that do not exist in another table. + </p> + +<codeblock>[localhost:21000] > select x from t1 where not exists (select y from t2 where x = y); ++---+ +| x | ++---+ +| 1 | +| 3 | +| 5 | ++---+ +</codeblock> + + <p> + The following example uses the <codeph>NOT EXISTS</codeph> operator to find all the leaf nodes in tree-structured data. This + simplified <q>tree of life</q> has multiple levels (class, order, family, and so on), with each item pointing upward through a + <codeph>PARENT</codeph> pointer. The example runs an outer query and a subquery on the same table, returning only those items whose + <codeph>ID</codeph> value is <i>not</i> referenced by the <codeph>PARENT</codeph> of any other item. + </p> + +<codeblock>[localhost:21000] > create table tree (id int, parent int, name string); +[localhost:21000] > insert overwrite tree values + > (0, null, "animals"), + > (1, 0, "placentals"), + > (2, 0, "marsupials"), + > (3, 1, "bats"), + > (4, 1, "cats"), + > (5, 2, "kangaroos"), + > (6, 4, "lions"), + > (7, 4, "tigers"), + > (8, 5, "red kangaroo"), + > (9, 2, "wallabies"); +[localhost:21000] > select name as "leaf node" from tree one + > where not exists (select parent from tree two where one.id = two.parent); ++--------------+ +| leaf node | ++--------------+ +| bats | +| lions | +| tigers | +| red kangaroo | +| wallabies | ++--------------+ +</codeblock> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_subqueries.xml#subqueries"/> + </p> + + </conbody> + + </concept> + + <concept rev="2.4.0" id="ilike"> + + <title>ILIKE Operator</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">ILIKE operator</indexterm> + A case-insensitive comparison operator for <codeph>STRING</codeph> data, with basic wildcard capability using <codeph>_</codeph> to match a single + character and <codeph>%</codeph> to match multiple characters. The argument expression must match the entire string value. + Typically, it is more efficient to put any <codeph>%</codeph> wildcard match at the end of the string. + </p> + + <p> + This operator, available in <keyword keyref="impala25_full"/> and higher, is the equivalent of the <codeph>LIKE</codeph> operator, + but with case-insensitive comparisons. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock><varname>string_expression</varname> ILIKE <varname>wildcard_expression</varname> +<varname>string_expression</varname> NOT ILIKE <varname>wildcard_expression</varname> +</codeblock> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/> + +<!-- To do: construct a LIKE example for complex types. --> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + In the following examples, strings that are the same except for differences in uppercase + and lowercase match successfully with <codeph>ILIKE</codeph>, but do not match + with <codeph>LIKE</codeph>: + </p> + +<codeblock>select 'fooBar' ilike 'FOOBAR'; ++-------------------------+ +| 'foobar' ilike 'foobar' | ++-------------------------+ +| true | ++-------------------------+ + +select 'fooBar' like 'FOOBAR'; ++------------------------+ +| 'foobar' like 'foobar' | ++------------------------+ +| false | ++------------------------+ + +select 'FOOBAR' ilike 'f%'; ++---------------------+ +| 'foobar' ilike 'f%' | ++---------------------+ +| true | ++---------------------+ + +select 'FOOBAR' like 'f%'; ++--------------------+ +| 'foobar' like 'f%' | ++--------------------+ +| false | ++--------------------+ + +select 'ABCXYZ' not ilike 'ab_xyz'; ++-----------------------------+ +| not 'abcxyz' ilike 'ab_xyz' | ++-----------------------------+ +| false | ++-----------------------------+ + +select 'ABCXYZ' not like 'ab_xyz'; ++----------------------------+ +| not 'abcxyz' like 'ab_xyz' | ++----------------------------+ +| true | ++----------------------------+ +</codeblock> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p rev="2.5.0"> + For case-sensitive comparisons, see <xref href="impala_operators.xml#like"/>. + For a more general kind of search operator using regular expressions, see <xref href="impala_operators.xml#regexp"/> + or its case-insensitive counterpart <xref href="impala_operators.xml#iregexp"/>. + </p> + + </conbody> + + </concept> + + <concept id="in"> + + <title>IN Operator</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">IN operator</indexterm> + + <indexterm audience="Cloudera">NOT IN operator</indexterm> + The <codeph>IN</codeph> operator compares an argument value to a set of values, and returns <codeph>TRUE</codeph> if the argument + matches any value in the set. The <codeph>NOT IN</codeph> operator reverses the comparison, and checks if the argument value is not + part of a set of values. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock rev="2.0.0"><varname>expression</varname> IN (<varname>expression</varname> [, <varname>expression</varname>]) +<varname>expression</varname> IN (<varname>subquery</varname>) + +<varname>expression</varname> NOT IN (<varname>expression</varname> [, <varname>expression</varname>]) +<varname>expression</varname> NOT IN (<varname>subquery</varname>) +</codeblock> + + <p> + The left-hand expression and the set of comparison values must be of compatible types. + </p> + + <p> + The left-hand expression must consist only of a single value, not a tuple. Although the left-hand expression is typically a column + name, it could also be some other value. For example, the <codeph>WHERE</codeph> clauses <codeph>WHERE id IN (5)</codeph> and + <codeph>WHERE 5 IN (id)</codeph> produce the same results. + </p> + + <p rev="CDH-40407"> + The set of values to check against can be specified as constants, function calls, column names, or other expressions in the query + text. The maximum number of expressions in the <codeph>IN</codeph> list is 9999. (The maximum number of elements of + a single expression is 10,000 items, and the <codeph>IN</codeph> operator itself counts as one.) + </p> + + <p rev="2.0.0"> + In Impala 2.0 and higher, the set of values can also be generated by a subquery. <codeph>IN</codeph> can evaluate an unlimited + number of results using a subquery. + </p> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + Any expression using the <codeph>IN</codeph> operator could be rewritten as a series of equality tests connected with + <codeph>OR</codeph>, but the <codeph>IN</codeph> syntax is often clearer, more concise, and easier for Impala to optimize. For + example, with partitioned tables, queries frequently use <codeph>IN</codeph> clauses to filter data by comparing the partition key + columns to specific values. + </p> + + <p conref="../shared/impala_common.xml#common/null_blurb"/> + + <p> + If there really is a matching non-null value, <codeph>IN</codeph> returns <codeph>TRUE</codeph>: + </p> + +<codeblock>[localhost:21000] > select 1 in (1,null,2,3); ++----------------------+ +| 1 in (1, null, 2, 3) | ++----------------------+ +| true | ++----------------------+ +[localhost:21000] > select 1 not in (1,null,2,3); ++--------------------------+ +| 1 not in (1, null, 2, 3) | ++--------------------------+ +| false | ++--------------------------+ +</codeblock> + + <p> + If the searched value is not found in the comparison values, and the comparison values include <codeph>NULL</codeph>, the result is + <codeph>NULL</codeph>: + </p> + +<codeblock>[localhost:21000] > select 5 in (1,null,2,3); ++----------------------+ +| 5 in (1, null, 2, 3) | ++----------------------+ +| NULL | ++----------------------+ +[localhost:21000] > select 5 not in (1,null,2,3); ++--------------------------+ +| 5 not in (1, null, 2, 3) | ++--------------------------+ +| NULL | ++--------------------------+ +[localhost:21000] > select 1 in (null); ++-------------+ +| 1 in (null) | ++-------------+ +| NULL | ++-------------+ +[localhost:21000] > select 1 not in (null); ++-----------------+ +| 1 not in (null) | ++-----------------+ +| NULL | ++-----------------+ +</codeblock> + + <p> + If the left-hand argument is <codeph>NULL</codeph>, <codeph>IN</codeph> always returns <codeph>NULL</codeph>. This rule applies even + if the comparison values include <codeph>NULL</codeph>. + </p> + +<codeblock>[localhost:21000] > select null in (1,2,3); ++-------------------+ +| null in (1, 2, 3) | ++-------------------+ +| NULL | ++-------------------+ +[localhost:21000] > select null not in (1,2,3); ++-----------------------+ +| null not in (1, 2, 3) | ++-----------------------+ +| NULL | ++-----------------------+ +[localhost:21000] > select null in (null); ++----------------+ +| null in (null) | ++----------------+ +| NULL | ++----------------+ +[localhost:21000] > select null not in (null); ++--------------------+ +| null not in (null) | ++--------------------+ +| NULL | ++--------------------+ +</codeblock> + + <p conref="../shared/impala_common.xml#common/enhanced_in_20"/> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/> + + <p rev="2.3.0"> + The following example shows how to do an arithmetic operation using a numeric field of a <codeph>STRUCT</codeph> type that is an + item within an <codeph>ARRAY</codeph> column. Once the scalar numeric value <codeph>R_NATIONKEY</codeph> is extracted, it can be + used in an arithmetic expression, such as multiplying by 10: + </p> + +<codeblock rev="2.3.0"> +-- The SMALLINT is a field within an array of structs. +describe region; ++-------------+-------------------------+---------+ +| name | type | comment | ++-------------+-------------------------+---------+ +| r_regionkey | smallint | | +| r_name | string | | +| r_comment | string | | +| r_nations | array<struct< | | +| | n_nationkey:smallint, | | +| | n_name:string, | | +| | n_comment:string | | +| | >> | | ++-------------+-------------------------+---------+ + +-- When we refer to the scalar value using dot notation, +-- we can use arithmetic and comparison operators on it +-- like any other number. +select r_name, nation.item.n_name, nation.item.n_nationkey +from region, region.r_nations as nation +where nation.item.n_nationkey in (1,3,5) ++---------+-------------+------------------+ +| r_name | item.n_name | item.n_nationkey | ++---------+-------------+------------------+ +| AMERICA | CANADA | 3 | +| AMERICA | ARGENTINA | 1 | +| AFRICA | ETHIOPIA | 5 | ++---------+-------------+------------------+ +</codeblock> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + + <p conref="../shared/impala_common.xml#common/subquery_no_limit"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + +<codeblock>-- Using IN is concise and self-documenting. +SELECT * FROM t1 WHERE c1 IN (1,2,10); +-- Equivalent to series of = comparisons ORed together. +SELECT * FROM t1 WHERE c1 = 1 OR c1 = 2 OR c1 = 10; + +SELECT c1 AS "starts with vowel" FROM t2 WHERE upper(substr(c1,1,1)) IN ('A','E','I','O','U'); + +SELECT COUNT(DISTINCT(visitor_id)) FROM web_traffic WHERE month IN ('January','June','July');</codeblock> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_subqueries.xml#subqueries"/> + </p> + + </conbody> + + </concept> + + <concept audience="Cloudera" rev="2.1.0" id="intersect"> + + <title>INTERSECT Operator</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">INTERSECT operator</indexterm> + </p> + + </conbody> + + </concept> + + <concept rev="2.5.0" id="iregexp"> + + <title>IREGEXP Operator</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">IREGEXP operator</indexterm> + Tests whether a value matches a regular expression, using case-insensitive string comparisons. + Uses the POSIX regular expression syntax where <codeph>^</codeph> and + <codeph>$</codeph> match the beginning and end of the string, <codeph>.</codeph> represents any single character, <codeph>*</codeph> + represents a sequence of zero or more items, <codeph>+</codeph> represents a sequence of one or more items, <codeph>?</codeph> + produces a non-greedy match, and so on. + </p> + + <p> + This operator, available in <keyword keyref="impala25_full"/> and higher, is the equivalent of the <codeph>REGEXP</codeph> operator, + but with case-insensitive comparisons. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock><varname>string_expression</varname> IREGEXP <varname>regular_expression</varname> +</codeblock> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p conref="../shared/impala_common.xml#common/regular_expression_whole_string"/> + +<!-- Currently, there isn't any IRLIKE synonym, so REGEXP and IREGEXP are different in that respect. + I pinged IMPALA-1787 to check if that's intentional. + <p> + The <codeph>IRLIKE</codeph> operator is a synonym for <codeph>IREGEXP</codeph>. + </p> +--> + + <p rev="2.5.0"> + The <codeph>|</codeph> symbol is the alternation operator, typically used within <codeph>()</codeph> to match different sequences. + The <codeph>()</codeph> groups do not allow backreferences. To retrieve the part of a value matched within a <codeph>()</codeph> + section, use the <codeph><xref href="impala_string_functions.xml#string_functions/regexp_extract">regexp_extract()</xref></codeph> + built-in function. (Currently, there is not any case-insensitive equivalent for the <codeph>regexp_extract()</codeph> function.) + </p> + + <note rev="1.3.1"> + <p rev="1.3.1" conref="../shared/impala_common.xml#common/regexp_matching"/> + </note> + + <p conref="../shared/impala_common.xml#common/regexp_re2"/> + + <p conref="../shared/impala_common.xml#common/regexp_re2_warning"/> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/> + +<!-- To do: construct a REGEXP example for complex types. --> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + The following examples demonstrate the syntax for the <codeph>IREGEXP</codeph> operator. + </p> + +<codeblock>select 'abcABCaabbcc' iregexp '^[a-c]+$'; ++---------------------------------+ +| 'abcabcaabbcc' iregexp '[a-c]+' | ++---------------------------------+ +| true | ++---------------------------------+ +</codeblock> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_operators.xml#regexp"/> + </p> + + </conbody> + + </concept> + + <concept rev="2.5.0 IMPALA-2147" id="is_distinct_from"> + + <title id="is_distinct">IS DISTINCT FROM Operator</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">IS DISTINCT FROM operator</indexterm> + + <indexterm audience="Cloudera">IS NOT DISTINCT FROM operator</indexterm> + The <codeph>IS DISTINCT FROM</codeph> operator, and its converse the <codeph>IS NOT DISTINCT FROM</codeph> operator, test whether or + not values are identical. <codeph>IS NOT DISTINCT FROM</codeph> is similar to the <codeph>=</codeph> operator, and <codeph>IS + DISTINCT FROM</codeph> is similar to the <codeph>!=</codeph> operator, except that <codeph>NULL</codeph> values are treated as + identical. Therefore, <codeph>IS NOT DISTINCT FROM</codeph> returns <codeph>true</codeph> rather than <codeph>NULL</codeph>, and + <codeph>IS DISTINCT FROM</codeph> returns <codeph>false</codeph> rather than <codeph>NULL</codeph>, when comparing two + <codeph>NULL</codeph> values. If one of the values being compared is <codeph>NULL</codeph> and the other is not, <codeph>IS DISTINCT + FROM</codeph> returns <codeph>true</codeph> and <codeph>IS NOT DISTINCT FROM</codeph> returns <codeph>false</codeph>, again instead + of returning <codeph>NULL</codeph> in both cases. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock><varname>expression1</varname> IS DISTINCT FROM <varname>expression2</varname> + +<varname>expression1</varname> IS NOT DISTINCT FROM <varname>expression2</varname> +<varname>expression1</varname> <=> <varname>expression2</varname> +</codeblock> + + <p> + The operator <codeph><=></codeph> is an alias for <codeph>IS NOT DISTINCT FROM</codeph>. + It is typically used as a <codeph>NULL</codeph>-safe equality operator in join queries. + That is, <codeph>A <=> B</codeph> is true if <codeph>A</codeph> equals <codeph>B</codeph> + or if both <codeph>A</codeph> and <codeph>B</codeph> are <codeph>NULL</codeph>. + </p> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + This operator provides concise notation for comparing two values and always producing a <codeph>true</codeph> or + <codeph>false</codeph> result, without treating <codeph>NULL</codeph> as a special case. Otherwise, to unambiguously distinguish + between two values requires a compound expression involving <codeph>IS [NOT] NULL</codeph> tests of both operands in addition to the + <codeph>=</codeph> or <codeph>!=</codeph> operator. + </p> + + <p> + The <codeph><=></codeph> operator, used like an equality operator in a join query, + is more efficient than the equivalent clause: <codeph>A = B OR (A IS NULL AND B IS NULL)</codeph>. + The <codeph><=></codeph> operator can use a hash join, while the <codeph>OR</codeph> expression + cannot. + </p> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + The following examples show how <codeph>IS DISTINCT FROM</codeph> gives output similar to + the <codeph>!=</codeph> operator, and <codeph>IS NOT DISTINCT FROM</codeph> gives output + similar to the <codeph>=</codeph> operator. The exception is when the expression involves + a <codeph>NULL</codeph> value on one side or both sides, where <codeph>!=</codeph> and + <codeph>=</codeph> return <codeph>NULL</codeph> but the <codeph>IS [NOT] DISTINCT FROM</codeph> + operators still return <codeph>true</codeph> or <codeph>false</codeph>. + </p> + +<codeblock> +select 1 is distinct from 0, 1 != 0; ++----------------------+--------+ +| 1 is distinct from 0 | 1 != 0 | ++----------------------+--------+ +| true | true | ++----------------------+--------+ + +select 1 is distinct from 1, 1 != 1; ++----------------------+--------+ +| 1 is distinct from 1 | 1 != 1 | ++----------------------+--------+ +| false | false | ++----------------------+--------+ + +select 1 is distinct from null, 1 != null; ++-------------------------+-----------+ +| 1 is distinct from null | 1 != null | ++-------------------------+-----------+ +| true | NULL | ++-------------------------+-----------+ + +select null is distinct from null, null != null; ++----------------------------+--------------+ +| null is distinct from null | null != null | ++----------------------------+--------------+ +| false | NULL | ++----------------------------+--------------+ + +select 1 is not distinct from 0, 1 = 0; ++--------------------------+-------+ +| 1 is not distinct from 0 | 1 = 0 | ++--------------------------+-------+ +| false | false | ++--------------------------+-------+ + +select 1 is not distinct from 1, 1 = 1; ++--------------------------+-------+ +| 1 is not distinct from 1 | 1 = 1 | ++--------------------------+-------+ +| true | true | ++--------------------------+-------+ + +select 1 is not distinct from null, 1 = null; ++-----------------------------+----------+ +| 1 is not distinct from null | 1 = null | ++-----------------------------+----------+ +| false | NULL | ++-----------------------------+----------+ + +select null is not distinct from null, null = null; ++--------------------------------+-------------+ +| null is not distinct from null | null = null | ++--------------------------------+-------------+ +| true | NULL | ++--------------------------------+-------------+ +</codeblock> + + <p> + The following example shows how <codeph>IS DISTINCT FROM</codeph> considers + <codeph>CHAR</codeph> values to be the same (not distinct from each other) + if they only differ in the number of trailing spaces. Therefore, sometimes + the result of an <codeph>IS [NOT] DISTINCT FROM</codeph> operator differs + depending on whether the values are <codeph>STRING</codeph>/<codeph>VARCHAR</codeph> + or <codeph>CHAR</codeph>. + </p> + +<codeblock> +select + 'x' is distinct from 'x ' as string_with_trailing_spaces, + cast('x' as char(5)) is distinct from cast('x ' as char(5)) as char_with_trailing_spaces; ++-----------------------------+---------------------------+ +| string_with_trailing_spaces | char_with_trailing_spaces | ++-----------------------------+---------------------------+ +| true | false | ++-----------------------------+---------------------------+ +</codeblock> + </conbody> + + </concept> + + <concept id="is_null"> + + <title>IS NULL Operator</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">IS NULL operator</indexterm> + + <indexterm audience="Cloudera">IS NOT NULL operator</indexterm> + The <codeph>IS NULL</codeph> operator, and its converse the <codeph>IS NOT NULL</codeph> operator, test whether a specified value is + <codeph><xref href="impala_literals.xml#null">NULL</xref></codeph>. Because using <codeph>NULL</codeph> with any of the other + comparison operators such as <codeph>=</codeph> or <codeph>!=</codeph> also returns <codeph>NULL</codeph> rather than + <codeph>TRUE</codeph> or <codeph>FALSE</codeph>, you use a special-purpose comparison operator to check for this special condition. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock><varname>expression</varname> IS NULL +<varname>expression</varname> IS NOT NULL +</codeblock> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + In many cases, <codeph>NULL</codeph> values indicate some incorrect or incomplete processing during data ingestion or conversion. + You might check whether any values in a column are <codeph>NULL</codeph>, and if so take some followup action to fill them in. + </p> + + <p> + With sparse data, often represented in <q>wide</q> tables, it is common for most values to be <codeph>NULL</codeph> with only an + occasional non-<codeph>NULL</codeph> value. In those cases, you can use the <codeph>IS NOT NULL</codeph> operator to identify the + rows containing any data at all for a particular column, regardless of the actual value. + </p> + + <p> + With a well-designed database schema, effective use of <codeph>NULL</codeph> values and <codeph>IS NULL</codeph> and <codeph>IS NOT + NULL</codeph> operators can save having to design custom logic around special values such as 0, -1, <codeph>'N/A'</codeph>, empty + string, and so on. <codeph>NULL</codeph> lets you distinguish between a value that is known to be 0, false, or empty, and a truly + unknown value. + </p> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p rev="2.3.0"> + This operator is not applicable to complex type columns (<codeph>STRUCT</codeph>, + <codeph>ARRAY</codeph>, or <codeph>MAP</codeph>). Using a complex type column + with <codeph>IS NULL</codeph> or <codeph>IS NOT NULL</codeph> causes a query error. + </p> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + +<codeblock>-- If this value is non-zero, something is wrong. +select count(*) from employees where employee_id is null; + +-- With data from disparate sources, some fields might be blank. +-- Not necessarily an error condition. +select count(*) from census where household_income is null; + +-- Sometimes we expect fields to be null, and followup action +-- is needed when they are not. +select count(*) from web_traffic where weird_http_code is not null;</codeblock> + + </conbody> + + </concept> + + <concept id="like"> + + <title>LIKE Operator</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">LIKE operator</indexterm> + A comparison operator for <codeph>STRING</codeph> data, with basic wildcard capability using the underscore + (<codeph>_</codeph>) to match a single character and the percent sign (<codeph>%</codeph>) to match multiple + characters. The argument expression must match the entire string value. + Typically, it is more efficient to put any <codeph>%</codeph> wildcard match at the end of the string. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock><varname>string_expression</varname> LIKE <varname>wildcard_expression</varname> +<varname>string_expression</varname> NOT LIKE <varname>wildcard_expression</varname> +</codeblock> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/> + +<!-- To do: construct a LIKE example for complex types. --> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + +<codeblock>select distinct c_last_name from customer where c_last_name like 'Mc%' or c_last_name like 'Mac%'; +select count(c_last_name) from customer where c_last_name like 'M%'; +select c_email_address from customer where c_email_address like '%.edu'; + +-- We can find 4-letter names beginning with 'M' by calling functions... +select distinct c_last_name from customer where length(c_last_name) = 4 and substr(c_last_name,1,1) = 'M'; +-- ...or in a more readable way by matching M followed by exactly 3 characters. +select distinct c_last_name from customer where c_last_name like 'M___';</codeblock> + + <p rev="2.5.0"> + For case-insensitive comparisons, see <xref href="impala_operators.xml#ilike"/>. + For a more general kind of search operator using regular expressions, see <xref href="impala_operators.xml#regexp"/> + or its case-insensitive counterpart <xref href="impala_operators.xml#iregexp"/>. + </p> + + </conbody> + + </concept> + + <concept id="logical_operators"> + + <title>Logical Operators</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">logical operators</indexterm> + Logical operators return a <codeph>BOOLEAN</codeph> value, based on a binary or unary logical operation between arguments that are + also Booleans. Typically, the argument expressions use <xref href="impala_operators.xml#comparison_operators">comparison + operators</xref>. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock><varname>boolean_expression</varname> <varname>binary_logical_operator</varname> <varname>boolean_expression</varname> +<varname>unary_logical_operator</varname> <varname>boolean_expression</varname> +</codeblock> + + <p> + The Impala logical operators are: + </p> + + <ul> + <li> + <codeph>AND</codeph>: A binary operator that returns <codeph>true</codeph> if its left-hand and right-hand arguments both evaluate + to <codeph>true</codeph>, <codeph>NULL</codeph> if either argument is <codeph>NULL</codeph>, and <codeph>false</codeph> otherwise. + </li> + + <li> + <codeph>OR</codeph>: A binary operator that returns <codeph>true</codeph> if either of its left-hand and right-hand arguments + evaluate to <codeph>true</codeph>, <codeph>NULL</codeph> if one argument is <codeph>NULL</codeph> and the other is either + <codeph>NULL</codeph> or <codeph>false</codeph>, and <codeph>false</codeph> otherwise. + </li> + + <li> + <codeph>NOT</codeph>: A unary operator that flips the state of a Boolean expression from <codeph>true</codeph> to + <codeph>false</codeph>, or <codeph>false</codeph> to <codeph>true</codeph>. If the argument expression is <codeph>NULL</codeph>, + the result remains <codeph>NULL</codeph>. (When <codeph>NOT</codeph> is used this way as a unary logical operator, it works + differently than the <codeph>IS NOT NULL</codeph> comparison operator, which returns <codeph>true</codeph> when applied to a + <codeph>NULL</codeph>.) + </li> + </ul> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/> + + <p rev="2.3.0"> + The following example shows how to do an arithmetic operation using a numeric field of a <codeph>STRUCT</codeph> type that is an + item within an <codeph>ARRAY</codeph> column. Once the scalar numeric value <codeph>R_NATIONKEY</codeph> is extracted, it can be + used in an arithmetic expression, such as multiplying by 10: + </p> + +<codeblock rev="2.3.0"> +-- The SMALLINT is a field within an array of structs. +describe region; ++-------------+-------------------------+---------+ +| name | type | comment | ++-------------+-------------------------+---------+ +| r_regionkey | smallint | | +| r_name | string | | +| r_comment | string | | +| r_nations | array<struct< | | +| | n_nationkey:smallint, | | +| | n_name:string, | | +| | n_comment:string | | +| | >> | | ++-------------+-------------------------+---------+ + +-- When we refer to the scalar value using dot notation, +-- we can use arithmetic and comparison operators on it +-- like any other number. +select r_name, nation.item.n_name, nation.item.n_nationkey + from region, region.r_nations as nation +where + nation.item.n_nationkey between 3 and 5 + or nation.item.n_nationkey < 15; ++-------------+----------------+------------------+ +| r_name | item.n_name | item.n_nationkey | ++-------------+----------------+------------------+ +| EUROPE | UNITED KINGDOM | 23 | +| EUROPE | RUSSIA | 22 | +| EUROPE | ROMANIA | 19 | +| ASIA | VIETNAM | 21 | +| ASIA | CHINA | 18 | +| AMERICA | UNITED STATES | 24 | +| AMERICA | PERU | 17 | +| AMERICA | CANADA | 3 | +| MIDDLE EAST | SAUDI ARABIA | 20 | +| MIDDLE EAST | EGYPT | 4 | +| AFRICA | MOZAMBIQUE | 16 | +| AFRICA | ETHIOPIA | 5 | ++-------------+----------------+------------------+ +</codeblock> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + These examples demonstrate the <codeph>AND</codeph> operator: + </p> + +<codeblock>[localhost:21000] > select true and true; ++---------------+ +| true and true | ++---------------+ +| true | ++---------------+ +[localhost:21000] > select true and false; ++----------------+ +| true and false | ++----------------+ +| false | ++----------------+ +[localhost:21000] > select false and false; ++-----------------+ +| false and false | ++-----------------+ +| false | ++-----------------+ +[localhost:21000] > select true and null; ++---------------+ +| true and null | ++---------------+ +| NULL | ++---------------+ +[localhost:21000] > select (10 > 2) and (6 != 9); ++-----------------------+ +| (10 > 2) and (6 != 9) | ++-----------------------+ +| true | ++-----------------------+ +</codeblock> + + <p> + These examples demonstrate the <codeph>OR</codeph> operator: + </p> + +<codeblock>[localhost:21000] > select true or true; ++--------------+ +| true or true | ++--------------+ +| true | ++--------------+ +[localhost:21000] > select true or false; ++---------------+ +| true or false | ++---------------+ +| true | ++---------------+ +[localhost:21000] > select false or false; ++----------------+ +| false or false | ++----------------+ +| false | ++----------------+ +[localhost:21000] > select true or null; ++--------------+ +| true or null | ++--------------+ +| true | ++--------------+ +[localhost:21000] > select null or true; ++--------------+ +| null or true | ++--------------+ +| true | ++--------------+ +[localhost:21000] > select false or null; ++---------------+ +| false or null | ++---------------+ +| NULL | ++---------------+ +[localhost:21000] > select (1 = 1) or ('hello' = 'world'); ++--------------------------------+ +| (1 = 1) or ('hello' = 'world') | ++--------------------------------+ +| true | ++--------------------------------+ +[localhost:21000] > select (2 + 2 != 4) or (-1 > 0); ++--------------------------+ +| (2 + 2 != 4) or (-1 > 0) | ++--------------------------+ +| false | ++--------------------------+ +</codeblock> + + <p> + These examples demonstrate the <codeph>NOT</codeph> operator: + </p> + +<codeblock>[localhost:21000] > select not true; ++----------+ +| not true | ++----------+ +| false | ++----------+ +[localhost:21000] > select not false; ++-----------+ +| not false | ++-----------+ +| true | ++-----------+ +[localhost:21000] > select not null; ++----------+ +| not null | ++----------+ +| NULL | ++----------+ +[localhost:21000] > select not (1=1); ++-------------+ +| not (1 = 1) | ++-------------+ +| false | ++-------------+ +</codeblock> + + </conbody> + + </concept> + + <concept id="regexp"> + + <title>REGEXP Operator</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">REGEXP operator</indexterm> + Tests whether a value matches a regular expression. Uses the POSIX regular expression syntax where <codeph>^</codeph> and + <codeph>$</codeph> match the beginning and end of the string, <codeph>.</codeph> represents any single character, <codeph>*</codeph> + represents a sequence of zero or more items, <codeph>+</codeph> represents a sequence of one or more items, <codeph>?</codeph> + produces a non-greedy match, and so on. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock><varname>string_expression</varname> REGEXP <varname>regular_expression</varname> +</codeblock> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p conref="../shared/impala_common.xml#common/regular_expression_whole_string"/> + + <p> + The <codeph>RLIKE</codeph> operator is a synonym for <codeph>REGEXP</codeph>. + </p> + + <p> + The <codeph>|</codeph> symbol is the alternation operator, typically used within <codeph>()</codeph> to match different sequences. + The <codeph>()</codeph> groups do not allow backreferences. To retrieve the part of a value matched within a <codeph>()</codeph> + section, use the <codeph><xref href="impala_string_functions.xml#string_functions/regexp_extract">regexp_extract()</xref></codeph> + built-in function. + </p> + + <note rev="1.3.1"> + <p rev="1.3.1" conref="../shared/impala_common.xml#common/regexp_matching"/> + </note> + + <p conref="../shared/impala_common.xml#common/regexp_re2"/> + + <p conref="../shared/impala_common.xml#common/regexp_re2_warning"/> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/> + +<!-- To do: construct a REGEXP example for complex types. --> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + The following examples demonstrate the identical syntax for the <codeph>REGEXP</codeph> and <codeph>RLIKE</codeph> operators. + </p> + +<!-- Same examples shown for both REGEXP and RLIKE operators. --> + +<codeblock conref="../shared/impala_common.xml#common/regexp_rlike_examples"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p rev="2.5.0"> + For regular expression matching with case-insensitive comparisons, see <xref href="impala_operators.xml#iregexp"/>. + </p> + + </conbody> + + </concept> + + <concept id="rlike"> + + <title>RLIKE Operator</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">RLIKE operator</indexterm> + Synonym for the <codeph>REGEXP</codeph> operator. See <xref href="impala_operators.xml#regexp"/> for details. + </p> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + The following examples demonstrate the identical syntax for the <codeph>REGEXP</codeph> and <codeph>RLIKE</codeph> operators. + </p> + +<!-- Same examples shown for both REGEXP and RLIKE operators. --> + +<codeblock conref="../shared/impala_common.xml#common/regexp_rlike_examples"/> + + </conbody> + + </concept> + +</concept>
