http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_num_nodes.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_num_nodes.xml b/docs/topics/impala_num_nodes.xml index 75ae8e8..8a8fe88 100644 --- a/docs/topics/impala_num_nodes.xml +++ b/docs/topics/impala_num_nodes.xml @@ -3,10 +3,14 @@ <concept id="num_nodes"> <title>NUM_NODES Query Option</title> + <titlealts audience="PDF"><navtitle>NUM_NODES</navtitle></titlealts> <prolog> <metadata> <data name="Category" value="Impala"/> <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Troubleshooting"/> </metadata> </prolog> @@ -15,7 +19,6 @@ <p> <indexterm audience="Cloudera">NUM_NODES query option</indexterm> Limit the number of nodes that process a query, typically during debugging. - </p> <p> @@ -41,5 +44,17 @@ <p conref="../shared/impala_common.xml#common/num_nodes_tip"/> + <note type="warning" rev="DOCS-1161"> + <p> + Because this option results in increased resource utilization on a single host, + it could cause problems due to contention with other Impala statements or + high resource usage. Symptoms could include queries running slowly, exceeding the memory limit, + or appearing to hang. Use it only in a single-user development/test environment; + <b>do not</b> use it in a production environment or in a cluster with a high-concurrency + or high-volume or performance-critical workload. + </p> + </note> + + </conbody> </concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_num_scanner_threads.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_num_scanner_threads.xml b/docs/topics/impala_num_scanner_threads.xml index 27cf883..ec609d2 100644 --- a/docs/topics/impala_num_scanner_threads.xml +++ b/docs/topics/impala_num_scanner_threads.xml @@ -3,10 +3,14 @@ <concept id="num_scanner_threads"> <title>NUM_SCANNER_THREADS Query Option</title> + <titlealts audience="PDF"><navtitle>NUM_SCANNER_THREADS</navtitle></titlealts> <prolog> <metadata> <data name="Category" value="Impala"/> <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Performance"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> </metadata> </prolog> @@ -27,6 +31,5 @@ <b>Default:</b> 0 </p> - <note conref="../shared/impala_common.xml#common/compute_stats_parquet"/> </conbody> </concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_odbc.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_odbc.xml b/docs/topics/impala_odbc.xml index f899f8f..c73915e 100644 --- a/docs/topics/impala_odbc.xml +++ b/docs/topics/impala_odbc.xml @@ -3,7 +3,16 @@ <concept id="impala_odbc"> <title id="odbc">Configuring Impala to Work with ODBC</title> - + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="ODBC"/> + <data name="Category" value="Querying"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Configuring"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> <conbody> @@ -17,8 +26,178 @@ ODBC connectors. </p> - + <p outputclass="toc inpage"/> + + </conbody> + + <concept id="odbc_driver"> + <title>Downloading the ODBC Driver</title> + <conbody> + + <note type="important"> + As of late 2015, most business intelligence applications are certified with the 2.x ODBC drivers. + Although the instructions on this page cover both the 2.x and 1.x drivers, expect to use the 2.x drivers + exclusively for most ODBC applications connecting to Impala. + </note> + + <p> + See the + <xref href="http://www.cloudera.com/content/support/en/downloads/download-components/download-products.html?productID=3mDesWddUS" scope="external" format="html">downloads + page</xref> for a matrix of the certified driver version for different products. See the + <xref href="http://www.cloudera.com/content/support/en/documentation/cloudera-connectors-documentation/connector-documentation-latest.html" scope="external" format="html">documentation + page</xref> for installation instructions. + </p> + + </conbody> + </concept> + + <concept id="odbc_port"> + <title>Configuring the ODBC Port</title> + <conbody> + <p> + Versions 2.5 and 2.0 of the Cloudera ODBC Connector, currently certified for some but not all BI applications, + use the HiveServer2 protocol, corresponding to Impala port 21050. Impala supports Kerberos authentication with + all the supported versions of the driver, and requires ODBC 2.05.13 for Impala or higher for LDAP username/password + authentication. + </p> + + <p> + Version 1.x of the Cloudera ODBC Connector uses the original HiveServer1 protocol, corresponding to Impala + port 21000. + </p> + + </conbody> + </concept> + + <concept id="odbc_demo"> + <title>Example of Setting Up an ODBC Application for Impala</title> + <conbody> + + <p> + To illustrate the outline of the setup process, here is a transcript of a session to set up + all required drivers and a business intelligence application + that uses the ODBC driver, under Mac OS X. Each <codeph>.dmg</codeph> file runs a GUI-based + installer, first for the <xref href="http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/Downloads" scope="external" format="html">underlying IODBC driver</xref> needed for non-Windows systems, then for + the Cloudera ODBC Connector, and finally for the BI tool itself. + </p> + +<codeblock>$ ls -1 +Cloudera-ODBC-Driver-for-Impala-Install-Guide.pdf +BI_Tool_Installer.dmg +iodbc-sdk-3.52.7-macosx-10.5.dmg +ClouderaImpalaODBC.dmg +$ open iodbc-sdk-3.52.7-macosx-10.dmg +<varname>Install the IODBC driver using its installer</varname> +$ open ClouderaImpalaODBC.dmg +<varname>Install the Cloudera ODBC Connector using its installer</varname> +$ installer_dir=$(pwd) +$ cd /opt/cloudera/impalaodbc +$ ls -1 +Cloudera ODBC Driver for Impala Install Guide.pdf +Readme.txt +Setup +lib +ErrorMessages +Release Notes.txt +Tools +$ cd Setup +$ ls +odbc.ini odbcinst.ini +$ cp odbc.ini ~/.odbc.ini +$ vi ~/.odbc.ini +$ cat ~/.odbc.ini +[ODBC] +# Specify any global ODBC configuration here such as ODBC tracing. + +[ODBC Data Sources] +Sample Cloudera Impala DSN=Cloudera ODBC Driver for Impala + +[Sample Cloudera Impala DSN] + +# Description: DSN Description. +# This key is not necessary and is only to give a description of the data source. +Description=Cloudera ODBC Driver for Impala DSN + +# Driver: The location where the ODBC driver is installed to. +Driver=/opt/cloudera/impalaodbc/lib/universal/libclouderaimpalaodbc.dylib + +# The DriverUnicodeEncoding setting is only used for SimbaDM +# When set to 1, SimbaDM runs in UTF-16 mode. +# When set to 2, SimbaDM runs in UTF-8 mode. +#DriverUnicodeEncoding=2 + +# Values for HOST, PORT, KrbFQDN, and KrbServiceName should be set here. +# They can also be specified on the connection string. +HOST=hostname.sample.example.com +PORT=21050 +Schema=default + +# The authentication mechanism. +# 0 - No authentication (NOSASL) +# 1 - Kerberos authentication (SASL) +# 2 - Username authentication (SASL) +# 3 - Username/password authentication (SASL) +# 4 - Username/password authentication with SSL (SASL) +# 5 - No authentication with SSL (NOSASL) +# 6 - Username/password authentication (NOSASL) +AuthMech=0 + +# Kerberos related settings. +KrbFQDN= +KrbRealm= +KrbServiceName= + +# Username/password authentication with SSL settings. +UID= +PWD +CAIssuedCertNamesMismatch=1 +TrustedCerts=/opt/cloudera/impalaodbc/lib/universal/cacerts.pem + +# Specify the proxy user ID to use. +#DelegationUID= + +# General settings +TSaslTransportBufSize=1000 +RowsFetchedPerBlock=10000 +SocketTimeout=0 +StringColumnLength=32767 +UseNativeQuery=0 +$ pwd +/opt/cloudera/impalaodbc/Setup +$ cd $installer_dir +$ open BI_Tool_Installer.dmg +<varname>Install the BI tool using its installer</varname> +$ ls /Applications | grep BI_Tool +BI_Tool.app +$ open -a BI_Tool.app +<varname>In the BI tool, connect to a data source using port 21050</varname> +</codeblock> + </conbody> </concept> + <concept rev="2.3.0" id="jdbc_odbc_notes"> + <title>Notes about JDBC and ODBC Interaction with Impala SQL Features</title> + <conbody> + + <p> + Most Impala SQL features work equivalently through the <cmdname>impala-shell</cmdname> interpreter + of the JDBC or ODBC APIs. The following are some exceptions to keep in mind when switching between + the interactive shell and applications using the APIs: + </p> + + <note conref="../shared/impala_common.xml#common/proxy_jdbc_caveat"/> + + <ul> + <li> + <p conref="../shared/impala_common.xml#common/jdbc_odbc_complex_types"/> + </li> + <li> + <p conref="../shared/impala_common.xml#common/jdbc_odbc_complex_types_views"/> + </li> + </ul> + + </conbody> + </concept> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_offset.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_offset.xml b/docs/topics/impala_offset.xml index c9c073d..534f6b6 100644 --- a/docs/topics/impala_offset.xml +++ b/docs/topics/impala_offset.xml @@ -9,6 +9,8 @@ <data name="Category" value="SQL"/> <data name="Category" value="Querying"/> <data name="Category" value="Reports"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> </metadata> </prolog> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_operators.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_operators.xml b/docs/topics/impala_operators.xml index da3dab3..9923426 100644 --- a/docs/topics/impala_operators.xml +++ b/docs/topics/impala_operators.xml @@ -1,7 +1,9 @@ -<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> <concept id="operators"> <title>SQL Operators</title> + <prolog> <metadata> <data name="Category" value="Impala"/> @@ -15,11 +17,12 @@ <p> <indexterm audience="Cloudera">operators</indexterm> - SQL operators are a class of comparison functions that are widely used within the <codeph>WHERE</codeph> - clauses of <codeph>SELECT</codeph> statements. + SQL operators are a class of comparison functions that are widely used within the <codeph>WHERE</codeph> clauses of + <codeph>SELECT</codeph> statements. </p> <p outputclass="toc inpage"/> + </conbody> <concept rev="1.4.0" id="arithmetic_operators"> @@ -30,8 +33,7 @@ <p> <indexterm audience="Cloudera">arithmetic operators</indexterm> - The arithmetic operators use expressions with a left-hand argument, the operator, and then (in most cases) - a right-hand argument. + The arithmetic operators use expressions with a left-hand argument, the operator, and then (in most cases) a right-hand argument. </p> <p conref="../shared/impala_common.xml#common/syntax_blurb"/> @@ -46,24 +48,21 @@ <ul> <li> <p> - With unary notation, such as <codeph>+5</codeph>, <codeph>-2.5</codeph>, or - <codeph>-<varname>col_name</varname></codeph>, they multiply their single numeric argument by - <codeph>+1</codeph> or <codeph>-1</codeph>. Therefore, unary <codeph>+</codeph> returns its - argument unchanged, while unary <codeph>-</codeph> flips the sign of its argument. Although you can - double up these operators in expressions such as <codeph>++5</codeph> (always positive) or - <codeph>-+2</codeph> or <codeph>+-2</codeph> (both always negative), you cannot double the unary - minus operator because <codeph>--</codeph> is interpreted as the start of a comment. (You can use a - double unary minus operator if you separate the <codeph>-</codeph> characters, for example with a - space or parentheses.) + With unary notation, such as <codeph>+5</codeph>, <codeph>-2.5</codeph>, or <codeph>-<varname>col_name</varname></codeph>, + they multiply their single numeric argument by <codeph>+1</codeph> or <codeph>-1</codeph>. Therefore, unary + <codeph>+</codeph> returns its argument unchanged, while unary <codeph>-</codeph> flips the sign of its argument. Although + you can double up these operators in expressions such as <codeph>++5</codeph> (always positive) or <codeph>-+2</codeph> or + <codeph>+-2</codeph> (both always negative), you cannot double the unary minus operator because <codeph>--</codeph> is + interpreted as the start of a comment. (You can use a double unary minus operator if you separate the <codeph>-</codeph> + characters, for example with a space or parentheses.) </p> </li> <li> <p> - With binary notation, such as <codeph>2+2</codeph>, <codeph>5-2.5</codeph>, or - <codeph><varname>col1</varname> + <varname>col2</varname></codeph>, they add or subtract - respectively the right-hand argument to (or from) the left-hand argument. Both arguments must be of - numeric types. + With binary notation, such as <codeph>2+2</codeph>, <codeph>5-2.5</codeph>, or <codeph><varname>col1</varname> + + <varname>col2</varname></codeph>, they add or subtract respectively the right-hand argument to (or from) the left-hand + argument. Both arguments must be of numeric types. </p> </li> </ul> @@ -71,41 +70,39 @@ <li> <p> - <codeph>*</codeph> and <codeph>/</codeph>: Multiplication and division respectively. Both arguments - must be of numeric types. + <codeph>*</codeph> and <codeph>/</codeph>: Multiplication and division respectively. Both arguments must be of numeric types. </p> + <p> - When multiplying, the shorter argument is promoted if necessary (such as <codeph>SMALLINT</codeph> to - <codeph>INT</codeph> or <codeph>BIGINT</codeph>, or <codeph>FLOAT</codeph> to <codeph>DOUBLE</codeph>), - and then the result is promoted again to the next larger type. Thus, multiplying a - <codeph>TINYINT</codeph> and an <codeph>INT</codeph> produces a <codeph>BIGINT</codeph> result. - Multiplying a <codeph>FLOAT</codeph> and a <codeph>FLOAT</codeph> produces a <codeph>DOUBLE</codeph> - result. Multiplying a <codeph>FLOAT</codeph> and a <codeph>DOUBLE</codeph> or a <codeph>DOUBLE</codeph> - and a <codeph>DOUBLE</codeph> produces a <codeph>DECIMAL(38,17)</codeph>, because - <codeph>DECIMAL</codeph> values can represent much larger and more precise values than + When multiplying, the shorter argument is promoted if necessary (such as <codeph>SMALLINT</codeph> to <codeph>INT</codeph> or + <codeph>BIGINT</codeph>, or <codeph>FLOAT</codeph> to <codeph>DOUBLE</codeph>), and then the result is promoted again to the + next larger type. Thus, multiplying a <codeph>TINYINT</codeph> and an <codeph>INT</codeph> produces a <codeph>BIGINT</codeph> + result. Multiplying a <codeph>FLOAT</codeph> and a <codeph>FLOAT</codeph> produces a <codeph>DOUBLE</codeph> result. Multiplying + a <codeph>FLOAT</codeph> and a <codeph>DOUBLE</codeph> or a <codeph>DOUBLE</codeph> and a <codeph>DOUBLE</codeph> produces a + <codeph>DECIMAL(38,17)</codeph>, because <codeph>DECIMAL</codeph> values can represent much larger and more precise values than <codeph>DOUBLE</codeph>. </p> + <p> - When dividing, Impala always treats the arguments and result as <codeph>DOUBLE</codeph> values to avoid - losing precision. If you need to insert the results of a division operation into a - <codeph>FLOAT</codeph> column, use the <codeph>CAST()</codeph> function to convert the result to the - correct type. + When dividing, Impala always treats the arguments and result as <codeph>DOUBLE</codeph> values to avoid losing precision. If you + need to insert the results of a division operation into a <codeph>FLOAT</codeph> column, use the <codeph>CAST()</codeph> + function to convert the result to the correct type. </p> </li> <li> <p> - <codeph>%</codeph>: Modulo operator. Returns the remainder of the left-hand argument divided by the - right-hand argument. Both arguments must be of one of the integer types. + <codeph>%</codeph>: Modulo operator. Returns the remainder of the left-hand argument divided by the right-hand argument. Both + arguments must be of one of the integer types. </p> </li> <li> <p> <codeph>&</codeph>, <codeph>|</codeph>, <codeph>~</codeph>, and <codeph>^</codeph>: Bitwise operators that return the - logical AND, logical OR, <codeph>NOT</codeph>, or logical XOR (exclusive OR) of their argument values. Both arguments must be - of one of the integer types. If the arguments are of different type, the argument with the smaller type - is implicitly extended to match the argument with the longer type. + logical AND, logical OR, <codeph>NOT</codeph>, or logical XOR (exclusive OR) of their argument values. Both arguments must be of + one of the integer types. If the arguments are of different type, the argument with the smaller type is implicitly extended to + match the argument with the longer type. </p> </li> </ul> @@ -115,26 +112,25 @@ </p> <p> - The arithmetic operators generally do not have equivalent calling conventions using functional notation. - For example, prior to Impala 2.2.0 / CDH 5.4.0, there is no <codeph>MOD()</codeph> function equivalent to the <codeph>%</codeph> modulo - operator. Conversely, there are some arithmetic functions that do not have a corresponding operator. For - example, for exponentiation you use the <codeph>POW()</codeph> function, but there is no - <codeph>**</codeph> exponentiation operator. See <xref href="impala_math_functions.xml#math_functions"/> - for the arithmetic functions you can use. + The arithmetic operators generally do not have equivalent calling conventions using functional notation. For example, prior to + Impala 2.2.0 / CDH 5.4.0, there is no <codeph>MOD()</codeph> function equivalent to the <codeph>%</codeph> modulo operator. + Conversely, there are some arithmetic functions that do not have a corresponding operator. For example, for exponentiation you use + the <codeph>POW()</codeph> function, but there is no <codeph>**</codeph> exponentiation operator. See + <xref href="impala_math_functions.xml#math_functions"/> for the arithmetic functions you can use. </p> <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> <p conref="../shared/impala_common.xml#common/complex_types_aggregation_explanation"/> - + <p conref="../shared/impala_common.xml#common/complex_types_aggregation_example"/> <p conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/> <p rev="2.3.0"> - The following example shows how to do an arithmetic operation using a numeric field of a <codeph>STRUCT</codeph> type - that is an item within an <codeph>ARRAY</codeph> column. Once the scalar numeric value <codeph>R_NATIONKEY</codeph> - is extracted, it can be used in an arithmetic expression, such as multiplying by 10: + The following example shows how to do an arithmetic operation using a numeric field of a <codeph>STRUCT</codeph> type that is an + item within an <codeph>ARRAY</codeph> column. Once the scalar numeric value <codeph>R_NATIONKEY</codeph> is extracted, it can be + used in an arithmetic expression, such as multiplying by 10: </p> <codeblock rev="2.3.0"> @@ -171,6 +167,7 @@ where nation.item.n_nationkey < 5; </codeblock> </conbody> + </concept> <concept id="between"> @@ -181,10 +178,9 @@ where nation.item.n_nationkey < 5; <p> <indexterm audience="Cloudera">BETWEEN operator</indexterm> - In a <codeph>WHERE</codeph> clause, compares an expression to both a lower and upper bound. The comparison - is successful is the expression is greater than or equal to the lower bound, and less than or equal to the - upper bound. If the bound values are switched, so the lower bound is greater than the upper bound, does not - match any values. + In a <codeph>WHERE</codeph> clause, compares an expression to both a lower and upper bound. The comparison is successful is the + expression is greater than or equal to the lower bound, and less than or equal to the upper bound. If the bound values are switched, + so the lower bound is greater than the upper bound, does not match any values. </p> <p conref="../shared/impala_common.xml#common/syntax_blurb"/> @@ -192,21 +188,20 @@ where nation.item.n_nationkey < 5; <codeblock><varname>expression</varname> BETWEEN <varname>lower_bound</varname> AND <varname>upper_bound</varname></codeblock> <p> - <b>Data types:</b> Typically used with numeric data types. Works with any data type, although not very - practical for <codeph>BOOLEAN</codeph> values. (<codeph>BETWEEN false AND true</codeph> will match all - <codeph>BOOLEAN</codeph> values.) Use <codeph>CAST()</codeph> if necessary to ensure the lower and upper - bound values are compatible types. Call string or date/time functions if necessary to extract or transform - the relevant portion to compare, especially if the value can be transformed into a number. + <b>Data types:</b> Typically used with numeric data types. Works with any data type, although not very practical for + <codeph>BOOLEAN</codeph> values. (<codeph>BETWEEN false AND true</codeph> will match all <codeph>BOOLEAN</codeph> values.) Use + <codeph>CAST()</codeph> if necessary to ensure the lower and upper bound values are compatible types. Call string or date/time + functions if necessary to extract or transform the relevant portion to compare, especially if the value can be transformed into a + number. </p> <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> <p> - Be careful when using short string operands. A longer string that starts with the upper bound value will - not be included, because it is considered greater than the upper bound. For example, <codeph>BETWEEN 'A' - and 'M'</codeph> would not match the string value <codeph>'Midway'</codeph>. Use functions such as - <codeph>upper()</codeph>, <codeph>lower()</codeph>, <codeph>substr()</codeph>, <codeph>trim()</codeph>, and - so on if necessary to ensure the comparison works as expected. + Be careful when using short string operands. A longer string that starts with the upper bound value will not be included, because it + is considered greater than the upper bound. For example, <codeph>BETWEEN 'A' and 'M'</codeph> would not match the string value + <codeph>'Midway'</codeph>. Use functions such as <codeph>upper()</codeph>, <codeph>lower()</codeph>, <codeph>substr()</codeph>, + <codeph>trim()</codeph>, and so on if necessary to ensure the comparison works as expected. </p> <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> @@ -228,8 +223,8 @@ select count(distinct visitor_id)) from web_traffic where dayofmonth(when_viewed <p rev="2.3.0"> The following example shows how to do a <codeph>BETWEEN</codeph> comparison using a numeric field of a <codeph>STRUCT</codeph> type - that is an item within an <codeph>ARRAY</codeph> column. Once the scalar numeric value <codeph>R_NATIONKEY</codeph> - is extracted, it can be used in a comparison operator: + that is an item within an <codeph>ARRAY</codeph> column. Once the scalar numeric value <codeph>R_NATIONKEY</codeph> is extracted, it + can be used in a comparison operator: </p> <codeblock rev="2.3.0"> @@ -264,6 +259,7 @@ where nation.item.n_nationkey between 3 and 5 </codeblock> </conbody> + </concept> <concept id="comparison_operators"> @@ -274,8 +270,7 @@ where nation.item.n_nationkey between 3 and 5 <p> <indexterm audience="Cloudera">comparison operators</indexterm> - Impala supports the familiar comparison operators for checking equality and sort order for the column data - types: + Impala supports the familiar comparison operators for checking equality and sort order for the column data types: </p> <p conref="../shared/impala_common.xml#common/syntax_blurb"/> @@ -288,9 +283,8 @@ where nation.item.n_nationkey between 3 and 5 </li> <li> - <codeph><</codeph>, <codeph><=</codeph>, <codeph>></codeph>, <codeph>>=</codeph>: apply to - all types; for <codeph>BOOLEAN</codeph>, <codeph>TRUE</codeph> is considered greater than - <codeph>FALSE</codeph>. + <codeph><</codeph>, <codeph><=</codeph>, <codeph>></codeph>, <codeph>>=</codeph>: apply to all types; for + <codeph>BOOLEAN</codeph>, <codeph>TRUE</codeph> is considered greater than <codeph>FALSE</codeph>. </li> </ul> @@ -299,14 +293,14 @@ where nation.item.n_nationkey between 3 and 5 </p> <p> - The <codeph>IN</codeph> and <codeph>BETWEEN</codeph> operators provide shorthand notation for expressing - combinations of equality, less than, and greater than comparisons with a single operator. + The <codeph>IN</codeph> and <codeph>BETWEEN</codeph> operators provide shorthand notation for expressing combinations of equality, + less than, and greater than comparisons with a single operator. </p> <p> - Because comparing any value to <codeph>NULL</codeph> produces <codeph>NULL</codeph> rather than - <codeph>TRUE</codeph> or <codeph>FALSE</codeph>, use the <codeph>IS NULL</codeph> and <codeph>IS NOT - NULL</codeph> operators to check if a value is <codeph>NULL</codeph> or not. + Because comparing any value to <codeph>NULL</codeph> produces <codeph>NULL</codeph> rather than <codeph>TRUE</codeph> or + <codeph>FALSE</codeph>, use the <codeph>IS NULL</codeph> and <codeph>IS NOT NULL</codeph> operators to check if a value is + <codeph>NULL</codeph> or not. </p> <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> @@ -314,9 +308,9 @@ where nation.item.n_nationkey between 3 and 5 <p conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/> <p rev="2.3.0"> - The following example shows how to do an arithmetic operation using a numeric field of a <codeph>STRUCT</codeph> type - that is an item within an <codeph>ARRAY</codeph> column. Once the scalar numeric value <codeph>R_NATIONKEY</codeph> - is extracted, it can be used with a comparison operator such as <codeph><</codeph>: + The following example shows how to do an arithmetic operation using a numeric field of a <codeph>STRUCT</codeph> type that is an + item within an <codeph>ARRAY</codeph> column. Once the scalar numeric value <codeph>R_NATIONKEY</codeph> is extracted, it can be + used with a comparison operator such as <codeph><</codeph>: </p> <codeblock rev="2.3.0"> @@ -353,6 +347,7 @@ where nation.item.n_nationkey < 5 </codeblock> </conbody> + </concept> <concept audience="Cloudera" rev="2.1.0" id="except"> @@ -364,7 +359,9 @@ where nation.item.n_nationkey < 5 <p> <indexterm audience="Cloudera">EXCEPT operator</indexterm> </p> + </conbody> + </concept> <concept rev="2.0.0" id="exists"> @@ -375,14 +372,15 @@ where nation.item.n_nationkey < 5 <p> <indexterm audience="Cloudera">EXISTS operator</indexterm> + <indexterm audience="Cloudera">NOT EXISTS operator</indexterm> - The <codeph>EXISTS</codeph> operator tests whether a subquery returns any results. - You typically use it to find values from one table that have corresponding values in another table. + The <codeph>EXISTS</codeph> operator tests whether a subquery returns any results. You typically use it to find values from one + table that have corresponding values in another table. </p> <p> - The converse, <codeph>NOT EXISTS</codeph>, helps to find all the values from one table that do not have any - corresponding values in another table. + The converse, <codeph>NOT EXISTS</codeph>, helps to find all the values from one table that do not have any corresponding values in + another table. </p> <p conref="../shared/impala_common.xml#common/syntax_blurb"/> @@ -394,30 +392,28 @@ NOT EXISTS (<varname>subquery</varname>) <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> <p> - The subquery can refer to a different table than the outer query block, or the same table. For example, you - might use <codeph>EXISTS</codeph> or <codeph>NOT EXISTS</codeph> to check the existence of parent/child - relationships between two columns of the same table. + The subquery can refer to a different table than the outer query block, or the same table. For example, you might use + <codeph>EXISTS</codeph> or <codeph>NOT EXISTS</codeph> to check the existence of parent/child relationships between two columns of + the same table. </p> <p> - You can also use operators and function calls within the subquery to test for other kinds of relationships - other than strict equality. For example, you might use a call to <codeph>COUNT()</codeph> in the subquery - to check whether the number of matching values is higher or lower than some limit. You might call a UDF in - the subquery to check whether values in one table matches a hashed representation of those same values in a - different table. + You can also use operators and function calls within the subquery to test for other kinds of relationships other than strict + equality. For example, you might use a call to <codeph>COUNT()</codeph> in the subquery to check whether the number of matching + values is higher or lower than some limit. You might call a UDF in the subquery to check whether values in one table matches a + hashed representation of those same values in a different table. </p> <p conref="../shared/impala_common.xml#common/null_blurb"/> <p> - If the subquery returns any value at all (even <codeph>NULL</codeph>), <codeph>EXISTS</codeph> returns - <codeph>TRUE</codeph> and <codeph>NOT EXISTS</codeph> returns false. + If the subquery returns any value at all (even <codeph>NULL</codeph>), <codeph>EXISTS</codeph> returns <codeph>TRUE</codeph> and + <codeph>NOT EXISTS</codeph> returns false. </p> <p> - The following example shows how even when the subquery returns only <codeph>NULL</codeph> values, - <codeph>EXISTS</codeph> still returns <codeph>TRUE</codeph> and thus matches all the rows from the table in - the outer query block. + The following example shows how even when the subquery returns only <codeph>NULL</codeph> values, <codeph>EXISTS</codeph> still + returns <codeph>TRUE</codeph> and thus matches all the rows from the table in the outer query block. </p> <codeblock>[localhost:21000] > create table all_nulls (x int); @@ -433,8 +429,8 @@ NOT EXISTS (<varname>subquery</varname>) </codeblock> <p> - However, if the table in the subquery is empty and so the subquery returns an empty result set, - <codeph>EXISTS</codeph> returns <codeph>FALSE</codeph>: + However, if the table in the subquery is empty and so the subquery returns an empty result set, <codeph>EXISTS</codeph> returns + <codeph>FALSE</codeph>: </p> <codeblock>[localhost:21000] > create table empty (x int); @@ -448,8 +444,11 @@ NOT EXISTS (<varname>subquery</varname>) <p conref="../shared/impala_common.xml#common/subquery_no_limit"/> - <p> - The <codeph>NOT EXISTS</codeph> operator requires a correlated subquery. + <p rev="IMPALA-3232"> + Prior to CDH 5.8 / Impala 2.6, + the <codeph>NOT EXISTS</codeph> operator required a correlated subquery. + In CDH 5.8 / Impala 2.6 and higher, <codeph>NOT EXISTS</codeph> works with + uncorrelated queries also. </p> <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> @@ -483,10 +482,9 @@ NOT EXISTS (<varname>subquery</varname>) </p> <p> - The following example shows a correlated subquery that finds all the values in one table that exist in - another table. For each value <codeph>X</codeph> from <codeph>T1</codeph>, the query checks if the - <codeph>Y</codeph> column of <codeph>T2</codeph> contains an identical value, and the - <codeph>EXISTS</codeph> operator returns <codeph>TRUE</codeph> or <codeph>FALSE</codeph> as appropriate in + The following example shows a correlated subquery that finds all the values in one table that exist in another table. For each value + <codeph>X</codeph> from <codeph>T1</codeph>, the query checks if the <codeph>Y</codeph> column of <codeph>T2</codeph> contains an + identical value, and the <codeph>EXISTS</codeph> operator returns <codeph>TRUE</codeph> or <codeph>FALSE</codeph> as appropriate in each case. </p> @@ -501,9 +499,9 @@ NOT EXISTS (<varname>subquery</varname>) </codeblock> <p> - An uncorrelated query is less interesting in this case. Because the subquery always returns - <codeph>TRUE</codeph>, all rows from <codeph>T1</codeph> are returned. If the table contents where changed - so that the subquery did not match any rows, none of the rows from <codeph>T1</codeph> would be returned. + An uncorrelated query is less interesting in this case. Because the subquery always returns <codeph>TRUE</codeph>, all rows from + <codeph>T1</codeph> are returned. If the table contents where changed so that the subquery did not match any rows, none of the rows + from <codeph>T1</codeph> would be returned. </p> <codeblock>[localhost:21000] > select x from t1 where exists (select y from t2 where y > 5); @@ -520,10 +518,10 @@ NOT EXISTS (<varname>subquery</varname>) </codeblock> <p> - The following example shows how an uncorrelated subquery can test for the existence of some condition - within a table. By using <codeph>LIMIT 1</codeph> or an aggregate function, the query returns a single - result or no result based on whether the subquery matches any rows. Here, we know that <codeph>T1</codeph> - and <codeph>T2</codeph> contain some even numbers, but <codeph>T3</codeph> does not. + The following example shows how an uncorrelated subquery can test for the existence of some condition within a table. By using + <codeph>LIMIT 1</codeph> or an aggregate function, the query returns a single result or no result based on whether the subquery + matches any rows. Here, we know that <codeph>T1</codeph> and <codeph>T2</codeph> contain some even numbers, but <codeph>T3</codeph> + does not. </p> <codeblock>[localhost:21000] > select "contains an even number" from t1 where exists (select x from t1 where x % 2 = 0) limit 1; @@ -551,10 +549,9 @@ ERROR: AnalysisException: couldn't resolve column reference: 'x' </codeblock> <p> - The following example finds numbers in one table that are 1 greater than numbers from another table. The - <codeph>EXISTS</codeph> notation is simpler than an equivalent <codeph>CROSS JOIN</codeph> between the - tables. (The example then also illustrates how the same test could be performed using an - <codeph>IN</codeph> operator.) + The following example finds numbers in one table that are 1 greater than numbers from another table. The <codeph>EXISTS</codeph> + notation is simpler than an equivalent <codeph>CROSS JOIN</codeph> between the tables. (The example then also illustrates how the + same test could be performed using an <codeph>IN</codeph> operator.) </p> <codeblock>[localhost:21000] > select x from t1 where exists (select y from t2 where x = y + 1); @@ -588,11 +585,10 @@ ERROR: AnalysisException: couldn't resolve column reference: 'x' </codeblock> <p> - The following example uses the <codeph>NOT EXISTS</codeph> operator to find all the leaf nodes in - tree-structured data. This simplified <q>tree of life</q> has multiple levels (class, order, family, and so - on), with each item pointing upward through a <codeph>PARENT</codeph> pointer. The example runs an outer - query and a subquery on the same table, returning only those items whose <codeph>ID</codeph> value is - <i>not</i> referenced by the <codeph>PARENT</codeph> of any other item. + The following example uses the <codeph>NOT EXISTS</codeph> operator to find all the leaf nodes in tree-structured data. This + simplified <q>tree of life</q> has multiple levels (class, order, family, and so on), with each item pointing upward through a + <codeph>PARENT</codeph> pointer. The example runs an outer query and a subquery on the same table, returning only those items whose + <codeph>ID</codeph> value is <i>not</i> referenced by the <codeph>PARENT</codeph> of any other item. </p> <codeblock>[localhost:21000] > create table tree (id int, parent int, name string); @@ -625,7 +621,101 @@ ERROR: AnalysisException: couldn't resolve column reference: 'x' <p> <xref href="impala_subqueries.xml#subqueries"/> </p> + + </conbody> + + </concept> + + <concept rev="2.4.0" id="ilike"> + + <title>ILIKE Operator</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">ILIKE operator</indexterm> + A case-insensitive comparison operator for <codeph>STRING</codeph> data, with basic wildcard capability using <codeph>_</codeph> to match a single + character and <codeph>%</codeph> to match multiple characters. The argument expression must match the entire string value. + Typically, it is more efficient to put any <codeph>%</codeph> wildcard match at the end of the string. + </p> + + <p> + This operator, available in CDH 5.7 / Impala 2.5 and higher, is the equivalent of the <codeph>LIKE</codeph> operator, + but with case-insensitive comparisons. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock><varname>string_expression</varname> ILIKE <varname>wildcard_expression</varname> +<varname>string_expression</varname> NOT ILIKE <varname>wildcard_expression</varname> +</codeblock> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/> + +<!-- To do: construct a LIKE example for complex types. --> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + In the following examples, strings that are the same except for differences in uppercase + and lowercase match successfully with <codeph>ILIKE</codeph>, but do not match + with <codeph>LIKE</codeph>: + </p> + +<codeblock>select 'fooBar' ilike 'FOOBAR'; ++-------------------------+ +| 'foobar' ilike 'foobar' | ++-------------------------+ +| true | ++-------------------------+ + +select 'fooBar' like 'FOOBAR'; ++------------------------+ +| 'foobar' like 'foobar' | ++------------------------+ +| false | ++------------------------+ + +select 'FOOBAR' ilike 'f%'; ++---------------------+ +| 'foobar' ilike 'f%' | ++---------------------+ +| true | ++---------------------+ + +select 'FOOBAR' like 'f%'; ++--------------------+ +| 'foobar' like 'f%' | ++--------------------+ +| false | ++--------------------+ + +select 'ABCXYZ' not ilike 'ab_xyz'; ++-----------------------------+ +| not 'abcxyz' ilike 'ab_xyz' | ++-----------------------------+ +| false | ++-----------------------------+ + +select 'ABCXYZ' not like 'ab_xyz'; ++----------------------------+ +| not 'abcxyz' like 'ab_xyz' | ++----------------------------+ +| true | ++----------------------------+ +</codeblock> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p rev="2.5.0"> + For case-sensitive comparisons, see <xref href="impala_operators.xml#like"/>. + For a more general kind of search operator using regular expressions, see <xref href="impala_operators.xml#regexp"/> + or its case-insensitive counterpart <xref href="impala_operators.xml#iregexp"/>. + </p> + </conbody> + </concept> <concept id="in"> @@ -636,10 +726,11 @@ ERROR: AnalysisException: couldn't resolve column reference: 'x' <p> <indexterm audience="Cloudera">IN operator</indexterm> + <indexterm audience="Cloudera">NOT IN operator</indexterm> - The <codeph>IN</codeph> operator compares an argument value to a set of values, and returns - <codeph>TRUE</codeph> if the argument matches any value in the set. The <codeph>NOT IN</codeph> operator - reverses the comparison, and checks if the argument value is not part of a set of values. + The <codeph>IN</codeph> operator compares an argument value to a set of values, and returns <codeph>TRUE</codeph> if the argument + matches any value in the set. The <codeph>NOT IN</codeph> operator reverses the comparison, and checks if the argument value is not + part of a set of values. </p> <p conref="../shared/impala_common.xml#common/syntax_blurb"/> @@ -656,30 +747,29 @@ ERROR: AnalysisException: couldn't resolve column reference: 'x' </p> <p> - The left-hand expression must consist only of a single value, not a tuple. Although the left-hand - expression is typically a column name, it could also be some other value. For example, the - <codeph>WHERE</codeph> clauses <codeph>WHERE id IN (5)</codeph> and <codeph>WHERE 5 IN (id)</codeph> - produce the same results. + The left-hand expression must consist only of a single value, not a tuple. Although the left-hand expression is typically a column + name, it could also be some other value. For example, the <codeph>WHERE</codeph> clauses <codeph>WHERE id IN (5)</codeph> and + <codeph>WHERE 5 IN (id)</codeph> produce the same results. </p> - <p> - The set of values to check against can be specified as constants, function calls, column names, or other - expressions in the query text. When the values are listed explicitly, the maximum number of expressions is - 10,000. + <p rev="CDH-40407"> + The set of values to check against can be specified as constants, function calls, column names, or other expressions in the query + text. The maximum number of expressions in the <codeph>IN</codeph> list is 9999. (The maximum number of elements of + a single expression is 10,000 items, and the <codeph>IN</codeph> operator itself counts as one.) </p> <p rev="2.0.0"> - In Impala 2.0 and higher, the set of values can also be generated by a subquery. <codeph>IN</codeph> can - evaluate an unlimited number of results using a subquery. + In Impala 2.0 and higher, the set of values can also be generated by a subquery. <codeph>IN</codeph> can evaluate an unlimited + number of results using a subquery. </p> <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> <p> - Any expression using the <codeph>IN</codeph> operator could be rewritten as a series of equality tests - connected with <codeph>OR</codeph>, but the <codeph>IN</codeph> syntax is often clearer, more concise, and - easier for Impala to optimize. For example, with partitioned tables, queries frequently use - <codeph>IN</codeph> clauses to filter data by comparing the partition key columns to specific values. + Any expression using the <codeph>IN</codeph> operator could be rewritten as a series of equality tests connected with + <codeph>OR</codeph>, but the <codeph>IN</codeph> syntax is often clearer, more concise, and easier for Impala to optimize. For + example, with partitioned tables, queries frequently use <codeph>IN</codeph> clauses to filter data by comparing the partition key + columns to specific values. </p> <p conref="../shared/impala_common.xml#common/null_blurb"/> @@ -703,8 +793,8 @@ ERROR: AnalysisException: couldn't resolve column reference: 'x' </codeblock> <p> - If the searched value is not found in the comparison values, and the comparison values include - <codeph>NULL</codeph>, the result is <codeph>NULL</codeph>: + If the searched value is not found in the comparison values, and the comparison values include <codeph>NULL</codeph>, the result is + <codeph>NULL</codeph>: </p> <codeblock>[localhost:21000] > select 5 in (1,null,2,3); @@ -734,8 +824,8 @@ ERROR: AnalysisException: couldn't resolve column reference: 'x' </codeblock> <p> - If the left-hand argument is <codeph>NULL</codeph>, <codeph>IN</codeph> always returns - <codeph>NULL</codeph>. This rule applies even if the comparison values include <codeph>NULL</codeph>. + If the left-hand argument is <codeph>NULL</codeph>, <codeph>IN</codeph> always returns <codeph>NULL</codeph>. This rule applies even + if the comparison values include <codeph>NULL</codeph>. </p> <codeblock>[localhost:21000] > select null in (1,2,3); @@ -771,9 +861,9 @@ ERROR: AnalysisException: couldn't resolve column reference: 'x' <p conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/> <p rev="2.3.0"> - The following example shows how to do an arithmetic operation using a numeric field of a <codeph>STRUCT</codeph> type - that is an item within an <codeph>ARRAY</codeph> column. Once the scalar numeric value <codeph>R_NATIONKEY</codeph> - is extracted, it can be used in an arithmetic expression, such as multiplying by 10: + The following example shows how to do an arithmetic operation using a numeric field of a <codeph>STRUCT</codeph> type that is an + item within an <codeph>ARRAY</codeph> column. Once the scalar numeric value <codeph>R_NATIONKEY</codeph> is extracted, it can be + used in an arithmetic expression, such as multiplying by 10: </p> <codeblock rev="2.3.0"> @@ -827,7 +917,9 @@ SELECT COUNT(DISTINCT(visitor_id)) FROM web_traffic WHERE month IN ('January','J <p> <xref href="impala_subqueries.xml#subqueries"/> </p> + </conbody> + </concept> <concept audience="Cloudera" rev="2.1.0" id="intersect"> @@ -839,7 +931,233 @@ SELECT COUNT(DISTINCT(visitor_id)) FROM web_traffic WHERE month IN ('January','J <p> <indexterm audience="Cloudera">INTERSECT operator</indexterm> </p> + + </conbody> + + </concept> + + <concept rev="2.5.0" id="iregexp"> + + <title>IREGEXP Operator</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">IREGEXP operator</indexterm> + Tests whether a value matches a regular expression, using case-insensitive string comparisons. + Uses the POSIX regular expression syntax where <codeph>^</codeph> and + <codeph>$</codeph> match the beginning and end of the string, <codeph>.</codeph> represents any single character, <codeph>*</codeph> + represents a sequence of zero or more items, <codeph>+</codeph> represents a sequence of one or more items, <codeph>?</codeph> + produces a non-greedy match, and so on. + </p> + + <p> + This operator, available in CDH 5.7 / Impala 2.5 and higher, is the equivalent of the <codeph>REGEXP</codeph> operator, + but with case-insensitive comparisons. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock><varname>string_expression</varname> IREGEXP <varname>regular_expression</varname> +</codeblock> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p conref="../shared/impala_common.xml#common/regular_expression_whole_string"/> + +<!-- Currently, there isn't any IRLIKE synonym, so REGEXP and IREGEXP are different in that respect. + I pinged IMPALA-1787 to check if that's intentional. + <p> + The <codeph>IRLIKE</codeph> operator is a synonym for <codeph>IREGEXP</codeph>. + </p> +--> + + <p rev="2.5.0"> + The <codeph>|</codeph> symbol is the alternation operator, typically used within <codeph>()</codeph> to match different sequences. + The <codeph>()</codeph> groups do not allow backreferences. To retrieve the part of a value matched within a <codeph>()</codeph> + section, use the <codeph><xref href="impala_string_functions.xml#string_functions/regexp_extract">regexp_extract()</xref></codeph> + built-in function. (Currently, there is not any case-insensitive equivalent for the <codeph>regexp_extract()</codeph> function.) + </p> + + <note rev="1.3.1"> + <p rev="1.3.1" conref="../shared/impala_common.xml#common/regexp_matching"/> + </note> + + <p conref="../shared/impala_common.xml#common/regexp_re2"/> + + <p conref="../shared/impala_common.xml#common/regexp_re2_warning"/> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/> + +<!-- To do: construct a REGEXP example for complex types. --> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + The following examples demonstrate the syntax for the <codeph>IREGEXP</codeph> operator. + </p> + +<codeblock>select 'abcABCaabbcc' iregexp '^[a-c]+$'; ++---------------------------------+ +| 'abcabcaabbcc' iregexp '[a-c]+' | ++---------------------------------+ +| true | ++---------------------------------+ +</codeblock> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_operators.xml#regexp"/> + </p> + + </conbody> + + </concept> + + <concept rev="2.5.0 IMPALA-2147" id="is_distinct_from"> + + <title id="is_distinct">IS DISTINCT FROM Operator</title> + + <conbody> + + <p> + <indexterm audience="Cloudera">IS DISTINCT FROM operator</indexterm> + + <indexterm audience="Cloudera">IS NOT DISTINCT FROM operator</indexterm> + The <codeph>IS DISTINCT FROM</codeph> operator, and its converse the <codeph>IS NOT DISTINCT FROM</codeph> operator, test whether or + not values are identical. <codeph>IS NOT DISTINCT FROM</codeph> is similar to the <codeph>=</codeph> operator, and <codeph>IS + DISTINCT FROM</codeph> is similar to the <codeph>!=</codeph> operator, except that <codeph>NULL</codeph> values are treated as + identical. Therefore, <codeph>IS NOT DISTINCT FROM</codeph> returns <codeph>true</codeph> rather than <codeph>NULL</codeph>, and + <codeph>IS DISTINCT FROM</codeph> returns <codeph>false</codeph> rather than <codeph>NULL</codeph>, when comparing two + <codeph>NULL</codeph> values. If one of the values being compared is <codeph>NULL</codeph> and the other is not, <codeph>IS DISTINCT + FROM</codeph> returns <codeph>true</codeph> and <codeph>IS NOT DISTINCT FROM</codeph> returns <codeph>false</codeph>, again instead + of returning <codeph>NULL</codeph> in both cases. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock><varname>expression1</varname> IS DISTINCT FROM <varname>expression2</varname> + +<varname>expression1</varname> IS NOT DISTINCT FROM <varname>expression2</varname> +<varname>expression1</varname> <=> <varname>expression2</varname> +</codeblock> + + <p> + The operator <codeph><=></codeph> is an alias for <codeph>IS NOT DISTINCT FROM</codeph>. + It is typically used as a <codeph>NULL</codeph>-safe equality operator in join queries. + That is, <codeph>A <=> B</codeph> is true if <codeph>A</codeph> equals <codeph>B</codeph> + or if both <codeph>A</codeph> and <codeph>B</codeph> are <codeph>NULL</codeph>. + </p> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + This operator provides concise notation for comparing two values and always producing a <codeph>true</codeph> or + <codeph>false</codeph> result, without treating <codeph>NULL</codeph> as a special case. Otherwise, to unambiguously distinguish + between two values requires a compound expression involving <codeph>IS [NOT] NULL</codeph> tests of both operands in addition to the + <codeph>=</codeph> or <codeph>!=</codeph> operator. + </p> + + <p> + The <codeph><=></codeph> operator, used like an equality operator in a join query, + is more efficient than the equivalent clause: <codeph>A = B OR (A IS NULL AND B IS NULL)</codeph>. + The <codeph><=></codeph> operator can use a hash join, while the <codeph>OR</codeph> expression + cannot. + </p> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + The following examples show how <codeph>IS DISTINCT FROM</codeph> gives output similar to + the <codeph>!=</codeph> operator, and <codeph>IS NOT DISTINCT FROM</codeph> gives output + similar to the <codeph>=</codeph> operator. The exception is when the expression involves + a <codeph>NULL</codeph> value on one side or both sides, where <codeph>!=</codeph> and + <codeph>=</codeph> return <codeph>NULL</codeph> but the <codeph>IS [NOT] DISTINCT FROM</codeph> + operators still return <codeph>true</codeph> or <codeph>false</codeph>. + </p> + +<codeblock> +select 1 is distinct from 0, 1 != 0; ++----------------------+--------+ +| 1 is distinct from 0 | 1 != 0 | ++----------------------+--------+ +| true | true | ++----------------------+--------+ + +select 1 is distinct from 1, 1 != 1; ++----------------------+--------+ +| 1 is distinct from 1 | 1 != 1 | ++----------------------+--------+ +| false | false | ++----------------------+--------+ + +select 1 is distinct from null, 1 != null; ++-------------------------+-----------+ +| 1 is distinct from null | 1 != null | ++-------------------------+-----------+ +| true | NULL | ++-------------------------+-----------+ + +select null is distinct from null, null != null; ++----------------------------+--------------+ +| null is distinct from null | null != null | ++----------------------------+--------------+ +| false | NULL | ++----------------------------+--------------+ + +select 1 is not distinct from 0, 1 = 0; ++--------------------------+-------+ +| 1 is not distinct from 0 | 1 = 0 | ++--------------------------+-------+ +| false | false | ++--------------------------+-------+ + +select 1 is not distinct from 1, 1 = 1; ++--------------------------+-------+ +| 1 is not distinct from 1 | 1 = 1 | ++--------------------------+-------+ +| true | true | ++--------------------------+-------+ + +select 1 is not distinct from null, 1 = null; ++-----------------------------+----------+ +| 1 is not distinct from null | 1 = null | ++-----------------------------+----------+ +| false | NULL | ++-----------------------------+----------+ + +select null is not distinct from null, null = null; ++--------------------------------+-------------+ +| null is not distinct from null | null = null | ++--------------------------------+-------------+ +| true | NULL | ++--------------------------------+-------------+ +</codeblock> + + <p> + The following example shows how <codeph>IS DISTINCT FROM</codeph> considers + <codeph>CHAR</codeph> values to be the same (not distinct from each other) + if they only differ in the number of trailing spaces. Therefore, sometimes + the result of an <codeph>IS [NOT] DISTINCT FROM</codeph> operator differs + depending on whether the values are <codeph>STRING</codeph>/<codeph>VARCHAR</codeph> + or <codeph>CHAR</codeph>. + </p> + +<codeblock> +select + 'x' is distinct from 'x ' as string_with_trailing_spaces, + cast('x' as char(5)) is distinct from cast('x ' as char(5)) as char_with_trailing_spaces; ++-----------------------------+---------------------------+ +| string_with_trailing_spaces | char_with_trailing_spaces | ++-----------------------------+---------------------------+ +| true | false | ++-----------------------------+---------------------------+ +</codeblock> </conbody> + </concept> <concept id="is_null"> @@ -850,12 +1168,12 @@ SELECT COUNT(DISTINCT(visitor_id)) FROM web_traffic WHERE month IN ('January','J <p> <indexterm audience="Cloudera">IS NULL operator</indexterm> + <indexterm audience="Cloudera">IS NOT NULL operator</indexterm> - The <codeph>IS NULL</codeph> operator, and its converse the <codeph>IS NOT NULL</codeph> operator, test - whether a specified value is <codeph><xref href="impala_literals.xml#null">NULL</xref></codeph>. Because - using <codeph>NULL</codeph> with any of the other comparison operators such as <codeph>=</codeph> or - <codeph>!=</codeph> also returns <codeph>NULL</codeph> rather than <codeph>TRUE</codeph> or - <codeph>FALSE</codeph>, you use a special-purpose comparison operator to check for this special condition. + The <codeph>IS NULL</codeph> operator, and its converse the <codeph>IS NOT NULL</codeph> operator, test whether a specified value is + <codeph><xref href="impala_literals.xml#null">NULL</xref></codeph>. Because using <codeph>NULL</codeph> with any of the other + comparison operators such as <codeph>=</codeph> or <codeph>!=</codeph> also returns <codeph>NULL</codeph> rather than + <codeph>TRUE</codeph> or <codeph>FALSE</codeph>, you use a special-purpose comparison operator to check for this special condition. </p> <p conref="../shared/impala_common.xml#common/syntax_blurb"/> @@ -867,23 +1185,29 @@ SELECT COUNT(DISTINCT(visitor_id)) FROM web_traffic WHERE month IN ('January','J <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> <p> - In many cases, <codeph>NULL</codeph> values indicate some incorrect or incomplete processing during data - ingestion or conversion. You might check whether any values in a column are <codeph>NULL</codeph>, and if - so take some followup action to fill them in. + In many cases, <codeph>NULL</codeph> values indicate some incorrect or incomplete processing during data ingestion or conversion. + You might check whether any values in a column are <codeph>NULL</codeph>, and if so take some followup action to fill them in. </p> <p> - With sparse data, often represented in <q>wide</q> tables, it is common for most values to be - <codeph>NULL</codeph> with only an occasional non-<codeph>NULL</codeph> value. In those cases, you can use - the <codeph>IS NOT NULL</codeph> operator to identify the rows containing any data at all for a particular - column, regardless of the actual value. + With sparse data, often represented in <q>wide</q> tables, it is common for most values to be <codeph>NULL</codeph> with only an + occasional non-<codeph>NULL</codeph> value. In those cases, you can use the <codeph>IS NOT NULL</codeph> operator to identify the + rows containing any data at all for a particular column, regardless of the actual value. </p> <p> - With a well-designed database schema, effective use of <codeph>NULL</codeph> values and <codeph>IS - NULL</codeph> and <codeph>IS NOT NULL</codeph> operators can save having to design custom logic around - special values such as 0, -1, <codeph>'N/A'</codeph>, empty string, and so on. <codeph>NULL</codeph> lets - you distinguish between a value that is known to be 0, false, or empty, and a truly unknown value. + With a well-designed database schema, effective use of <codeph>NULL</codeph> values and <codeph>IS NULL</codeph> and <codeph>IS NOT + NULL</codeph> operators can save having to design custom logic around special values such as 0, -1, <codeph>'N/A'</codeph>, empty + string, and so on. <codeph>NULL</codeph> lets you distinguish between a value that is known to be 0, false, or empty, and a truly + unknown value. + </p> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p rev="2.3.0"> + This operator is not applicable to complex type columns (<codeph>STRUCT</codeph>, + <codeph>ARRAY</codeph>, or <codeph>MAP</codeph>). Using a complex type column + with <codeph>IS NULL</codeph> or <codeph>IS NOT NULL</codeph> causes a query error. </p> <p conref="../shared/impala_common.xml#common/example_blurb"/> @@ -898,7 +1222,9 @@ select count(*) from census where household_income is null; -- Sometimes we expect fields to be null, and followup action -- is needed when they are not. select count(*) from web_traffic where weird_http_code is not null;</codeblock> + </conbody> + </concept> <concept id="like"> @@ -909,10 +1235,10 @@ select count(*) from web_traffic where weird_http_code is not null;</codeblock> <p> <indexterm audience="Cloudera">LIKE operator</indexterm> - A comparison operator for <codeph>STRING</codeph> data, with basic wildcard capability using - <codeph>_</codeph> to match a single character and <codeph>%</codeph> to match multiple characters. The - argument expression must match the entire string value. Typically, it is more efficient to put any - <codeph>%</codeph> wildcard match at the end of the string. + A comparison operator for <codeph>STRING</codeph> data, with basic wildcard capability using the underscore + (<codeph>_</codeph>) to match a single character and the percent sign (<codeph>%</codeph>) to match multiple + characters. The argument expression must match the entire string value. + Typically, it is more efficient to put any <codeph>%</codeph> wildcard match at the end of the string. </p> <p conref="../shared/impala_common.xml#common/syntax_blurb"/> @@ -938,11 +1264,14 @@ select distinct c_last_name from customer where length(c_last_name) = 4 and subs -- ...or in a more readable way by matching M followed by exactly 3 characters. select distinct c_last_name from customer where c_last_name like 'M___';</codeblock> - <p> - For a more general kind of search operator using regular expressions, see - <xref href="impala_operators.xml#regexp"/>. + <p rev="2.5.0"> + For case-insensitive comparisons, see <xref href="impala_operators.xml#ilike"/>. + For a more general kind of search operator using regular expressions, see <xref href="impala_operators.xml#regexp"/> + or its case-insensitive counterpart <xref href="impala_operators.xml#iregexp"/>. </p> + </conbody> + </concept> <concept id="logical_operators"> @@ -953,9 +1282,9 @@ select distinct c_last_name from customer where c_last_name like 'M___';</codebl <p> <indexterm audience="Cloudera">logical operators</indexterm> - Logical operators return a <codeph>BOOLEAN</codeph> value, based on a binary or unary logical operation - between arguments that are also Booleans. Typically, the argument expressions use - <xref href="impala_operators.xml#comparison_operators">comparison operators</xref>. + Logical operators return a <codeph>BOOLEAN</codeph> value, based on a binary or unary logical operation between arguments that are + also Booleans. Typically, the argument expressions use <xref href="impala_operators.xml#comparison_operators">comparison + operators</xref>. </p> <p conref="../shared/impala_common.xml#common/syntax_blurb"/> @@ -970,24 +1299,21 @@ select distinct c_last_name from customer where c_last_name like 'M___';</codebl <ul> <li> - <codeph>AND</codeph>: A binary operator that returns <codeph>true</codeph> if its left-hand and - right-hand arguments both evaluate to <codeph>true</codeph>, <codeph>NULL</codeph> if either argument is - <codeph>NULL</codeph>, and <codeph>false</codeph> otherwise. + <codeph>AND</codeph>: A binary operator that returns <codeph>true</codeph> if its left-hand and right-hand arguments both evaluate + to <codeph>true</codeph>, <codeph>NULL</codeph> if either argument is <codeph>NULL</codeph>, and <codeph>false</codeph> otherwise. </li> <li> - <codeph>OR</codeph>: A binary operator that returns <codeph>true</codeph> if either of its left-hand and - right-hand arguments evaluate to <codeph>true</codeph>, <codeph>NULL</codeph> if one argument is - <codeph>NULL</codeph> and the other is either <codeph>NULL</codeph> or <codeph>false</codeph>, and - <codeph>false</codeph> otherwise. + <codeph>OR</codeph>: A binary operator that returns <codeph>true</codeph> if either of its left-hand and right-hand arguments + evaluate to <codeph>true</codeph>, <codeph>NULL</codeph> if one argument is <codeph>NULL</codeph> and the other is either + <codeph>NULL</codeph> or <codeph>false</codeph>, and <codeph>false</codeph> otherwise. </li> <li> - <codeph>NOT</codeph>: A unary operator that flips the state of a Boolean expression from - <codeph>true</codeph> to <codeph>false</codeph>, or <codeph>false</codeph> to <codeph>true</codeph>. If - the argument expression is <codeph>NULL</codeph>, the result remains <codeph>NULL</codeph>. (When - <codeph>NOT</codeph> is used this way as a unary logical operator, it works differently than the - <codeph>IS NOT NULL</codeph> comparison operator, which returns <codeph>true</codeph> when applied to a + <codeph>NOT</codeph>: A unary operator that flips the state of a Boolean expression from <codeph>true</codeph> to + <codeph>false</codeph>, or <codeph>false</codeph> to <codeph>true</codeph>. If the argument expression is <codeph>NULL</codeph>, + the result remains <codeph>NULL</codeph>. (When <codeph>NOT</codeph> is used this way as a unary logical operator, it works + differently than the <codeph>IS NOT NULL</codeph> comparison operator, which returns <codeph>true</codeph> when applied to a <codeph>NULL</codeph>.) </li> </ul> @@ -997,9 +1323,9 @@ select distinct c_last_name from customer where c_last_name like 'M___';</codebl <p conref="../shared/impala_common.xml#common/complex_types_caveat_no_operator"/> <p rev="2.3.0"> - The following example shows how to do an arithmetic operation using a numeric field of a <codeph>STRUCT</codeph> type - that is an item within an <codeph>ARRAY</codeph> column. Once the scalar numeric value <codeph>R_NATIONKEY</codeph> - is extracted, it can be used in an arithmetic expression, such as multiplying by 10: + The following example shows how to do an arithmetic operation using a numeric field of a <codeph>STRUCT</codeph> type that is an + item within an <codeph>ARRAY</codeph> column. Once the scalar numeric value <codeph>R_NATIONKEY</codeph> is extracted, it can be + used in an arithmetic expression, such as multiplying by 10: </p> <codeblock rev="2.3.0"> @@ -1165,7 +1491,9 @@ where | false | +-------------+ </codeblock> + </conbody> + </concept> <concept id="regexp"> @@ -1176,40 +1504,34 @@ where <p> <indexterm audience="Cloudera">REGEXP operator</indexterm> - Tests whether a value matches a regular expression. Uses the POSIX regular expression syntax where - <codeph>^</codeph> and <codeph>$</codeph> match the beginning and end of the string, <codeph>.</codeph> - represents any single character, <codeph>*</codeph> represents a sequence of zero or more items, - <codeph>+</codeph> represents a sequence of one or more items, <codeph>?</codeph> produces a non-greedy - match, and so on. + Tests whether a value matches a regular expression. Uses the POSIX regular expression syntax where <codeph>^</codeph> and + <codeph>$</codeph> match the beginning and end of the string, <codeph>.</codeph> represents any single character, <codeph>*</codeph> + represents a sequence of zero or more items, <codeph>+</codeph> represents a sequence of one or more items, <codeph>?</codeph> + produces a non-greedy match, and so on. </p> <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + <codeblock><varname>string_expression</varname> REGEXP <varname>regular_expression</varname> </codeblock> <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> - <p> - The regular expression must match the entire value, not just occur somewhere inside it. Use - <codeph>.*</codeph> at the beginning and/or the end if you only need to match characters anywhere in the - middle. Thus, the <codeph>^</codeph> and <codeph>$</codeph> atoms are often redundant, although you might - already have them in your expression strings that you reuse from elsewhere. - </p> + <p conref="../shared/impala_common.xml#common/regular_expression_whole_string"/> <p> The <codeph>RLIKE</codeph> operator is a synonym for <codeph>REGEXP</codeph>. </p> <p> - The <codeph>|</codeph> symbol is the alternation operator, typically used within <codeph>()</codeph> to - match different sequences. The <codeph>()</codeph> groups do not allow backreferences. To retrieve the part - of a value matched within a <codeph>()</codeph> section, use the - <codeph><xref href="impala_string_functions.xml#string_functions/regexp_extract">regexp_extract()</xref></codeph> + The <codeph>|</codeph> symbol is the alternation operator, typically used within <codeph>()</codeph> to match different sequences. + The <codeph>()</codeph> groups do not allow backreferences. To retrieve the part of a value matched within a <codeph>()</codeph> + section, use the <codeph><xref href="impala_string_functions.xml#string_functions/regexp_extract">regexp_extract()</xref></codeph> built-in function. </p> <note rev="1.3.1"> - <p conref="../shared/impala_common.xml#common/regexp_matching"/> + <p rev="1.3.1" conref="../shared/impala_common.xml#common/regexp_matching"/> </note> <p conref="../shared/impala_common.xml#common/regexp_re2"/> @@ -1225,14 +1547,21 @@ where <p conref="../shared/impala_common.xml#common/example_blurb"/> <p> - The following examples demonstrate the identical syntax for the <codeph>REGEXP</codeph> and - <codeph>RLIKE</codeph> operators. + The following examples demonstrate the identical syntax for the <codeph>REGEXP</codeph> and <codeph>RLIKE</codeph> operators. </p> <!-- Same examples shown for both REGEXP and RLIKE operators. --> <codeblock conref="../shared/impala_common.xml#common/regexp_rlike_examples"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p rev="2.5.0"> + For regular expression matching with case-insensitive comparisons, see <xref href="impala_operators.xml#iregexp"/>. + </p> + </conbody> + </concept> <concept id="rlike"> @@ -1243,20 +1572,21 @@ where <p> <indexterm audience="Cloudera">RLIKE operator</indexterm> - Synonym for the <codeph>REGEXP</codeph> operator. See <xref href="impala_operators.xml#regexp"/> for - details. + Synonym for the <codeph>REGEXP</codeph> operator. See <xref href="impala_operators.xml#regexp"/> for details. </p> <p conref="../shared/impala_common.xml#common/example_blurb"/> <p> - The following examples demonstrate the identical syntax for the <codeph>REGEXP</codeph> and - <codeph>RLIKE</codeph> operators. + The following examples demonstrate the identical syntax for the <codeph>REGEXP</codeph> and <codeph>RLIKE</codeph> operators. </p> <!-- Same examples shown for both REGEXP and RLIKE operators. --> <codeblock conref="../shared/impala_common.xml#common/regexp_rlike_examples"/> + </conbody> + </concept> + </concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_order_by.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_order_by.xml b/docs/topics/impala_order_by.xml index f3042e5..0ef49f1 100644 --- a/docs/topics/impala_order_by.xml +++ b/docs/topics/impala_order_by.xml @@ -7,6 +7,8 @@ <data name="Category" value="Impala"/> <data name="Category" value="SQL"/> <data name="Category" value="Querying"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> </metadata> </prolog> @@ -33,7 +35,7 @@ In Impala 1.4.0 and higher, the <codeph>LIMIT</codeph> clause is now optional (rather than required) for queries that use the <codeph>ORDER BY</codeph> clause. Impala automatically uses a temporary disk work area to perform the sort if the sort operation would otherwise exceed the Impala memory limit for a particular - data node. + DataNode. </p> </note> @@ -138,7 +140,7 @@ SELECT id, item.game_name, item.score FROM games2, games2.play WHERE item.score > 1000000 ORDER BY id, item.score DESC; -CREATE TABLE games3 (id BIGINT, play ARRAY < MAP <STRING, BIGINT> >) STORED AS PARQUET; +CREATE TABLE games3 (id BIGINT, play ARRAY < MAP <STRING, BIGINT> >) STORED AS PARQUET; ...use LOAD DATA to load externally created Parquet files into the table... SELECT id, info.key AS k, info.value AS v from games3, games3.play AS plays, games3.play.item AS info WHERE info.KEY = 'score' AND info.VALUE > 1000000 @@ -193,24 +195,24 @@ SELECT page_title AS "Page 3 of search results", page_url FROM search_content <p> Impala sorts the intermediate results of an <codeph>ORDER BY</codeph> clause in memory whenever practical. In - a cluster of N data nodes, each node sorts roughly 1/Nth of the result set, the exact proportion varying + a cluster of N DataNodes, each node sorts roughly 1/Nth of the result set, the exact proportion varying depending on how the data matching the query is distributed in HDFS. </p> <p> - If the size of the sorted intermediate result set on any data node would cause the query to exceed the Impala + If the size of the sorted intermediate result set on any DataNode would cause the query to exceed the Impala memory limit, Impala sorts as much as practical in memory, then writes partially sorted data to disk. (This technique is known in industry terminology as <q>external sorting</q> and <q>spilling to disk</q>.) As each 8 MB batch of data is written to disk, Impala frees the corresponding memory to sort a new 8 MB batch of data. When all the data has been processed, a final merge sort operation is performed to correctly order the in-memory and on-disk results as the result set is transmitted back to the coordinator node. When external sorting becomes necessary, Impala requires approximately 60 MB of RAM at a minimum for the buffers needed to - read, write, and sort the intermediate results. If more RAM is available on the data node, Impala will use + read, write, and sort the intermediate results. If more RAM is available on the DataNode, Impala will use the additional RAM to minimize the amount of disk I/O for sorting. </p> <p> - This external sort technique is used as appropriate on each data node (possibly including the coordinator + This external sort technique is used as appropriate on each DataNode (possibly including the coordinator node) to sort the portion of the result set that is processed on that node. When the sorted intermediate results are sent back to the coordinator node to produce the final result set, the coordinator node uses a merge sort technique to produce a final sorted result set without using any extra resources on the @@ -225,7 +227,7 @@ SELECT page_title AS "Page 3 of search results", page_url FROM search_content <!-- Here is actually the more logical place to collect all those examples, move them from SELECT and cross-reference to here. --> -<!-- <p rev="obwl" conref="/Content/impala_common_xi44078.xml#common/restrictions_blurb"/> --> +<!-- <p rev="obwl" conref="../shared/impala_common.xml#common/restrictions_blurb"/> --> <p rev="obwl" conref="../shared/impala_common.xml#common/insert_sort_blurb"/> @@ -240,7 +242,7 @@ SELECT page_title AS "Page 3 of search results", page_url FROM search_content <li> <p> Now the use of scratch disk space raises the possibility of an <q>out of disk space</q> error on a - particular data node, as opposed to the previous possibility of an <q>out of memory</q> error. Make sure + particular DataNode, as opposed to the previous possibility of an <q>out of memory</q> error. Make sure to keep at least 1 GB free on the filesystem used for temporary sorting work. </p> </li>
