http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_smallint.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_smallint.html b/docs/build/html/topics/impala_smallint.html new file mode 100644 index 0000000..cd48e90 --- /dev/null +++ b/docs/build/html/topics/impala_smallint.html @@ -0,0 +1,125 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="smallint"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SMALLINT Data Type</title></head><body id="smallint"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">SMALLINT Data Type</h1> + + + + <div class="body conbody"> + + <p class="p"> + A 2-byte integer data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements. + </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + + <p class="p"> + In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement: + </p> + +<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> SMALLINT</code></pre> + + <p class="p"> + <strong class="ph b">Range:</strong> -32768 .. 32767. There is no <code class="ph codeph">UNSIGNED</code> subtype. + </p> + + <p class="p"> + <strong class="ph b">Conversions:</strong> Impala automatically converts to a larger integer type (<code class="ph codeph">INT</code> or + <code class="ph codeph">BIGINT</code>) or a floating-point type (<code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>) + automatically. Use <code class="ph codeph">CAST()</code> to convert to <code class="ph codeph">TINYINT</code>, <code class="ph codeph">STRING</code>, + or <code class="ph codeph">TIMESTAMP</code>. + <span class="ph">Casting an integer or floating-point value <code class="ph codeph">N</code> to + <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch + date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone. + If the setting <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> is in effect, + the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.</span> + </p> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + For a convenient and automated way to check the bounds of the <code class="ph codeph">SMALLINT</code> type, call the + functions <code class="ph codeph">MIN_SMALLINT()</code> and <code class="ph codeph">MAX_SMALLINT()</code>. + </p> + + <p class="p"> + If an integer value is too large to be represented as a <code class="ph codeph">SMALLINT</code>, use an + <code class="ph codeph">INT</code> instead. + </p> + + <p class="p"> + <strong class="ph b">NULL considerations:</strong> Casting any non-numeric value to this type produces a <code class="ph codeph">NULL</code> + value. + </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + +<pre class="pre codeblock"><code>CREATE TABLE t1 (x SMALLINT); +SELECT CAST(1000 AS SMALLINT); +</code></pre> + + <p class="p"> + <strong class="ph b">Parquet considerations:</strong> + </p> + + + + <p class="p"> + Physically, Parquet files represent <code class="ph codeph">TINYINT</code> and <code class="ph codeph">SMALLINT</code> values as 32-bit + integers. Although Impala rejects attempts to insert out-of-range values into such columns, if you create a + new table with the <code class="ph codeph">CREATE TABLE ... LIKE PARQUET</code> syntax, any <code class="ph codeph">TINYINT</code> or + <code class="ph codeph">SMALLINT</code> columns in the original table turn into <code class="ph codeph">INT</code> columns in the new + table. + </p> + + <p class="p"> + <strong class="ph b">Partitioning:</strong> Prefer to use this type for a partition key column. Impala can process the numeric + type more efficiently than a <code class="ph codeph">STRING</code> representation of the value. + </p> + + <p class="p"> + <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables. + </p> + + <p class="p"> + <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables + using Parquet or other binary formats. + </p> + + + + <p class="p"> + <strong class="ph b">Internal details:</strong> Represented in memory as a 2-byte value. + </p> + + <p class="p"> + <strong class="ph b">Added in:</strong> Available in all versions of Impala. + </p> + + <p class="p"> + <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size + fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code> + statement. + </p> + + + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>, + <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, <a class="xref" href="impala_int.html#int">INT Data Type</a>, + <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 1.4 or higher only)</a>, + <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a> + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html> \ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_ssl.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_ssl.html b/docs/build/html/topics/impala_ssl.html new file mode 100644 index 0000000..a9b4d25 --- /dev/null +++ b/docs/build/html/topics/impala_ssl.html @@ -0,0 +1,119 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ssl"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Configuring TLS/SSL for Impala</title></head><body id="ssl"><main role="main"><article role="article" aria-labelledby="ssl__tls"> + + <h1 class="title topictitle1" id="ssl__tls">Configuring TLS/SSL for Impala</h1> + + + + <div class="body conbody"> + + <p class="p"> + + Impala supports TLS/SSL network encryption, between Impala and client + programs, and between the Impala-related daemons running on different nodes + in the cluster. This feature is important when you also use other features such as Kerberos + authentication or Sentry authorization, where credentials are being + transmitted back and forth. + </p> + + </div> + + <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="ssl__concept_q1p_j2d_rp"> + + <h2 class="title topictitle2" id="ariaid-title2">Using the Command Line</h2> + + <div class="body conbody"> + <p class="p"> + To enable SSL for when client applications connect to Impala, add both of the following flags to the <span class="keyword cmdname">impalad</span> startup options: + </p> + + <ul class="ul" id="concept_q1p_j2d_rp__ul_i2p_m2d_rp"> + <li class="li"> + <code class="ph codeph">--ssl_server_certificate</code>: the full path to the server certificate, on the local filesystem. + </li> + + <li class="li"> + <code class="ph codeph">--ssl_private_key</code>: the full path to the server private key, on the local filesystem. + </li> + </ul> + + <p class="p"> + In <span class="keyword">Impala 2.3</span> and higher, Impala can also use SSL for its own internal communication between the + <span class="keyword cmdname">impalad</span>, <code class="ph codeph">statestored</code>, and <code class="ph codeph">catalogd</code> daemons. + To enable this additional SSL encryption, set the <code class="ph codeph">--ssl_server_certificate</code> + and <code class="ph codeph">--ssl_private_key</code> flags in the startup options for + <span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">catalogd</span>, and <span class="keyword cmdname">statestored</span>, + and also add the <code class="ph codeph">--ssl_client_ca_certificate</code> flag for all three of those daemons. + </p> + + <div class="note warning note_warning"><span class="note__title warningtitle">Warning:</span> + Prior to <span class="keyword">Impala 2.3.2</span>, you could enable Kerberos authentication between Impala internal components, + or SSL encryption between Impala internal components, but not both at the same time. + This restriction has now been lifted. + See <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2598" target="_blank">IMPALA-2598</a> + to see the maintenance releases for different levels of Impala where the fix has been published. + </div> + + <p class="p"> + If either of these flags are set, both must be set. In that case, Impala starts listening for Beeswax and HiveServer2 requests on + SSL-secured ports only. (The port numbers stay the same; see <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a> for details.) + </p> + + <p class="p"> + Since Impala uses passphrase-less certificates in PEM format, you can reuse a host's existing Java keystore + by using the <code class="ph codeph">openssl</code> toolkit to convert it to the PEM format. + </p> + + <section class="section" id="concept_q1p_j2d_rp__secref"><h3 class="title sectiontitle">Configuring TLS/SSL Communication for the Impala Shell</h3> + + + + <p class="p"> + With SSL enabled for Impala, use the following options when starting the + <span class="keyword cmdname">impala-shell</span> interpreter: + </p> + + <ul class="ul" id="concept_q1p_j2d_rp__ul_kgp_m2d_rp"> + <li class="li"> + <code class="ph codeph">--ssl</code>: enables TLS/SSL for <span class="keyword cmdname">impala-shell</span>. + </li> + + <li class="li"> + <code class="ph codeph">--ca_cert</code>: the local pathname pointing to the third-party CA certificate, or to a copy of the server + certificate for self-signed server certificates. + </li> + </ul> + + <p class="p"> + If <code class="ph codeph">--ca_cert</code> is not set, <span class="keyword cmdname">impala-shell</span> enables TLS/SSL, but does not validate the server + certificate. This is useful for connecting to a known-good Impala that is only running over TLS/SSL, when a copy of the + certificate is not available (such as when debugging customer installations). + </p> + + </section> + + </div> + + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="ssl__ssl_jdbc_odbc"> + <h2 class="title topictitle2" id="ariaid-title3">Using TLS/SSL with Business Intelligence Tools</h2> + <div class="body conbody"> + <p class="p"> + You can use Kerberos authentication, TLS/SSL encryption, or both to secure + connections from JDBC and ODBC applications to Impala. + See <a class="xref" href="impala_jdbc.html#impala_jdbc">Configuring Impala to Work with JDBC</a> and <a class="xref" href="impala_odbc.html#impala_odbc">Configuring Impala to Work with ODBC</a> + for details. + </p> + + <p class="p"> + Prior to <span class="keyword">Impala 2.5</span>, the Hive JDBC driver did not support connections that use both Kerberos authentication + and SSL encryption. If your cluster is running an older release that has this restriction, + use an alternative JDBC driver that supports + both of these security features. + </p> + </div> + </article> + +</article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_stddev.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_stddev.html b/docs/build/html/topics/impala_stddev.html new file mode 100644 index 0000000..4a14e14 --- /dev/null +++ b/docs/build/html/topics/impala_stddev.html @@ -0,0 +1,121 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="stddev"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>STDDEV, STDDEV_SAMP, STDDEV_POP Functions</title></head><body id="stddev"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">STDDEV, STDDEV_SAMP, STDDEV_POP Functions</h1> + + + + <div class="body conbody"> + + <p class="p"> + + + + An aggregate function that + <a class="xref" href="http://en.wikipedia.org/wiki/Standard_deviation" target="_blank">standard + deviation</a> of a set of numbers. + </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + +<pre class="pre codeblock"><code>{ STDDEV | STDDEV_SAMP | STDDEV_POP } ([DISTINCT | ALL] <var class="keyword varname">expression</var>)</code></pre> + + <p class="p"> + This function works with any numeric data type. + </p> + + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">DOUBLE</code> in Impala 2.0 and higher; <code class="ph codeph">STRING</code> in earlier + releases + </p> + + <p class="p"> + This function is typically used in mathematical formulas related to probability distributions. + </p> + + <p class="p"> + The <code class="ph codeph">STDDEV_POP()</code> and <code class="ph codeph">STDDEV_SAMP()</code> functions compute the population + standard deviation and sample standard deviation, respectively, of the input values. + (<code class="ph codeph">STDDEV()</code> is an alias for <code class="ph codeph">STDDEV_SAMP()</code>.) Both functions evaluate all input + rows matched by the query. The difference is that <code class="ph codeph">STDDEV_SAMP()</code> is scaled by + <code class="ph codeph">1/(N-1)</code> while <code class="ph codeph">STDDEV_POP()</code> is scaled by <code class="ph codeph">1/N</code>. + </p> + + <p class="p"> + If no input rows match the query, the result of any of these functions is <code class="ph codeph">NULL</code>. If a single + input row matches the query, the result of any of these functions is <code class="ph codeph">"0.0"</code>. + </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <p class="p"> + This example demonstrates how <code class="ph codeph">STDDEV()</code> and <code class="ph codeph">STDDEV_SAMP()</code> return the same + result, while <code class="ph codeph">STDDEV_POP()</code> uses a slightly different calculation to reflect that the input + data is considered part of a larger <span class="q">"population"</span>. + </p> + +<pre class="pre codeblock"><code>[localhost:21000] > select stddev(score) from test_scores; ++---------------+ +| stddev(score) | ++---------------+ +| 28.5 | ++---------------+ +[localhost:21000] > select stddev_samp(score) from test_scores; ++--------------------+ +| stddev_samp(score) | ++--------------------+ +| 28.5 | ++--------------------+ +[localhost:21000] > select stddev_pop(score) from test_scores; ++-------------------+ +| stddev_pop(score) | ++-------------------+ +| 28.4858 | ++-------------------+ +</code></pre> + + <p class="p"> + This example demonstrates that, because the return value of these aggregate functions is a + <code class="ph codeph">STRING</code>, you must currently convert the result with <code class="ph codeph">CAST</code>. + </p> + +<pre class="pre codeblock"><code>[localhost:21000] > create table score_stats as select cast(stddev(score) as decimal(7,4)) `standard_deviation`, cast(variance(score) as decimal(7,4)) `variance` from test_scores; ++-------------------+ +| summary | ++-------------------+ +| Inserted 1 row(s) | ++-------------------+ +[localhost:21000] > desc score_stats; ++--------------------+--------------+---------+ +| name | type | comment | ++--------------------+--------------+---------+ +| standard_deviation | decimal(7,4) | | +| variance | decimal(7,4) | | ++--------------------+--------------+---------+ +</code></pre> + + <p class="p"> + <strong class="ph b">Restrictions:</strong> + </p> + + <p class="p"> + This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function. + </p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + The <code class="ph codeph">STDDEV()</code>, <code class="ph codeph">STDDEV_POP()</code>, and <code class="ph codeph">STDDEV_SAMP()</code> functions + compute the standard deviation (square root of the variance) based on the results of + <code class="ph codeph">VARIANCE()</code>, <code class="ph codeph">VARIANCE_POP()</code>, and <code class="ph codeph">VARIANCE_SAMP()</code> + respectively. See <a class="xref" href="impala_variance.html#variance">VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP Functions</a> for details about the variance property. + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_string.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_string.html b/docs/build/html/topics/impala_string.html new file mode 100644 index 0000000..60714c7 --- /dev/null +++ b/docs/build/html/topics/impala_string.html @@ -0,0 +1,197 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="string"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>STRING Data Type</title></head><body id="string"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">STRING Data Type</h1> + + + + <div class="body conbody"> + + <p class="p"> + A data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements. + </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + + <p class="p"> + In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement: + </p> + +<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> STRING</code></pre> + + <p class="p"> + <strong class="ph b">Length:</strong> Maximum of 32,767 bytes. Do not use any length constraint when declaring + <code class="ph codeph">STRING</code> columns, as you might be familiar with from <code class="ph codeph">VARCHAR</code>, + <code class="ph codeph">CHAR</code>, or similar column types from relational database systems. <span class="ph">If you do + need to manipulate string values with precise or maximum lengths, in Impala 2.0 and higher you can declare + columns as <code class="ph codeph">VARCHAR(<var class="keyword varname">max_length</var>)</code> or + <code class="ph codeph">CHAR(<var class="keyword varname">length</var>)</code>, but for best performance use <code class="ph codeph">STRING</code> + where practical.</span> + </p> + + <p class="p"> + <strong class="ph b">Character sets:</strong> For full support in all Impala subsystems, restrict string values to the ASCII + character set. Although some UTF-8 character data can be stored in Impala and retrieved through queries, UTF-8 strings + containing non-ASCII characters are not guaranteed to work properly in combination with many SQL aspects, + including but not limited to: + </p> + <ul class="ul"> + <li class="li"> + String manipulation functions. + </li> + <li class="li"> + Comparison operators. + </li> + <li class="li"> + The <code class="ph codeph">ORDER BY</code> clause. + </li> + <li class="li"> + Values in partition key columns. + </li> + </ul> + + <p class="p"> + For any national language aspects such as + collation order or interpreting extended ASCII variants such as ISO-8859-1 or ISO-8859-2 encodings, Impala + does not include such metadata with the table definition. If you need to sort, manipulate, or display data + depending on those national language characteristics of string data, use logic on the application side. + </p> + + <p class="p"> + <strong class="ph b">Conversions:</strong> + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + Impala does not automatically convert <code class="ph codeph">STRING</code> to any numeric type. Impala does + automatically convert <code class="ph codeph">STRING</code> to <code class="ph codeph">TIMESTAMP</code> if the value matches one of + the accepted <code class="ph codeph">TIMESTAMP</code> formats; see <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for + details. + </p> + </li> + + <li class="li"> + <p class="p"> + You can use <code class="ph codeph">CAST()</code> to convert <code class="ph codeph">STRING</code> values to + <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>, <code class="ph codeph">BIGINT</code>, + <code class="ph codeph">FLOAT</code>, <code class="ph codeph">DOUBLE</code>, or <code class="ph codeph">TIMESTAMP</code>. + </p> + </li> + + <li class="li"> + <p class="p"> + You cannot directly cast a <code class="ph codeph">STRING</code> value to <code class="ph codeph">BOOLEAN</code>. You can use a + <code class="ph codeph">CASE</code> expression to evaluate string values such as <code class="ph codeph">'T'</code>, + <code class="ph codeph">'true'</code>, and so on and return Boolean <code class="ph codeph">true</code> and <code class="ph codeph">false</code> + values as appropriate. + </p> + </li> + + <li class="li"> + <p class="p"> + You can cast a <code class="ph codeph">BOOLEAN</code> value to <code class="ph codeph">STRING</code>, returning <code class="ph codeph">'1'</code> + for <code class="ph codeph">true</code> values and <code class="ph codeph">'0'</code> for <code class="ph codeph">false</code> values. + </p> + </li> + </ul> + + <p class="p"> + <strong class="ph b">Partitioning:</strong> + </p> + + <p class="p"> + Although it might be convenient to use <code class="ph codeph">STRING</code> columns for partition keys, even when those + columns contain numbers, for performance and scalability it is much better to use numeric columns as + partition keys whenever practical. Although the underlying HDFS directory name might be the same in either + case, the in-memory storage for the partition key columns is more compact, and computations are faster, if + partition key columns such as <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, <code class="ph codeph">DAY</code> and so on + are declared as <code class="ph codeph">INT</code>, <code class="ph codeph">SMALLINT</code>, and so on. + </p> + + <p class="p"> + <strong class="ph b">Zero-length strings:</strong> For purposes of clauses such as <code class="ph codeph">DISTINCT</code> and <code class="ph codeph">GROUP + BY</code>, Impala considers zero-length strings (<code class="ph codeph">""</code>), <code class="ph codeph">NULL</code>, and space + to all be different values. + </p> + + + + + + <p class="p"> + <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables + using Parquet or other binary formats. + </p> + + <p class="p"><strong class="ph b">Avro considerations:</strong></p> + <p class="p"> + The Avro specification allows string values up to 2**64 bytes in length. + Impala queries for Avro tables use 32-bit integers to hold string lengths. + In <span class="keyword">Impala 2.5</span> and higher, Impala truncates <code class="ph codeph">CHAR</code> + and <code class="ph codeph">VARCHAR</code> values in Avro tables to (2**31)-1 bytes. + If a query encounters a <code class="ph codeph">STRING</code> value longer than (2**31)-1 + bytes in an Avro table, the query fails. In earlier releases, + encountering such long values in an Avro table could cause a crash. + </p> + + + + + + + + <p class="p"> + <strong class="ph b">Column statistics considerations:</strong> Because the values of this type have variable size, none of the + column statistics fields are filled in until you run the <code class="ph codeph">COMPUTE STATS</code> statement. + </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <p class="p"> + The following examples demonstrate double-quoted and single-quoted string literals, and required escaping for + quotation marks within string literals: + </p> + +<pre class="pre codeblock"><code>SELECT 'I am a single-quoted string'; +SELECT "I am a double-quoted string"; +SELECT 'I\'m a single-quoted string with an apostrophe'; +SELECT "I\'m a double-quoted string with an apostrophe"; +SELECT 'I am a "short" single-quoted string containing quotes'; +SELECT "I am a \"short\" double-quoted string containing quotes"; +</code></pre> + + <p class="p"> + The following examples demonstrate calls to string manipulation functions to concatenate strings, convert + numbers to strings, or pull out substrings: + </p> + +<pre class="pre codeblock"><code>SELECT CONCAT("Once upon a time, there were ", CAST(3 AS STRING), ' little pigs.'); +SELECT SUBSTR("hello world",7,5); +</code></pre> + + <p class="p"> + The following examples show how to perform operations on <code class="ph codeph">STRING</code> columns within a table: + </p> + +<pre class="pre codeblock"><code>CREATE TABLE t1 (s1 STRING, s2 STRING); +INSERT INTO t1 VALUES ("hello", 'world'), (CAST(7 AS STRING), "wonders"); +SELECT s1, s2, length(s1) FROM t1 WHERE s2 LIKE 'w%'; +</code></pre> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_literals.html#string_literals">String Literals</a>, <a class="xref" href="impala_char.html#char">CHAR Data Type (Impala 2.0 or higher only)</a>, + <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a>, + <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_string_functions.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_string_functions.html b/docs/build/html/topics/impala_string_functions.html new file mode 100644 index 0000000..aab1f35 --- /dev/null +++ b/docs/build/html/topics/impala_string_functions.html @@ -0,0 +1,1036 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="string_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala String Functions</title></head><body id="string_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Impala String Functions</h1> + + + + <div class="body conbody"> + + <div class="p"> + String functions are classified as those primarily accepting or returning <code class="ph codeph">STRING</code>, + <code class="ph codeph">VARCHAR</code>, or <code class="ph codeph">CHAR</code> data types, for example to measure the length of a string + or concatenate two strings together. + <ul class="ul"> + <li class="li"> + All the functions that accept <code class="ph codeph">STRING</code> arguments also accept the <code class="ph codeph">VARCHAR</code> + and <code class="ph codeph">CHAR</code> types introduced in Impala 2.0. + </li> + + <li class="li"> + Whenever <code class="ph codeph">VARCHAR</code> or <code class="ph codeph">CHAR</code> values are passed to a function that returns a + string value, the return type is normalized to <code class="ph codeph">STRING</code>. For example, a call to + <code class="ph codeph">concat()</code> with a mix of <code class="ph codeph">STRING</code>, <code class="ph codeph">VARCHAR</code>, and + <code class="ph codeph">CHAR</code> arguments produces a <code class="ph codeph">STRING</code> result. + </li> + </ul> + </div> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + The string functions operate mainly on these data types: <a class="xref" href="impala_string.html#string">STRING Data Type</a>, + <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>, and <a class="xref" href="impala_char.html#char">CHAR Data Type (Impala 2.0 or higher only)</a>. + </p> + + <p class="p"> + <strong class="ph b">Function reference:</strong> + </p> + + <p class="p"> + Impala supports the following string functions: + </p> + + <dl class="dl"> + + + <dt class="dt dlterm" id="string_functions__ascii"> + <code class="ph codeph">ascii(string str)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the numeric ASCII code of the first character of the argument. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__btrim"> + <code class="ph codeph">btrim(string a)</code>, + <code class="ph codeph">btrim(string a, string chars_to_trim)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Removes all instances of one or more characters + from the start and end of a <code class="ph codeph">STRING</code> value. + By default, removes only spaces. + If a non-<code class="ph codeph">NULL</code> optional second argument is specified, the function removes all + occurrences of characters in that second argument from the beginning and + end of the string. + <p class="p"><strong class="ph b">Return type:</strong> <code class="ph codeph">string</code></p> + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span> + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + <p class="p"> + The following examples show the default <code class="ph codeph">btrim()</code> behavior, + and what changes when you specify the optional second argument. + All the examples bracket the output value with <code class="ph codeph">[ ]</code> + so that you can see any leading or trailing spaces in the <code class="ph codeph">btrim()</code> result. + By default, the function removes and number of both leading and trailing spaces. + When the second argument is specified, any number of occurrences of any + character in the second argument are removed from the start and end of the + input string; in this case, spaces are not removed (unless they are part of the second + argument) and any instances of the characters are not removed if they do not come + right at the beginning or end of the string. + </p> +<pre class="pre codeblock"><code>-- Remove multiple spaces before and one space after. +select concat('[',btrim(' hello '),']'); ++---------------------------------------+ +| concat('[', btrim(' hello '), ']') | ++---------------------------------------+ +| [hello] | ++---------------------------------------+ + +-- Remove any instances of x or y or z at beginning or end. Leave spaces alone. +select concat('[',btrim('xy hello zyzzxx','xyz'),']'); ++------------------------------------------------------+ +| concat('[', btrim('xy hello zyzzxx', 'xyz'), ']') | ++------------------------------------------------------+ +| [ hello ] | ++------------------------------------------------------+ + +-- Remove any instances of x or y or z at beginning or end. +-- Leave x, y, z alone in the middle of the string. +select concat('[',btrim('xyhelxyzlozyzzxx','xyz'),']'); ++----------------------------------------------------+ +| concat('[', btrim('xyhelxyzlozyzzxx', 'xyz'), ']') | ++----------------------------------------------------+ +| [helxyzlo] | ++----------------------------------------------------+ +</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__char_length"> + <code class="ph codeph">char_length(string a), <span class="ph" id="string_functions__character_length">character_length(string a)</span></code> + </dt> + + <dd class="dd"> + + + <strong class="ph b">Purpose:</strong> Returns the length in characters of the argument string. Aliases for the + <code class="ph codeph">length()</code> function. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__chr"> + <code class="ph codeph">chr(int character_code)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns a character specified by a decimal code point value. + The interpretation and display of the resulting character depends on your system locale. + Because consistent processing of Impala string values is only guaranteed + for values within the ASCII range, only use this function for values + corresponding to ASCII characters. + In particular, parameter values greater than 255 return an empty string. + <p class="p"><strong class="ph b">Return type:</strong> <code class="ph codeph">string</code></p> + <p class="p"> + <strong class="ph b">Usage notes:</strong> Can be used as the inverse of the <code class="ph codeph">ascii()</code> function, which + converts a character to its numeric ASCII code. + </p> + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span> + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> +<pre class="pre codeblock"><code>SELECT chr(65); ++---------+ +| chr(65) | ++---------+ +| A | ++---------+ + +SELECT chr(97); ++---------+ +| chr(97) | ++---------+ +| a | ++---------+ +</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__concat"> + <code class="ph codeph">concat(string a, string b...)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns a single string representing all the argument values joined together. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + <p class="p"> + <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for + concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code> + joins together values from different rows. + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__concat_ws"> + <code class="ph codeph">concat_ws(string sep, string a, string b...)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns a single string representing the second and following argument values joined + together, delimited by a specified separator. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + <p class="p"> + <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for + concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code> + joins together values from different rows. + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__find_in_set"> + <code class="ph codeph">find_in_set(string str, string strList)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the position (starting from 1) of the first occurrence of a specified string + within a comma-separated string. Returns <code class="ph codeph">NULL</code> if either argument is + <code class="ph codeph">NULL</code>, 0 if the search string is not found, or 0 if the search string contains a comma. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__group_concat"> + <code class="ph codeph">group_concat(string s [, string sep])</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns a single string representing the argument value concatenated together for each + row of the result set. If the optional separator string is specified, the separator is added between each + pair of concatenated values. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + <p class="p"> + <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for + concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code> + joins together values from different rows. + </p> + <p class="p"> + By default, returns a single string covering the whole result set. To include other columns or values + in the result set, or to produce multiple concatenated strings for subsets of rows, include a + <code class="ph codeph">GROUP BY</code> clause in the query. + </p> + <p class="p"> + Strictly speaking, <code class="ph codeph">group_concat()</code> is an aggregate function, not a scalar + function like the others in this list. + For additional details and examples, see <a class="xref" href="impala_group_concat.html#group_concat">GROUP_CONCAT Function</a>. + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__initcap"> + <code class="ph codeph">initcap(string str)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the input string with the first letter capitalized. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__instr"> + <code class="ph codeph">instr(string str, string substr)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the position (starting from 1) of the first occurrence of a substring within a + longer string. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__length"> + <code class="ph codeph">length(string a)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the length in characters of the argument string. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__locate"> + <code class="ph codeph">locate(string substr, string str[, int pos])</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the position (starting from 1) of the first occurrence of a substring within a + longer string, optionally after a particular position. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__lower"> + <code class="ph codeph">lower(string a), <span class="ph" id="string_functions__lcase">lcase(string a)</span> </code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the argument string converted to all-lowercase. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + <p class="p"> + In <span class="keyword">Impala 2.5</span> and higher, you can simplify queries that + use many <code class="ph codeph">UPPER()</code> and <code class="ph codeph">LOWER()</code> calls + to do case-insensitive comparisons, by using the <code class="ph codeph">ILIKE</code> + or <code class="ph codeph">IREGEXP</code> operators instead. See + <a class="xref" href="../shared/../topics/impala_operators.html#ilike">ILIKE Operator</a> and + <a class="xref" href="../shared/../topics/impala_operators.html#iregexp">IREGEXP Operator</a> for details. + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__lpad"> + <code class="ph codeph">lpad(string str, int len, string pad)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns a string of a specified length, based on the first argument string. If the + specified string is too short, it is padded on the left with a repeating sequence of the characters from + the pad string. If the specified string is too long, it is truncated on the right. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__ltrim"> + <code class="ph codeph">ltrim(string a)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the argument string with any leading spaces removed from the left side. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__parse_url"> + <code class="ph codeph">parse_url(string urlString, string partToExtract [, string keyToExtract])</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the portion of a URL corresponding to a specified part. The part argument can be + <code class="ph codeph">'PROTOCOL'</code>, <code class="ph codeph">'HOST'</code>, <code class="ph codeph">'PATH'</code>, <code class="ph codeph">'REF'</code>, + <code class="ph codeph">'AUTHORITY'</code>, <code class="ph codeph">'FILE'</code>, <code class="ph codeph">'USERINFO'</code>, or + <code class="ph codeph">'QUERY'</code>. Uppercase is required for these literal values. When requesting the + <code class="ph codeph">QUERY</code> portion of the URL, you can optionally specify a key to retrieve just the + associated value from the key-value pairs in the query string. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + <p class="p"> + <strong class="ph b">Usage notes:</strong> This function is important for the traditional Hadoop use case of interpreting web + logs. For example, if the web traffic data features raw URLs not divided into separate table columns, + you can count visitors to a particular page by extracting the <code class="ph codeph">'PATH'</code> or + <code class="ph codeph">'FILE'</code> field, or analyze search terms by extracting the corresponding key from the + <code class="ph codeph">'QUERY'</code> field. + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__regexp_extract"> + <code class="ph codeph">regexp_extract(string subject, string pattern, int index)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the specified () group from a string based on a regular expression pattern. Group + 0 refers to the entire extracted string, while group 1, 2, and so on refers to the first, second, and so + on <code class="ph codeph">(...)</code> portion. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + <p class="p"> + In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular + Expression syntax used by the Google RE2 library. For details, see + <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It + has most idioms familiar from regular expressions in Perl, Python, and so on, including + <code class="ph codeph">.*?</code> for non-greedy matches. + </p> + <p class="p"> + In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the + way regular expressions are interpreted by this function. Test any queries that use regular expressions and + adjust the expression patterns if necessary. See + <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details. + </p> + <p class="p"> + Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping, + use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions + that you submit through <span class="keyword cmdname">impala-shell</span> . You might prefer to use the equivalent character + class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code> which you would have to + escape as <code class="ph codeph">\\d</code>. + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + <p class="p"> + This example shows how group 0 matches the full pattern string, including the portion outside any + <code class="ph codeph">()</code> group: + </p> +<pre class="pre codeblock"><code>[localhost:21000] > select regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',0); ++------------------------------------------------------+ +| regexp_extract('abcdef123ghi456jkl', '.*?(\\d+)', 0) | ++------------------------------------------------------+ +| abcdef123ghi456 | ++------------------------------------------------------+ +Returned 1 row(s) in 0.11s</code></pre> + <p class="p"> + This example shows how group 1 matches just the contents inside the first <code class="ph codeph">()</code> group in + the pattern string: + </p> +<pre class="pre codeblock"><code>[localhost:21000] > select regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',1); ++------------------------------------------------------+ +| regexp_extract('abcdef123ghi456jkl', '.*?(\\d+)', 1) | ++------------------------------------------------------+ +| 456 | ++------------------------------------------------------+ +Returned 1 row(s) in 0.11s</code></pre> + <p class="p"> + Unlike in earlier Impala releases, the regular expression library used in Impala 2.0 and later supports + the <code class="ph codeph">.*?</code> idiom for non-greedy matches. This example shows how a pattern string starting + with <code class="ph codeph">.*?</code> matches the shortest possible portion of the source string, returning the + rightmost set of lowercase letters. A pattern string both starting and ending with <code class="ph codeph">.*?</code> + finds two potential matches of equal length, and returns the first one found (the leftmost set of + lowercase letters). + </p> +<pre class="pre codeblock"><code>[localhost:21000] > select regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+)',1); ++--------------------------------------------------------+ +| regexp_extract('abcdbcdefghi', '.*?([[:lower:]]+)', 1) | ++--------------------------------------------------------+ +| def | ++--------------------------------------------------------+ +[localhost:21000] > select regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+).*?',1); ++-----------------------------------------------------------+ +| regexp_extract('abcdbcdefghi', '.*?([[:lower:]]+).*?', 1) | ++-----------------------------------------------------------+ +| bcd | ++-----------------------------------------------------------+ +</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__regexp_like"> + <code class="ph codeph">regexp_like(string source, string pattern[, string options])</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns <code class="ph codeph">true</code> or <code class="ph codeph">false</code> to indicate + whether the source string contains anywhere inside it the regular expression given by the pattern. + The optional third argument consists of letter flags that change how the match is performed, + such as <code class="ph codeph">i</code> for case-insensitive matching. + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + <p class="p"> + The flags that you can include in the optional third argument are: + </p> + <ul class="ul"> + <li class="li"> + <code class="ph codeph">c</code>: Case-sensitive matching (the default). + </li> + <li class="li"> + <code class="ph codeph">i</code>: Case-insensitive matching. If multiple instances of <code class="ph codeph">c</code> and <code class="ph codeph">i</code> + are included in the third argument, the last such option takes precedence. + </li> + <li class="li"> + <code class="ph codeph">m</code>: Multi-line matching. The <code class="ph codeph">^</code> and <code class="ph codeph">$</code> + operators match the start or end of any line within the source string, not the + start and end of the entire string. + </li> + <li class="li"> + <code class="ph codeph">n</code>: Newline matching. The <code class="ph codeph">.</code> operator can match the + newline character. A repetition operator such as <code class="ph codeph">.*</code> can + match a portion of the source string that spans multiple lines. + </li> + </ul> + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">boolean</code> + </p> + <p class="p"> + In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular + Expression syntax used by the Google RE2 library. For details, see + <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It + has most idioms familiar from regular expressions in Perl, Python, and so on, including + <code class="ph codeph">.*?</code> for non-greedy matches. + </p> + <p class="p"> + In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the + way regular expressions are interpreted by this function. Test any queries that use regular expressions and + adjust the expression patterns if necessary. See + <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details. + </p> + <p class="p"> + Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping, + use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions + that you submit through <span class="keyword cmdname">impala-shell</span> . You might prefer to use the equivalent character + class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code> which you would have to + escape as <code class="ph codeph">\\d</code>. + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + <p class="p"> + This example shows how <code class="ph codeph">regexp_like()</code> can test for the existence + of various kinds of regular expression patterns within a source string: + </p> +<pre class="pre codeblock"><code> +-- Matches because the 'f' appears somewhere in 'foo'. +select regexp_like('foo','f'); ++-------------------------+ +| regexp_like('foo', 'f') | ++-------------------------+ +| true | ++-------------------------+ + +-- Does not match because the comparison is case-sensitive by default. +select regexp_like('foo','F'); ++-------------------------+ +| regexp_like('foo', 'f') | ++-------------------------+ +| false | ++-------------------------+ + +-- The 3rd argument can change the matching logic, such as 'i' meaning case-insensitive. +select regexp_like('foo','F','i'); ++------------------------------+ +| regexp_like('foo', 'f', 'i') | ++------------------------------+ +| true | ++------------------------------+ + +-- The familiar regular expression notations work, such as ^ and $ anchors... +select regexp_like('foo','f$'); ++--------------------------+ +| regexp_like('foo', 'f$') | ++--------------------------+ +| false | ++--------------------------+ + +select regexp_like('foo','o$'); ++--------------------------+ +| regexp_like('foo', 'o$') | ++--------------------------+ +| true | ++--------------------------+ + +-- ...and repetition operators such as * and + +select regexp_like('foooooobar','fo+b'); ++-----------------------------------+ +| regexp_like('foooooobar', 'fo+b') | ++-----------------------------------+ +| true | ++-----------------------------------+ + +select regexp_like('foooooobar','fx*y*o*b'); ++---------------------------------------+ +| regexp_like('foooooobar', 'fx*y*o*b') | ++---------------------------------------+ +| true | ++---------------------------------------+ + +</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__regexp_replace"> + <code class="ph codeph">regexp_replace(string initial, string pattern, string replacement)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the initial argument with the regular expression pattern replaced by the final + argument string. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + <p class="p"> + In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular + Expression syntax used by the Google RE2 library. For details, see + <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It + has most idioms familiar from regular expressions in Perl, Python, and so on, including + <code class="ph codeph">.*?</code> for non-greedy matches. + </p> + <p class="p"> + In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the + way regular expressions are interpreted by this function. Test any queries that use regular expressions and + adjust the expression patterns if necessary. See + <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details. + </p> + <p class="p"> + Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping, + use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions + that you submit through <span class="keyword cmdname">impala-shell</span> . You might prefer to use the equivalent character + class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code> which you would have to + escape as <code class="ph codeph">\\d</code>. + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + <p class="p"> + These examples show how you can replace parts of a string matching a pattern with replacement text, + which can include backreferences to any <code class="ph codeph">()</code> groups in the pattern string. The + backreference numbers start at 1, and any <code class="ph codeph">\</code> characters must be escaped as + <code class="ph codeph">\\</code>. + </p> + <p class="p"> + Replace a character pattern with new text: + </p> +<pre class="pre codeblock"><code>[localhost:21000] > select regexp_replace('aaabbbaaa','b+','xyz'); ++------------------------------------------+ +| regexp_replace('aaabbbaaa', 'b+', 'xyz') | ++------------------------------------------+ +| aaaxyzaaa | ++------------------------------------------+ +Returned 1 row(s) in 0.11s</code></pre> + <p class="p"> + Replace a character pattern with substitution text that includes the original matching text: + </p> +<pre class="pre codeblock"><code>[localhost:21000] > select regexp_replace('aaabbbaaa','(b+)','<\\1>'); ++----------------------------------------------+ +| regexp_replace('aaabbbaaa', '(b+)', '<\\1>') | ++----------------------------------------------+ +| aaa<bbb>aaa | ++----------------------------------------------+ +Returned 1 row(s) in 0.11s</code></pre> + <p class="p"> + Remove all characters that are not digits: + </p> +<pre class="pre codeblock"><code>[localhost:21000] > select regexp_replace('123-456-789','[^[:digit:]]',''); ++---------------------------------------------------+ +| regexp_replace('123-456-789', '[^[:digit:]]', '') | ++---------------------------------------------------+ +| 123456789 | ++---------------------------------------------------+ +Returned 1 row(s) in 0.12s</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__repeat"> + <code class="ph codeph">repeat(string str, int n)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the argument string repeated a specified number of times. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__reverse"> + <code class="ph codeph">reverse(string a)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the argument string with characters in reversed order. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__rpad"> + <code class="ph codeph">rpad(string str, int len, string pad)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns a string of a specified length, based on the first argument string. If the + specified string is too short, it is padded on the right with a repeating sequence of the characters from + the pad string. If the specified string is too long, it is truncated on the right. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__rtrim"> + <code class="ph codeph">rtrim(string a)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the argument string with any trailing spaces removed from the right side. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__space"> + <code class="ph codeph">space(int n)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns a concatenated string of the specified number of spaces. Shorthand for + <code class="ph codeph">repeat(' ',<var class="keyword varname">n</var>)</code>. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__split_part"> + <code class="ph codeph">split_part(string source, string delimiter, bigint n)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the nth field within a delimited string. + The fields are numbered starting from 1. + The delimiter can consist of multiple characters, not just a + single character. All matching of the delimiter is done exactly, not using any + regular expression patterns. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + <p class="p"> + In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular + Expression syntax used by the Google RE2 library. For details, see + <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It + has most idioms familiar from regular expressions in Perl, Python, and so on, including + <code class="ph codeph">.*?</code> for non-greedy matches. + </p> + <p class="p"> + In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the + way regular expressions are interpreted by this function. Test any queries that use regular expressions and + adjust the expression patterns if necessary. See + <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details. + </p> + <p class="p"> + Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping, + use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions + that you submit through <span class="keyword cmdname">impala-shell</span> . You might prefer to use the equivalent character + class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code> which you would have to + escape as <code class="ph codeph">\\d</code>. + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + <p class="p"> + These examples show how to retrieve the nth field from a delimited string: + </p> +<pre class="pre codeblock"><code> +select split_part('x,y,z',',',1); ++-----------------------------+ +| split_part('x,y,z', ',', 1) | ++-----------------------------+ +| x | ++-----------------------------+ + +select split_part('x,y,z',',',2); ++-----------------------------+ +| split_part('x,y,z', ',', 2) | ++-----------------------------+ +| y | ++-----------------------------+ + +select split_part('x,y,z',',',3); ++-----------------------------+ +| split_part('x,y,z', ',', 3) | ++-----------------------------+ +| z | ++-----------------------------+ +</codeblock> + + <p> + These examples show what happens for out-of-range field positions. + Specifying a value less than 1 produces an error. Specifying a value + greater than the number of fields returns a zero-length string + (which is not the same as <codeph>NULL</codeph>). + </p> +<codeblock><![CDATA[ +select split_part('x,y,z',',',0); +ERROR: Invalid field position: 0 + +with t1 as (select split_part('x,y,z',',',4) nonexistent_field) + select + nonexistent_field + , concat('[',nonexistent_field,']') + , length(nonexistent_field); +from t1 ++-------------------+-------------------------------------+---------------------------+ +| nonexistent_field | concat('[', nonexistent_field, ']') | length(nonexistent_field) | ++-------------------+-------------------------------------+---------------------------+ +| | [] | 0 | ++-------------------+-------------------------------------+---------------------------+ +</codeblock> + + <p> + These examples show how the delimiter can be a multi-character value: + </p> +<codeblock><![CDATA[ +select split_part('one***two***three','***',2); ++-------------------------------------------+ +| split_part('one***two***three', '***', 2) | ++-------------------------------------------+ +| two | ++-------------------------------------------+ + +select split_part('one\|/two\|/three','\|/',3); ++-------------------------------------------+ +| split_part('one\|/two\|/three', '\|/', 3) | ++-------------------------------------------+ +| three | ++-------------------------------------------+ + +</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__strleft"> + <code class="ph codeph">strleft(string a, int num_chars)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the leftmost characters of the string. Shorthand for a call to + <code class="ph codeph">substr()</code> with 2 arguments. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__strright"> + <code class="ph codeph">strright(string a, int num_chars)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the rightmost characters of the string. Shorthand for a call to + <code class="ph codeph">substr()</code> with 2 arguments. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__substr"> + <code class="ph codeph">substr(string a, int start [, int len]), <span class="ph" id="string_functions__substring">substring(string a, int start [, int + len])</span></code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the portion of the string starting at a specified point, optionally with a + specified maximum length. The characters in the string are indexed starting at 1. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__translate"> + <code class="ph codeph">translate(string input, string from, string to)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the input string with a set of characters replaced by another set of characters. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__trim"> + <code class="ph codeph">trim(string a)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the input string with both leading and trailing spaces removed. The same as + passing the string through both <code class="ph codeph">ltrim()</code> and <code class="ph codeph">rtrim()</code>. + <p class="p"> + <strong class="ph b">Usage notes:</strong> Often used during data cleansing operations during the ETL cycle, if input values might still have surrounding spaces. + For a more general-purpose function that can remove other leading and trailing characters besides spaces, see <code class="ph codeph">btrim()</code>. + </p> + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="string_functions__upper"> + <code class="ph codeph">upper(string a), <span class="ph" id="string_functions__ucase">ucase(string a)</span></code> + </dt> + + <dd class="dd"> + + + <strong class="ph b">Purpose:</strong> Returns the argument string converted to all-uppercase. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + <p class="p"> + In <span class="keyword">Impala 2.5</span> and higher, you can simplify queries that + use many <code class="ph codeph">UPPER()</code> and <code class="ph codeph">LOWER()</code> calls + to do case-insensitive comparisons, by using the <code class="ph codeph">ILIKE</code> + or <code class="ph codeph">IREGEXP</code> operators instead. See + <a class="xref" href="../shared/../topics/impala_operators.html#ilike">ILIKE Operator</a> and + <a class="xref" href="../shared/../topics/impala_operators.html#iregexp">IREGEXP Operator</a> for details. + </p> + </dd> + + + </dl> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html> \ No newline at end of file
