http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_mem_limit.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_mem_limit.html b/docs/build/html/topics/impala_mem_limit.html new file mode 100644 index 0000000..704dae9 --- /dev/null +++ b/docs/build/html/topics/impala_mem_limit.html @@ -0,0 +1,206 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mem_limit"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MEM_LIMIT Query Option</title></head><body id="mem_limit"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">MEM_LIMIT Query Option</h1> + + + + <div class="body conbody"> + + <p class="p"> + + When resource management is not enabled, defines the maximum amount of memory a query can allocate on each node. + Therefore, the total memory that can be used by a query is the <code class="ph codeph">MEM_LIMIT</code> times the number of nodes. + </p> + + <p class="p"> + There are two levels of memory limit for Impala. + The <code class="ph codeph">-mem_limit</code> startup option sets an overall limit for the <span class="keyword cmdname">impalad</span> process + (which handles multiple queries concurrently). 
+ That limit is typically expressed in terms of a percentage of the RAM available on the host, such as <code class="ph codeph">-mem_limit=70%</code>. + The <code class="ph codeph">MEM_LIMIT</code> query option, which you set through <span class="keyword cmdname">impala-shell</span> + or the <code class="ph codeph">SET</code> statement in a JDBC or ODBC application, applies to each individual query. + The <code class="ph codeph">MEM_LIMIT</code> query option is usually expressed as a fixed size such as <code class="ph codeph">10gb</code>, + and must always be less than the <span class="keyword cmdname">impalad</span> memory limit. + </p> + + <p class="p"> + If query processing exceeds the specified memory limit on any node, either the per-query limit or the + <span class="keyword cmdname">impalad</span> limit, Impala cancels the query automatically. + Memory limits are checked periodically during query processing, so the actual memory in use + might briefly exceed the limit without the query being cancelled. + </p> + + <p class="p"> + When resource management is enabled, the mechanism for this option changes. If set, it overrides the + automatic memory estimate from Impala. Impala requests this amount of memory from YARN on each node, and the + query does not proceed until that much memory is available. The actual memory used by the query could be + lower, since some queries use much less memory than others. With resource management, the + <code class="ph codeph">MEM_LIMIT</code> setting acts both as a hard limit on the amount of memory a query can use on any + node (enforced by YARN) and a guarantee that that much memory will be available on each node while the query + is being executed. 
When resource management is enabled but no <code class="ph codeph">MEM_LIMIT</code> setting is + specified, Impala estimates the amount of memory needed on each node for each query, requests that much + memory from YARN before starting the query, and then internally sets the <code class="ph codeph">MEM_LIMIT</code> on each + node to the requested amount of memory during the query. Thus, if the query takes more memory than was + originally estimated, Impala detects that the <code class="ph codeph">MEM_LIMIT</code> is exceeded and cancels the query + itself. + </p> + + <p class="p"> + <strong class="ph b">Type:</strong> numeric + </p> + + <p class="p"> + <strong class="ph b">Units:</strong> A numeric argument represents memory size in bytes; you can also use a suffix of <code class="ph codeph">m</code> or <code class="ph codeph">mb</code> + for megabytes, or more commonly <code class="ph codeph">g</code> or <code class="ph codeph">gb</code> for gigabytes. If you specify a value in an unrecognized + format, subsequent queries fail with an error. + </p> + + <p class="p"> + <strong class="ph b">Default:</strong> 0 (unlimited) + </p> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + The <code class="ph codeph">MEM_LIMIT</code> setting is primarily useful in a high-concurrency setting, + or on a cluster with a workload shared between Impala and other data processing components. + You can prevent any query from accidentally using much more memory than expected, + which could negatively impact other Impala queries. + </p> + + <p class="p"> + Use the output of the <code class="ph codeph">SUMMARY</code> command in <span class="keyword cmdname">impala-shell</span> + to get a report of memory used for each phase of your most heavyweight queries on each node, + and then set a <code class="ph codeph">MEM_LIMIT</code> somewhat higher than that. 
+ See <a class="xref" href="impala_explain_plan.html#perf_summary">Using the SUMMARY Report for Performance Tuning</a> for usage information about + the <code class="ph codeph">SUMMARY</code> command. + </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <p class="p"> + The following examples show how to set the <code class="ph codeph">MEM_LIMIT</code> query option + using a fixed number of bytes, or suffixes representing gigabytes or megabytes. + </p> + +<pre class="pre codeblock"><code> +[localhost:21000] > set mem_limit=3000000000; +MEM_LIMIT set to 3000000000 +[localhost:21000] > select 5; +Query: select 5 ++---+ +| 5 | ++---+ +| 5 | ++---+ + +[localhost:21000] > set mem_limit=3g; +MEM_LIMIT set to 3g +[localhost:21000] > select 5; +Query: select 5 ++---+ +| 5 | ++---+ +| 5 | ++---+ + +[localhost:21000] > set mem_limit=3gb; +MEM_LIMIT set to 3gb +[localhost:21000] > select 5; ++---+ +| 5 | ++---+ +| 5 | ++---+ + +[localhost:21000] > set mem_limit=3m; +MEM_LIMIT set to 3m +[localhost:21000] > select 5; ++---+ +| 5 | ++---+ +| 5 | ++---+ +[localhost:21000] > set mem_limit=3mb; +MEM_LIMIT set to 3mb +[localhost:21000] > select 5; ++---+ +| 5 | ++---+ +</code></pre> + + <p class="p"> + The following examples show how unrecognized <code class="ph codeph">MEM_LIMIT</code> + values lead to errors for subsequent queries. + </p> + +<pre class="pre codeblock"><code> +[localhost:21000] > set mem_limit=3tb; +MEM_LIMIT set to 3tb +[localhost:21000] > select 5; +ERROR: Failed to parse query memory limit from '3tb'. + +[localhost:21000] > set mem_limit=xyz; +MEM_LIMIT set to xyz +[localhost:21000] > select 5; +Query: select 5 +ERROR: Failed to parse query memory limit from 'xyz'. +</code></pre> + + <p class="p"> + The following example shows the automatic query cancellation + when the <code class="ph codeph">MEM_LIMIT</code> value is exceeded + on any host involved in the Impala query. 
First it runs a + successful query and checks the largest amount of memory + used on any node for any stage of the query. + Then it sets an artificially low <code class="ph codeph">MEM_LIMIT</code> + setting so that the same query cannot run. + </p> + +<pre class="pre codeblock"><code> +[localhost:21000] > select count(*) from customer; +Query: select count(*) from customer ++----------+ +| count(*) | ++----------+ +| 150000 | ++----------+ + +[localhost:21000] > select count(distinct c_name) from customer; +Query: select count(distinct c_name) from customer ++------------------------+ +| count(distinct c_name) | ++------------------------+ +| 150000 | ++------------------------+ + +[localhost:21000] > summary; ++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | ++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+ +| 06:AGGREGATE | 1 | 230.00ms | 230.00ms | 1 | 1 | 16.00 KB | -1 B | FINALIZE | +| 05:EXCHANGE | 1 | 43.44us | 43.44us | 1 | 1 | 0 B | -1 B | UNPARTITIONED | +| 02:AGGREGATE | 1 | 227.14ms | 227.14ms | 1 | 1 | 12.00 KB | 10.00 MB | | +| 04:AGGREGATE | 1 | 126.27ms | 126.27ms | 150.00K | 150.00K | 15.17 MB | 10.00 MB | | +| 03:EXCHANGE | 1 | 44.07ms | 44.07ms | 150.00K | 150.00K | 0 B | 0 B | HASH(c_name) | +<strong class="ph b">| 01:AGGREGATE | 1 | 361.94ms | 361.94ms | 150.00K | 150.00K | 23.04 MB | 10.00 MB | |</strong> +| 00:SCAN HDFS | 1 | 43.64ms | 43.64ms | 150.00K | 150.00K | 24.19 MB | 64.00 MB | tpch.customer | ++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+ + +[localhost:21000] > set mem_limit=15mb; +MEM_LIMIT set to 15mb +[localhost:21000] > select count(distinct c_name) from customer; +Query: select count(distinct c_name) from customer +ERROR: +Memory 
limit exceeded +Query did not have enough memory to get the minimum required buffers in the block manager. +</code></pre> + + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file
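The unit rules described in the MEM_LIMIT topic above (a bare number is a byte count, `m`/`mb` mean megabytes, `g`/`gb` mean gigabytes, anything else is rejected) can be sketched as a small standalone function. This is only an illustration, not Impala's parser: the name `parse_mem_limit` is hypothetical, and both the 1024-based multipliers and the case-insensitive suffix matching are assumptions that the topic itself does not state.

```python
import re

# Suffixes listed in the MEM_LIMIT topic. The 1024-based multipliers
# are an assumption; the topic only says "megabytes" and "gigabytes".
_MULTIPLIERS = {
    "": 1,
    "m": 1024 ** 2,
    "mb": 1024 ** 2,
    "g": 1024 ** 3,
    "gb": 1024 ** 3,
}


def parse_mem_limit(value: str) -> int:
    """Hypothetical helper: convert a MEM_LIMIT string to a byte count.

    Raises ValueError for values with no leading digits or with an
    unrecognized suffix, mirroring the error text shown in the topic.
    """
    match = re.fullmatch(r"(\d+)([a-z]*)", value.strip().lower())
    if match is None or match.group(2) not in _MULTIPLIERS:
        raise ValueError(f"Failed to parse query memory limit from '{value}'.")
    return int(match.group(1)) * _MULTIPLIERS[match.group(2)]
```

A check like this could run in a client script before issuing `set mem_limit=...`, so that a value such as `3tb` is caught immediately instead of surfacing as an error on the next query.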
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_min.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_min.html b/docs/build/html/topics/impala_min.html new file mode 100644 index 0000000..88a39ab --- /dev/null +++ b/docs/build/html/topics/impala_min.html @@ -0,0 +1,297 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="min"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MIN Function</title></head><body id="min"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">MIN Function</h1> + + + + <div class="body conbody"> + + <p class="p"> + + An aggregate function that returns the minimum value from a set of numbers. Opposite of the + <code class="ph codeph">MAX</code> function. Its single argument can be a numeric column, or the numeric result of a function + or expression applied to the column value. Rows with a <code class="ph codeph">NULL</code> value for the specified column + are ignored. If the table is empty, or all the values supplied to <code class="ph codeph">MIN</code> are + <code class="ph codeph">NULL</code>, <code class="ph codeph">MIN</code> returns <code class="ph codeph">NULL</code>. 
+ </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + +<pre class="pre codeblock"><code>MIN([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]</code></pre> + + <p class="p"> + When the query contains a <code class="ph codeph">GROUP BY</code> clause, returns one value for each combination of + grouping values. + </p> + + <p class="p"> + <strong class="ph b">Restrictions:</strong> In Impala 2.0 and higher, this function can be used as an analytic function, but with restrictions on any window clause. + For <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>, the window clause is only allowed if the start + bound is <code class="ph codeph">UNBOUNDED PRECEDING</code>. + </p> + + <p class="p"> + <strong class="ph b">Return type:</strong> Same as the input value, except for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> + arguments which produce a <code class="ph codeph">STRING</code> result + </p> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + If you frequently run aggregate functions such as <code class="ph codeph">MIN()</code>, <code class="ph codeph">MAX()</code>, and + <code class="ph codeph">COUNT(DISTINCT)</code> on partition key columns, consider enabling the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code> + query option, which optimizes such queries. This feature is available in <span class="keyword">Impala 2.5</span> and higher. + See <a class="xref" href="../shared/../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a> + for the kinds of queries that this option applies to, and slight differences in how partitions are + evaluated when this query option is enabled. 
+ </p> + + <p class="p"> + <strong class="ph b">Complex type considerations:</strong> + </p> + + <p class="p"> + To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>) + in an aggregation function, you unpack the individual elements using join notation in the query, + and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column. + See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala. + </p> + + <div class="p"> +The following example demonstrates calls to several aggregation functions +using values from a column containing nested complex types +(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items). +The array is unpacked inside the query using join notation. +The array elements are referenced using the <code class="ph codeph">ITEM</code> +pseudocolumn, and the structure fields inside the array elements +are referenced using dot notation. +Numeric values such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code> +are computed using the numeric <code class="ph codeph">N_NATIONKEY</code> field, and +the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> +values are computed from the string <code class="ph codeph">N_NAME</code> field. 
+<pre class="pre codeblock"><code>describe region; ++-------------+-------------------------+---------+ +| name | type | comment | ++-------------+-------------------------+---------+ +| r_regionkey | smallint | | +| r_name | string | | +| r_comment | string | | +| r_nations | array<struct< | | +| | n_nationkey:smallint, | | +| | n_name:string, | | +| | n_comment:string | | +| | >> | | ++-------------+-------------------------+---------+ + +select r_name, r_nations.item.n_nationkey + from region, region.r_nations as r_nations +order by r_name, r_nations.item.n_nationkey; ++-------------+------------------+ +| r_name | item.n_nationkey | ++-------------+------------------+ +| AFRICA | 0 | +| AFRICA | 5 | +| AFRICA | 14 | +| AFRICA | 15 | +| AFRICA | 16 | +| AMERICA | 1 | +| AMERICA | 2 | +| AMERICA | 3 | +| AMERICA | 17 | +| AMERICA | 24 | +| ASIA | 8 | +| ASIA | 9 | +| ASIA | 12 | +| ASIA | 18 | +| ASIA | 21 | +| EUROPE | 6 | +| EUROPE | 7 | +| EUROPE | 19 | +| EUROPE | 22 | +| EUROPE | 23 | +| MIDDLE EAST | 4 | +| MIDDLE EAST | 10 | +| MIDDLE EAST | 11 | +| MIDDLE EAST | 13 | +| MIDDLE EAST | 20 | ++-------------+------------------+ + +select + r_name, + count(r_nations.item.n_nationkey) as count, + sum(r_nations.item.n_nationkey) as sum, + avg(r_nations.item.n_nationkey) as avg, + min(r_nations.item.n_name) as minimum, + max(r_nations.item.n_name) as maximum, + ndv(r_nations.item.n_nationkey) as distinct_vals +from + region, region.r_nations as r_nations +group by r_name +order by r_name; ++-------------+-------+-----+------+-----------+----------------+---------------+ +| r_name | count | sum | avg | minimum | maximum | distinct_vals | ++-------------+-------+-----+------+-----------+----------------+---------------+ +| AFRICA | 5 | 50 | 10 | ALGERIA | MOZAMBIQUE | 5 | +| AMERICA | 5 | 47 | 9.4 | ARGENTINA | UNITED STATES | 5 | +| ASIA | 5 | 68 | 13.6 | CHINA | VIETNAM | 5 | +| EUROPE | 5 | 77 | 15.4 | FRANCE | UNITED KINGDOM | 5 | +| MIDDLE EAST | 5 | 58 | 11.6 
| EGYPT | SAUDI ARABIA | 5 | ++-------------+-------+-----+------+-----------+----------------+---------------+ +</code></pre> +</div> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + +<pre class="pre codeblock"><code>-- Find the smallest value for this column in the table. +select min(c1) from t1; +-- Find the smallest value for this column from a subset of the table. +select min(c1) from t1 where month = 'January' and year = '2013'; +-- Find the smallest value from a set of numeric function results. +select min(length(s)) from t1; +-- Can also be used in combination with DISTINCT and/or GROUP BY. +-- Return more than one result. +select month, year, min(purchase_price) from store_stats group by month, year; +-- Filter the input to eliminate duplicates before performing the calculation. +select min(distinct x) from t1; +</code></pre> + + <div class="p"> + The following examples show how to use <code class="ph codeph">MIN()</code> in an analytic context. They use a table + containing integers from 1 to 10. Notice how the <code class="ph codeph">MIN()</code> is reported for each input value, as + opposed to the <code class="ph codeph">GROUP BY</code> clause which condenses the result set. +<pre class="pre codeblock"><code>select x, property, min(x) over (partition by property) as min from int_t where property in ('odd','even'); ++----+----------+-----+ +| x | property | min | ++----+----------+-----+ +| 2 | even | 2 | +| 4 | even | 2 | +| 6 | even | 2 | +| 8 | even | 2 | +| 10 | even | 2 | +| 1 | odd | 1 | +| 3 | odd | 1 | +| 5 | odd | 1 | +| 7 | odd | 1 | +| 9 | odd | 1 | ++----+----------+-----+ +</code></pre> + +Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving +set of rows (the <span class="q">"window"</span>). 
The following examples use <code class="ph codeph">MIN()</code> in an analytic context +(that is, with an <code class="ph codeph">OVER()</code> clause) to display the smallest value of <code class="ph codeph">X</code> +encountered up to each row in the result set. The examples use two columns in the <code class="ph codeph">ORDER BY</code> +clause to produce a sequence of values that rises and falls, to illustrate how the <code class="ph codeph">MIN()</code> +result only decreases or stays the same throughout each partition within the result set. +The basic <code class="ph codeph">ORDER BY x</code> clause implicitly +activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>, +which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>, +therefore all of these examples produce the same results: + +<pre class="pre codeblock"><code>select x, property, min(x) <strong class="ph b">over (order by property, x desc)</strong> as 'minimum to this point' + from int_t where property in ('prime','square'); ++---+----------+-----------------------+ +| x | property | minimum to this point | ++---+----------+-----------------------+ +| 7 | prime | 7 | +| 5 | prime | 5 | +| 3 | prime | 3 | +| 2 | prime | 2 | +| 9 | square | 2 | +| 4 | square | 2 | +| 1 | square | 1 | ++---+----------+-----------------------+ + +select x, property, + min(x) over + ( + <strong class="ph b">order by property, x desc</strong> + <strong class="ph b">range between unbounded preceding and current row</strong> + ) as 'minimum to this point' +from int_t where property in ('prime','square'); ++---+----------+-----------------------+ +| x | property | minimum to this point | ++---+----------+-----------------------+ +| 7 | prime | 7 | +| 5 | prime | 5 | +| 3 | prime | 3 | +| 2 | prime | 2 | +| 9 | square | 2 | +| 4 | square | 2 | +| 1 | square | 1 | ++---+----------+-----------------------+ + +select x, 
property, + min(x) over + ( + <strong class="ph b">order by property, x desc</strong> + <strong class="ph b">rows between unbounded preceding and current row</strong> + ) as 'minimum to this point' +from int_t where property in ('prime','square'); ++---+----------+-----------------------+ +| x | property | minimum to this point | ++---+----------+-----------------------+ +| 7 | prime | 7 | +| 5 | prime | 5 | +| 3 | prime | 3 | +| 2 | prime | 2 | +| 9 | square | 2 | +| 4 | square | 2 | +| 1 | square | 1 | ++---+----------+-----------------------+ +</code></pre> + +The following examples show how to construct a moving window, with a running minimum taking into account all rows before +and 1 row after the current row. +Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of +moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code> clause. +Because of an extra Impala restriction on the <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> functions in an +analytic context, the lower bound must be <code class="ph codeph">UNBOUNDED PRECEDING</code>. +<pre class="pre codeblock"><code>select x, property, + min(x) over + ( + <strong class="ph b">order by property, x desc</strong> + <strong class="ph b">rows between unbounded preceding and 1 following</strong> + ) as 'local minimum' +from int_t where property in ('prime','square'); ++---+----------+---------------+ +| x | property | local minimum | ++---+----------+---------------+ +| 7 | prime | 5 | +| 5 | prime | 3 | +| 3 | prime | 2 | +| 2 | prime | 2 | +| 9 | square | 2 | +| 4 | square | 1 | +| 1 | square | 1 | ++---+----------+---------------+ + +-- Doesn't work because of syntax restriction on RANGE clause. 
+select x, property, + min(x) over + ( + <strong class="ph b">order by property, x desc</strong> + <strong class="ph b">range between unbounded preceding and 1 following</strong> + ) as 'local minimum' +from int_t where property in ('prime','square'); +ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW. +</code></pre> + </div> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>, <a class="xref" href="impala_max.html#max">MAX Function</a>, + <a class="xref" href="impala_avg.html#avg">AVG Function</a> + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_misc_functions.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_misc_functions.html b/docs/build/html/topics/impala_misc_functions.html new file mode 100644 index 0000000..bf8c990 --- /dev/null +++ b/docs/build/html/topics/impala_misc_functions.html @@ -0,0 +1,175 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 
2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="misc_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Miscellaneous Functions</title></head><body id="misc_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Impala Miscellaneous Functions</h1> + + + + <div class="body conbody"> + + <p class="p"> + Impala supports the following utility functions that do not operate on a particular column or data type: + </p> + + <dl class="dl"> + + + <dt class="dt dlterm" id="misc_functions__current_database"> + <code class="ph codeph">current_database()</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the database that the session is currently using, either <code class="ph codeph">default</code> + if no database has been selected, or whatever database the session switched to through a + <code class="ph codeph">USE</code> statement or the <span class="keyword cmdname">impala-shell</span> <code class="ph codeph">-d</code> option. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="misc_functions__effective_user"> + <code class="ph codeph">effective_user()</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Typically returns the same value as <code class="ph codeph">user()</code>, + except if delegation is enabled, in which case it returns the ID of the delegated user. 
+ <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.5</span> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="misc_functions__pid"> + <code class="ph codeph">pid()</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the process ID of the <span class="keyword cmdname">impalad</span> daemon that the session is + connected to. You can use it during low-level debugging, to issue Linux commands that trace the + <span class="keyword cmdname">impalad</span> process, show its arguments, and so on. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code> + </p> + </dd> + + + + + + + + <dt class="dt dlterm" id="misc_functions__user"> + <code class="ph codeph">user()</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns the username of the Linux user who is connected to the <span class="keyword cmdname">impalad</span> + daemon. Typically called a single time, in a query without any <code class="ph codeph">FROM</code> clause, to + understand how authorization settings apply in a security context; once you know the logged-in username, + you can check which groups that user belongs to, and from the list of groups you can check which roles + are available to those groups through the authorization policy file. + <p class="p"> + In Impala 2.0 and later, <code class="ph codeph">user()</code> returns the full Kerberos principal string, such as + <code class="ph codeph">[email protected]</code>, in a Kerberized environment. + </p> + <p class="p"> + When delegation is enabled, consider calling the <code class="ph codeph">effective_user()</code> function instead. 
+ </p> + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + </dd> + + + + + + <dt class="dt dlterm" id="misc_functions__uuid"> + <code class="ph codeph">uuid()</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns a <a class="xref" href="https://en.wikipedia.org/wiki/Universally_unique_identifier" target="_blank">universal unique identifier</a>, a 128-bit value encoded as a string with groups of hexadecimal digits separated by dashes. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> + </p> + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span> + </p> + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + <p class="p"> + Ascending numeric sequences of type <code class="ph codeph">BIGINT</code> are often used + as identifiers within a table, and as join keys across multiple tables. + The <code class="ph codeph">uuid()</code> value is a convenient alternative that does not + require storing or querying the highest sequence number. For example, you + can use it to quickly construct new unique identifiers during a data import job, + or to combine data from different tables without the likelihood of ID collisions. + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> +<pre class="pre codeblock"><code> +-- Each call to uuid() produces a new arbitrary value. +select uuid(); ++--------------------------------------+ +| uuid() | ++--------------------------------------+ +| c7013e25-1455-457f-bf74-a2046e58caea | ++--------------------------------------+ + +-- If you get a UUID for each row of a result set, you can use it as a +-- unique identifier within a table, or even a unique ID across tables. 
+select uuid() from four_row_table; ++--------------------------------------+ +| uuid() | ++--------------------------------------+ +| 51d3c540-85e5-4cb9-9110-604e53999e2e | +| 0bb40071-92f6-4a59-a6a4-60d46e9703e2 | +| 5e9d7c36-9842-4a96-862d-c13cd0457c02 | +| cae29095-0cc0-4053-a5ea-7fcd3c780861 | ++--------------------------------------+ +</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="misc_functions__version"> + <code class="ph codeph">version()</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns information such as the precise version number and build date for the + <code class="ph codeph">impalad</code> daemon that you are currently connected to. Typically used to confirm that you + are connected to the expected level of Impala to use a particular feature, or to connect to several nodes + and confirm they are all running the same level of <span class="keyword cmdname">impalad</span>. + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> (with one or more embedded newlines) + </p> + </dd> + + + </dl> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_mixed_security.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_mixed_security.html b/docs/build/html/topics/impala_mixed_security.html new file mode 100644 index 0000000..b48fa8e --- /dev/null +++ b/docs/build/html/topics/impala_mixed_security.html @@ -0,0 +1,26 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta 
charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mixed_security"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Multiple Authentication Methods with Impala</title></head><body id="mixed_security"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Using Multiple Authentication Methods with Impala</h1> + + + <div class="body conbody"> + + <p class="p"> + Impala 2.0 and later automatically handles both Kerberos and LDAP authentication. Each + <span class="keyword cmdname">impalad</span> daemon can accept both Kerberos and LDAP requests through the same port. No + special actions need to be taken if some users authenticate through Kerberos and some through LDAP. + </p> + + <p class="p"> + Prior to Impala 2.0, you had to configure each <span class="keyword cmdname">impalad</span> to listen on a specific port + depending on the kind of authentication, then configure your network load balancer to forward each kind of + request to a DataNode that was set up with the appropriate authentication type. Once the initial request was + made using either Kerberos or LDAP authentication, Impala automatically handled the process of coordinating + the work across multiple nodes and transmitting intermediate results back to the coordinator node. 
+ </p> + + + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_authentication.html">Impala Authentication</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_mt_dop.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_mt_dop.html b/docs/build/html/topics/impala_mt_dop.html new file mode 100644 index 0000000..adb4f0f --- /dev/null +++ b/docs/build/html/topics/impala_mt_dop.html @@ -0,0 +1,190 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mt_dop"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MT_DOP Query Option</title></head><body id="mt_dop"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">MT_DOP Query Option</h1> + + + + <div class="body conbody"> + + <p class="p"> + + Sets the degree of intra-node parallelism used for certain operations that + can benefit from multithreaded execution. You can specify values + higher than zero to find the ideal balance of response time, + memory usage, and CPU usage during statement processing. 
+ </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + The Impala execution engine is being revamped incrementally to add + more parallelism within a single host for certain statements and + kinds of operations. The setting <code class="ph codeph">MT_DOP=0</code> uses the + <span class="q">"old"</span> code path with limited intra-node parallelism. + </p> + + <p class="p"> + Currently, the operations affected by the <code class="ph codeph">MT_DOP</code> + query option are: + </p> + <ul class="ul"> + <li class="li"> + <p class="p"> + <code class="ph codeph">COMPUTE [INCREMENTAL] STATS</code>. Impala automatically sets + <code class="ph codeph">MT_DOP=4</code> for <code class="ph codeph">COMPUTE STATS</code> and + <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statements on Parquet tables. + </p> + </li> + <li class="li"> + <p class="p"> + Queries with execution plans containing only scan and aggregation operators, + or local joins that do not need data exchanges (such as for nested types). + Other queries produce an error if <code class="ph codeph">MT_DOP</code> is set to a non-zero + value. Therefore, this query option is typically set only for the duration of + specific long-running, CPU-intensive queries. + </p> + </li> + </ul> + + </div> + + <p class="p"> + <strong class="ph b">Type:</strong> integer + </p> + <p class="p"> + <strong class="ph b">Default:</strong> <code class="ph codeph">0</code> + </p> + <p class="p"> + Because <code class="ph codeph">COMPUTE STATS</code> and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> + statements for Parquet tables benefit substantially from extra intra-node + parallelism, Impala automatically sets <code class="ph codeph">MT_DOP=4</code> when computing stats + for Parquet tables. 
+ </p> + <p class="p"> + <strong class="ph b">Range:</strong> 0 to 64 + </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + Any timing figures in the following examples are on a small, lightly loaded development cluster. + Your mileage may vary. Speedups depend on many factors, including the number of rows, columns, and + partitions within each table. + </p> + </div> + + <p class="p"> + The following example shows how to run a <code class="ph codeph">COMPUTE STATS</code> + statement against a Parquet table with or without an explicit <code class="ph codeph">MT_DOP</code> + setting: + </p> + +<pre class="pre codeblock"><code> +-- Explicitly setting MT_DOP to 0 selects the old code path. +set mt_dop = 0; +MT_DOP set to 0 + +-- The analysis for the billion rows is distributed among hosts, +-- but uses only a single core on each host. +compute stats billion_rows_parquet; ++-----------------------------------------+ +| summary | ++-----------------------------------------+ +| Updated 1 partition(s) and 2 column(s). | ++-----------------------------------------+ + +drop stats billion_rows_parquet; + +-- Using 4 logical processors per host is faster. +set mt_dop = 4; +MT_DOP set to 4 + +compute stats billion_rows_parquet; ++-----------------------------------------+ +| summary | ++-----------------------------------------+ +| Updated 1 partition(s) and 2 column(s). | ++-----------------------------------------+ + +drop stats billion_rows_parquet; + +-- Unsetting the option reverts back to its default. +-- Which for COMPUTE STATS and a Parquet table is 4, +-- so again it uses the fast path. +unset MT_DOP; +Unsetting option MT_DOP + +compute stats billion_rows_parquet; ++-----------------------------------------+ +| summary | ++-----------------------------------------+ +| Updated 1 partition(s) and 2 column(s). 
| ++-----------------------------------------+ + +</code></pre> + + <p class="p"> + The following example shows the effects of setting <code class="ph codeph">MT_DOP</code> + for a query involving only scan and aggregation operations for a Parquet table: + </p> + +<pre class="pre codeblock"><code> +set mt_dop = 0; +MT_DOP set to 0 + +-- COUNT(DISTINCT) for a unique column is CPU-intensive. +select count(distinct id) from billion_rows_parquet; ++--------------------+ +| count(distinct id) | ++--------------------+ +| 1000000000 | ++--------------------+ +Fetched 1 row(s) in 67.20s + +set mt_dop = 16; +MT_DOP set to 16 + +-- Introducing more intra-node parallelism for the aggregation +-- speeds things up, and potentially reduces memory overhead by +-- reducing the number of scanner threads. +select count(distinct id) from billion_rows_parquet; ++--------------------+ +| count(distinct id) | ++--------------------+ +| 1000000000 | ++--------------------+ +Fetched 1 row(s) in 17.19s + +</code></pre> + + <p class="p"> + The following example shows how queries that are not compatible with non-zero + <code class="ph codeph">MT_DOP</code> settings produce an error when <code class="ph codeph">MT_DOP</code> + is set: + </p> + +<pre class="pre codeblock"><code> +set mt_dop=1; +MT_DOP set to 1 + +select * from a1 inner join a2 + on a1.id = a2.id limit 4; +ERROR: NotImplementedException: MT_DOP not supported for plans with + base table joins or table sinks. 
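+
+-- A hedged sketch, not output captured from the original session:
+-- setting MT_DOP back to 0 returns to the old code path, so the same
+-- join should no longer hit the MT_DOP restriction. Tables a1 and a2
+-- are the ones from the statement above; result rows are omitted.
+set mt_dop=0;
+MT_DOP set to 0
+
+select * from a1 inner join a2
+  on a1.id = a2.id limit 4;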
+ +</code></pre> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + <p class="p"> + <a class="xref" href="impala_compute_stats.html">COMPUTE STATS Statement</a>, + <a class="xref" href="impala_aggregate_functions.html">Impala Aggregate Functions</a> + </p> + + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_ndv.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_ndv.html b/docs/build/html/topics/impala_ndv.html new file mode 100644 index 0000000..d6c3117 --- /dev/null +++ b/docs/build/html/topics/impala_ndv.html @@ -0,0 +1,226 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ndv"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>NDV Function</title></head><body id="ndv"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">NDV Function</h1> + + + + <div class="body conbody"> + + <p class="p"> + + An aggregate function that returns an 
approximate value similar to the result of <code class="ph codeph">COUNT(DISTINCT + <var class="keyword varname">col</var>)</code>, the <span class="q">"number of distinct values"</span>. It is much faster than the + combination of <code class="ph codeph">COUNT</code> and <code class="ph codeph">DISTINCT</code>. Because it uses a constant amount of memory, it is also + less memory-intensive for columns with high cardinality. + </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + +<pre class="pre codeblock"><code>NDV([DISTINCT | ALL] <var class="keyword varname">expression</var>)</code></pre> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + This is the mechanism used internally by the <code class="ph codeph">COMPUTE STATS</code> statement for computing the + number of distinct values in a column. + </p> + + <p class="p"> + Because this number is an estimate, it might not reflect the precise number of different values in the + column, especially if the cardinality is very low or very high. If the estimated number is higher than the + number of rows in the table, Impala adjusts the value internally during query planning. + </p> + + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">DOUBLE</code> in Impala 2.0 and higher; <code class="ph codeph">STRING</code> in earlier + releases + </p> + + + + <p class="p"> + <strong class="ph b">Complex type considerations:</strong> + </p> + + <p class="p"> + To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>) + in an aggregation function, you unpack the individual elements using join notation in the query, + and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column. 
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala. + </p> + + <div class="p"> +The following example demonstrates calls to several aggregation functions +using values from a column containing nested complex types +(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items). +The array is unpacked inside the query using join notation. +The array elements are referenced using the <code class="ph codeph">ITEM</code> +pseudocolumn, and the structure fields inside the array elements +are referenced using dot notation. +Numeric values such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code> +are computed using the numeric <code class="ph codeph">N_NATIONKEY</code> field, and +the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> +values are computed from the string <code class="ph codeph">N_NAME</code> field. 
+<pre class="pre codeblock"><code>describe region; ++-------------+-------------------------+---------+ +| name | type | comment | ++-------------+-------------------------+---------+ +| r_regionkey | smallint | | +| r_name | string | | +| r_comment | string | | +| r_nations | array<struct< | | +| | n_nationkey:smallint, | | +| | n_name:string, | | +| | n_comment:string | | +| | >> | | ++-------------+-------------------------+---------+ + +select r_name, r_nations.item.n_nationkey + from region, region.r_nations as r_nations +order by r_name, r_nations.item.n_nationkey; ++-------------+------------------+ +| r_name | item.n_nationkey | ++-------------+------------------+ +| AFRICA | 0 | +| AFRICA | 5 | +| AFRICA | 14 | +| AFRICA | 15 | +| AFRICA | 16 | +| AMERICA | 1 | +| AMERICA | 2 | +| AMERICA | 3 | +| AMERICA | 17 | +| AMERICA | 24 | +| ASIA | 8 | +| ASIA | 9 | +| ASIA | 12 | +| ASIA | 18 | +| ASIA | 21 | +| EUROPE | 6 | +| EUROPE | 7 | +| EUROPE | 19 | +| EUROPE | 22 | +| EUROPE | 23 | +| MIDDLE EAST | 4 | +| MIDDLE EAST | 10 | +| MIDDLE EAST | 11 | +| MIDDLE EAST | 13 | +| MIDDLE EAST | 20 | ++-------------+------------------+ + +select + r_name, + count(r_nations.item.n_nationkey) as count, + sum(r_nations.item.n_nationkey) as sum, + avg(r_nations.item.n_nationkey) as avg, + min(r_nations.item.n_name) as minimum, + max(r_nations.item.n_name) as maximum, + ndv(r_nations.item.n_nationkey) as distinct_vals +from + region, region.r_nations as r_nations +group by r_name +order by r_name; ++-------------+-------+-----+------+-----------+----------------+---------------+ +| r_name | count | sum | avg | minimum | maximum | distinct_vals | ++-------------+-------+-----+------+-----------+----------------+---------------+ +| AFRICA | 5 | 50 | 10 | ALGERIA | MOZAMBIQUE | 5 | +| AMERICA | 5 | 47 | 9.4 | ARGENTINA | UNITED STATES | 5 | +| ASIA | 5 | 68 | 13.6 | CHINA | VIETNAM | 5 | +| EUROPE | 5 | 77 | 15.4 | FRANCE | UNITED KINGDOM | 5 | +| MIDDLE EAST | 5 | 58 | 11.6 
| EGYPT | SAUDI ARABIA | 5 | ++-------------+-------+-----+------+-----------+----------------+---------------+ +</code></pre> +</div> + + <p class="p"> + <strong class="ph b">Restrictions:</strong> + </p> + + <p class="p"> + This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function. + </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <p class="p"> + The following example queries a billion-row table to illustrate the relative performance of + <code class="ph codeph">COUNT(DISTINCT)</code> and <code class="ph codeph">NDV()</code>. It shows how <code class="ph codeph">COUNT(DISTINCT)</code> + gives a precise answer, but is inefficient for large-scale data where an approximate result is sufficient. + The <code class="ph codeph">NDV()</code> function gives an approximate result but is much faster. + </p> + +<pre class="pre codeblock"><code>select count(distinct col1) from sample_data; ++---------------------+ +| count(distinct col1)| ++---------------------+ +| 100000 | ++---------------------+ +Fetched 1 row(s) in 20.13s + +select cast(ndv(col1) as bigint) as col1 from sample_data; ++----------+ +| col1 | ++----------+ +| 139017 | ++----------+ +Fetched 1 row(s) in 8.91s +</code></pre> + + <p class="p"> + The following example shows how you can code multiple <code class="ph codeph">NDV()</code> calls in a single query, to + easily learn which columns have substantially more or fewer distinct values. This technique is faster than + running a sequence of queries with <code class="ph codeph">COUNT(DISTINCT)</code> calls. 
+ </p> + +<pre class="pre codeblock"><code>select cast(ndv(col1) as bigint) as col1, cast(ndv(col2) as bigint) as col2, + cast(ndv(col3) as bigint) as col3, cast(ndv(col4) as bigint) as col4 + from sample_data; ++----------+-----------+------------+-----------+ +| col1 | col2 | col3 | col4 | ++----------+-----------+------------+-----------+ +| 139017 | 282 | 46 | 145636240 | ++----------+-----------+------------+-----------+ +Fetched 1 row(s) in 34.97s + +select count(distinct col1) from sample_data; ++---------------------+ +| count(distinct col1)| ++---------------------+ +| 100000 | ++---------------------+ +Fetched 1 row(s) in 20.13s + +select count(distinct col2) from sample_data; ++----------------------+ +| count(distinct col2) | ++----------------------+ +| 278 | ++----------------------+ +Fetched 1 row(s) in 20.09s + +select count(distinct col3) from sample_data; ++-----------------------+ +| count(distinct col3) | ++-----------------------+ +| 46 | ++-----------------------+ +Fetched 1 row(s) in 19.12s + +select count(distinct col4) from sample_data; ++----------------------+ +| count(distinct col4) | ++----------------------+ +| 147135880 | ++----------------------+ +Fetched 1 row(s) in 266.95s +</code></pre> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html> \ No newline at end of file
