http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_optimize_partition_key_scans.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_optimize_partition_key_scans.html b/docs/build/html/topics/impala_optimize_partition_key_scans.html new file mode 100644 index 0000000..07bfbb1 --- /dev/null +++ b/docs/build/html/topics/impala_optimize_partition_key_scans.html @@ -0,0 +1,188 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="optimize_partition_key_scans"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</title></head><body id="optimize_partition_key_scans"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">OPTIMIZE_PARTITION_KEY_SCANS Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1> + + + + <div class="body conbody"> + + <p class="p"> + + Enables a fast code path for queries that apply simple aggregate functions to partition key + columns: <code class="ph codeph">MIN(<var class="keyword varname">key_column</var>)</code>, <code class="ph codeph">MAX(<var class="keyword varname">key_column</var>)</code>, + or <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">key_column</var>)</code>. + </p> + + <p class="p"> + <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>; + any other value interpreted as <code class="ph codeph">false</code> + </p> + <p class="p"> + <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement) + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + In <span class="keyword">Impala 2.5.0</span>, only the value 1 enables the option, and the value + <code class="ph codeph">true</code> is not recognized. This limitation is + tracked by the issue + <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3334" target="_blank">IMPALA-3334</a>, + which shows the releases where the problem is fixed. + </div> + + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span> + </p> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + This optimization speeds up common <span class="q">"introspection"</span> operations when using queries + to calculate the cardinality and range for partition key columns. + </p> + + <p class="p"> + This optimization does not apply if the queries contain any <code class="ph codeph">WHERE</code>, + <code class="ph codeph">GROUP BY</code>, or <code class="ph codeph">HAVING</code> clause. The relevant queries + should only compute the minimum, maximum, or number of distinct values for the + partition key columns across the whole table. + </p> + + <p class="p"> + This optimization is enabled by a query option because it skips some consistency checks + and therefore can return slightly different partition values if partitions are in the + process of being added, dropped, or loaded outside of Impala. Queries might exhibit different + behavior depending on the setting of this option in the following cases: + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + If files are removed from a partition using HDFS or other non-Impala operations, + there is a period until the next <code class="ph codeph">REFRESH</code> of the table where regular + queries fail at run time because they detect the missing files. With this optimization + enabled, queries that evaluate only the partition key column values (not the contents of + the partition itself) succeed, and treat the partition as if it still exists. + </p> + </li> + <li class="li"> + <p class="p"> + If a partition contains any data files, but the data files do not contain any rows, + a regular query considers that the partition does not exist. With this optimization + enabled, the partition is treated as if it exists. + </p> + <p class="p"> + If the partition includes no files at all, this optimization does not change the query + behavior: the partition is considered to not exist whether or not this optimization is enabled. + </p> + </li> + </ul> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <p class="p"> + The following example shows initial schema setup and the default behavior of queries that + return just the partition key column for a table: + </p> + +<pre class="pre codeblock"><code> +-- Make a partitioned table with 3 partitions. +create table t1 (s string) partitioned by (year int); +insert into t1 partition (year=2015) values ('last year'); +insert into t1 partition (year=2016) values ('this year'); +insert into t1 partition (year=2017) values ('next year'); + +-- Regardless of the option setting, this query must read the +-- data files to know how many rows to return for each year value. +explain select year from t1; ++-----------------------------------------------------+ +| Explain String | ++-----------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=0B VCores=0 | +| | +| F00:PLAN FRAGMENT [UNPARTITIONED] | +| 00:SCAN HDFS [key_cols.t1] | +| partitions=3/3 files=4 size=40B | +| table stats: 3 rows total | +| column stats: all | +| hosts=3 per-host-mem=unavailable | +| tuple-ids=0 row-size=4B cardinality=3 | ++-----------------------------------------------------+ + +-- The aggregation operation means the query does not need to read +-- the data within each partition: the result set contains exactly 1 row +-- per partition, derived from the partition key column value. +-- By default, Impala still includes a 'scan' operation in the query. +explain select distinct year from t1; ++------------------------------------------------------------------------------------+ +| Explain String | ++------------------------------------------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=0B VCores=0 | +| | +| 01:AGGREGATE [FINALIZE] | +| | group by: year | +| | | +| 00:SCAN HDFS [key_cols.t1] | +| partitions=0/0 files=0 size=0B | ++------------------------------------------------------------------------------------+ +</code></pre> + + <p class="p"> + The following examples show how the plan is made more efficient when the + <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code> option is enabled: + </p> + +<pre class="pre codeblock"><code> +set optimize_partition_key_scans=1; +OPTIMIZE_PARTITION_KEY_SCANS set to 1 + +-- The aggregation operation is turned into a UNION internally, +-- with constant values known in advance based on the metadata +-- for the partitioned table. +explain select distinct year from t1; ++-----------------------------------------------------+ +| Explain String | ++-----------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=0B VCores=0 | +| | +| F00:PLAN FRAGMENT [UNPARTITIONED] | +| 01:AGGREGATE [FINALIZE] | +| | group by: year | +| | hosts=1 per-host-mem=unavailable | +| | tuple-ids=1 row-size=4B cardinality=3 | +| | | +| 00:UNION | +| constant-operands=3 | +| hosts=1 per-host-mem=unavailable | +| tuple-ids=0 row-size=4B cardinality=3 | ++-----------------------------------------------------+ + +-- The same optimization applies to other aggregation queries +-- that only return values based on partition key columns: +-- MIN, MAX, COUNT(DISTINCT), and so on. +explain select min(year) from t1; ++-----------------------------------------------------+ +| Explain String | ++-----------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=0B VCores=0 | +| | +| F00:PLAN FRAGMENT [UNPARTITIONED] | +| 01:AGGREGATE [FINALIZE] | +| | output: min(year) | +| | hosts=1 per-host-mem=unavailable | +| | tuple-ids=1 row-size=4B cardinality=1 | +| | | +| 00:UNION | +| constant-operands=3 | +| hosts=1 per-host-mem=unavailable | +| tuple-ids=0 row-size=4B cardinality=3 | ++-----------------------------------------------------+ +</code></pre> + + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_order_by.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_order_by.html b/docs/build/html/topics/impala_order_by.html new file mode 100644 index 0000000..c3f5105 --- /dev/null +++ b/docs/build/html/topics/impala_order_by.html @@ -0,0 +1,407 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="order_by"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ORDER BY Clause</title></head><body id="order_by"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">ORDER BY Clause</h1> + + + <div class="body conbody"> + + <p class="p"> + The familiar <code class="ph codeph">ORDER BY</code> clause of a <code class="ph codeph">SELECT</code> statement sorts the result set + based on the values from one or more columns. + </p> + + <p class="p"> + For distributed queries, this is a relatively expensive operation, because the entire result set must be + produced and transferred to one node before the sorting can happen. This can require more memory capacity + than a query without <code class="ph codeph">ORDER BY</code>. Even if the query takes approximately the same time to finish + with or without the <code class="ph codeph">ORDER BY</code> clause, subjectively it can appear slower because no results + are available until all processing is finished, rather than results coming back gradually as rows matching + the <code class="ph codeph">WHERE</code> clause are found. Therefore, if you only need the first N results from the sorted + result set, also include the <code class="ph codeph">LIMIT</code> clause, which reduces network overhead and the memory + requirement on the coordinator node. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + In Impala 1.4.0 and higher, the <code class="ph codeph">LIMIT</code> clause is now optional (rather than required) for + queries that use the <code class="ph codeph">ORDER BY</code> clause. Impala automatically uses a temporary disk work area + to perform the sort if the sort operation would otherwise exceed the Impala memory limit for a particular + DataNode. + </p> + </div> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + + <p class="p"> + The full syntax for the <code class="ph codeph">ORDER BY</code> clause is: + </p> + +<pre class="pre codeblock"><code>ORDER BY <var class="keyword varname">col_ref</var> [, <var class="keyword varname">col_ref</var> ...] [ASC | DESC] [NULLS FIRST | NULLS LAST] + +col_ref ::= <var class="keyword varname">column_name</var> | <var class="keyword varname">integer_literal</var> +</code></pre> + + <p class="p"> + Although the most common usage is <code class="ph codeph">ORDER BY <var class="keyword varname">column_name</var></code>, you can also + specify <code class="ph codeph">ORDER BY 1</code> to sort by the first column of the result set, <code class="ph codeph">ORDER BY + 2</code> to sort by the second column, and so on. The number must be a numeric literal, not some other kind + of constant expression. (If the argument is some other expression, even a <code class="ph codeph">STRING</code> value, the + query succeeds but the order of results is undefined.) + </p> + + <p class="p"> + <code class="ph codeph">ORDER BY <var class="keyword varname">column_number</var></code> can only be used when the query explicitly lists + the columns in the <code class="ph codeph">SELECT</code> list, not with <code class="ph codeph">SELECT *</code> queries. + </p> + + <p class="p"> + <strong class="ph b">Ascending and descending sorts:</strong> + </p> + + <p class="p"> + The default sort order (the same as using the <code class="ph codeph">ASC</code> keyword) puts the smallest values at the + start of the result set, and the largest values at the end. Specifying the <code class="ph codeph">DESC</code> keyword + reverses that order. + </p> + + <p class="p"> + <strong class="ph b">Sort order for NULL values:</strong> + </p> + + <p class="p"> + See <a class="xref" href="impala_literals.html#null">NULL</a> for details about how <code class="ph codeph">NULL</code> values are positioned + in the sorted result set, and how to use the <code class="ph codeph">NULLS FIRST</code> and <code class="ph codeph">NULLS LAST</code> + clauses. (The sort position for <code class="ph codeph">NULL</code> values in <code class="ph codeph">ORDER BY ... DESC</code> queries is + changed in Impala 1.2.1 and higher to be more standards-compliant, and the <code class="ph codeph">NULLS FIRST</code> and + <code class="ph codeph">NULLS LAST</code> keywords are new in Impala 1.2.1.) + </p> + + <p class="p"> + Prior to Impala 1.4.0, Impala required any query including an + <code class="ph codeph"><a class="xref" href="../shared/../topics/impala_order_by.html#order_by">ORDER BY</a></code> clause to also use a + <code class="ph codeph"><a class="xref" href="../shared/../topics/impala_limit.html#limit">LIMIT</a></code> clause. In Impala 1.4.0 and + higher, the <code class="ph codeph">LIMIT</code> clause is optional for <code class="ph codeph">ORDER BY</code> queries. In cases where + sorting a huge result set requires enough memory to exceed the Impala memory limit for a particular node, + Impala automatically uses a temporary disk work area to perform the sort operation. + </p> + + + <p class="p"> + <strong class="ph b">Complex type considerations:</strong> + </p> + + <p class="p"> + In <span class="keyword">Impala 2.3</span> and higher, the complex data types <code class="ph codeph">STRUCT</code>, + <code class="ph codeph">ARRAY</code>, and <code class="ph codeph">MAP</code> are available. These columns cannot + be referenced directly in the <code class="ph codeph">ORDER BY</code> clause. + When you query a complex type column, you use join notation to <span class="q">"unpack"</span> the elements + of the complex type, and within the join query you can include an <code class="ph codeph">ORDER BY</code> + clause to control the order in the result set of the scalar elements from the complex type. + See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about Impala support for complex types. + </p> + + <p class="p"> + The following query shows how a complex type column cannot be directly used in an <code class="ph codeph">ORDER BY</code> clause: + </p> + +<pre class="pre codeblock"><code>CREATE TABLE games (id BIGINT, score ARRAY <BIGINT>) STORED AS PARQUET; +...use LOAD DATA to load externally created Parquet files into the table... +SELECT id FROM games ORDER BY score DESC; +ERROR: AnalysisException: ORDER BY expression 'score' with complex type 'ARRAY<BIGINT>' is not supported. +</code></pre> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <p class="p"> + The following query retrieves the user ID and score, only for scores greater than one million, + with the highest scores for each user listed first. + Because the individual array elements are now represented as separate rows in the result set, + they can be used in the <code class="ph codeph">ORDER BY</code> clause, referenced using the <code class="ph codeph">ITEM</code> + pseudocolumn that represents each array element. + </p> + +<pre class="pre codeblock"><code>SELECT id, item FROM games, games.score + WHERE item > 1000000 +ORDER BY id, item desc; +</code></pre> + + <p class="p"> + The following queries use similar <code class="ph codeph">ORDER BY</code> techniques with variations of the <code class="ph codeph">GAMES</code> + table, where the complex type is an <code class="ph codeph">ARRAY</code> containing <code class="ph codeph">STRUCT</code> or <code class="ph codeph">MAP</code> + elements to represent additional details about each game that was played. + For an array of structures, the fields of the structure are referenced as <code class="ph codeph">ITEM.<var class="keyword varname">field_name</var></code>. + For an array of maps, the keys and values within each array element are referenced as <code class="ph codeph">ITEM.KEY</code> + and <code class="ph codeph">ITEM.VALUE</code>. + </p> + +<pre class="pre codeblock"><code>CREATE TABLE games2 (id BIGINT, play array < struct <game_name: string, score: BIGINT, high_score: boolean> >) STORED AS PARQUET +...use LOAD DATA to load externally created Parquet files into the table... +SELECT id, item.game_name, item.score FROM games2, games2.play + WHERE item.score > 1000000 +ORDER BY id, item.score DESC; + +CREATE TABLE games3 (id BIGINT, play ARRAY < MAP <STRING, BIGINT> >) STORED AS PARQUET; +...use LOAD DATA to load externally created Parquet files into the table... +SELECT id, info.key AS k, info.value AS v from games3, games3.play AS plays, games3.play.item AS info + WHERE info.KEY = 'score' AND info.VALUE > 1000000 +ORDER BY id, info.value desc; +</code></pre> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + Although the <code class="ph codeph">LIMIT</code> clause is now optional on <code class="ph codeph">ORDER BY</code> queries, if your + query only needs some number of rows that you can predict in advance, use the <code class="ph codeph">LIMIT</code> clause + to reduce unnecessary processing. For example, if the query has a clause <code class="ph codeph">LIMIT 10</code>, each data + node sorts its portion of the relevant result set and only returns 10 rows to the coordinator node. The + coordinator node picks the 10 highest or lowest row values out of this small intermediate result set. + </p> + + <p class="p"> + If an <code class="ph codeph">ORDER BY</code> clause is applied to an early phase of query processing, such as a subquery + or a view definition, Impala ignores the <code class="ph codeph">ORDER BY</code> clause. To get ordered results from a + subquery or view, apply an <code class="ph codeph">ORDER BY</code> clause to the outermost or final <code class="ph codeph">SELECT</code> + level. + </p> + + <p class="p"> + <code class="ph codeph">ORDER BY</code> is often used in combination with <code class="ph codeph">LIMIT</code> to perform <span class="q">"top-N"</span> + queries: + </p> + +<pre class="pre codeblock"><code>SELECT user_id AS "Top 10 Visitors", SUM(page_views) FROM web_stats + GROUP BY page_views, user_id + ORDER BY SUM(page_views) DESC LIMIT 10; +</code></pre> + + <p class="p"> + <code class="ph codeph">ORDER BY</code> is sometimes used in combination with <code class="ph codeph">OFFSET</code> and + <code class="ph codeph">LIMIT</code> to paginate query results, although it is relatively inefficient to issue multiple + queries like this against the large tables typically used with Impala: + </p> + +<pre class="pre codeblock"><code>SELECT page_title AS "Page 1 of search results", page_url FROM search_content + WHERE LOWER(page_title) LIKE '%game%') + ORDER BY page_title LIMIT 10 OFFSET 0; +SELECT page_title AS "Page 2 of search results", page_url FROM search_content + WHERE LOWER(page_title) LIKE '%game%') + ORDER BY page_title LIMIT 10 OFFSET 10; +SELECT page_title AS "Page 3 of search results", page_url FROM search_content + WHERE LOWER(page_title) LIKE '%game%') + ORDER BY page_title LIMIT 10 OFFSET 20; +</code></pre> + + <p class="p"> + <strong class="ph b">Internal details:</strong> + </p> + + <p class="p"> + Impala sorts the intermediate results of an <code class="ph codeph">ORDER BY</code> clause in memory whenever practical. In + a cluster of N DataNodes, each node sorts roughly 1/Nth of the result set, the exact proportion varying + depending on how the data matching the query is distributed in HDFS. + </p> + + <p class="p"> + If the size of the sorted intermediate result set on any DataNode would cause the query to exceed the Impala + memory limit, Impala sorts as much as practical in memory, then writes partially sorted data to disk. (This + technique is known in industry terminology as <span class="q">"external sorting"</span> and <span class="q">"spilling to disk"</span>.) As each + 8 MB batch of data is written to disk, Impala frees the corresponding memory to sort a new 8 MB batch of + data. When all the data has been processed, a final merge sort operation is performed to correctly order the + in-memory and on-disk results as the result set is transmitted back to the coordinator node. When external + sorting becomes necessary, Impala requires approximately 60 MB of RAM at a minimum for the buffers needed to + read, write, and sort the intermediate results. If more RAM is available on the DataNode, Impala will use + the additional RAM to minimize the amount of disk I/O for sorting. + </p> + + <p class="p"> + This external sort technique is used as appropriate on each DataNode (possibly including the coordinator + node) to sort the portion of the result set that is processed on that node. When the sorted intermediate + results are sent back to the coordinator node to produce the final result set, the coordinator node uses a + merge sort technique to produce a final sorted result set without using any extra resources on the + coordinator node. + </p> + + <p class="p"> + <strong class="ph b">Configuration for disk usage:</strong> + </p> + + <p class="p"> + By default, intermediate files used during large sort, join, aggregation, or analytic function operations + are stored in the directory <span class="ph filepath">/tmp/impala-scratch</span> . These files are removed when the + operation finishes. (Multiple concurrent queries can perform operations that use the <span class="q">"spill to disk"</span> + technique, without any name conflicts for these temporary files.) You can specify a different location by + starting the <span class="keyword cmdname">impalad</span> daemon with the + <code class="ph codeph">--scratch_dirs="<var class="keyword varname">path_to_directory</var>"</code> configuration option. + You can specify a single directory, or a comma-separated list of directories. The scratch directories must + be on the local filesystem, not in HDFS. You might specify different directory paths for different hosts, + depending on the capacity and speed + of the available storage devices. In <span class="keyword">Impala 2.3</span> or higher, Impala successfully starts (with a warning + Impala successfully starts (with a warning written to the log) if it cannot create or read and write files + in one of the scratch directories. If there is less than 1 GB free on the filesystem where that directory resides, + Impala still runs, but writes a warning message to its log. If Impala encounters an error reading or writing + files in a scratch directory during a query, Impala logs the error and the query fails. + </p> + + + + + + <p class="p"> + <strong class="ph b">Sorting considerations:</strong> Although you can specify an <code class="ph codeph">ORDER BY</code> clause in an + <code class="ph codeph">INSERT ... SELECT</code> statement, any <code class="ph codeph">ORDER BY</code> clause is ignored and the + results are not necessarily sorted. An <code class="ph codeph">INSERT ... SELECT</code> operation potentially creates + many different data files, prepared on different data nodes, and therefore the notion of the data being + stored in sorted order is impractical. + </p> + + <div class="p"> + An <code class="ph codeph">ORDER BY</code> clause without an additional <code class="ph codeph">LIMIT</code> clause is ignored in any + view definition. If you need to sort the entire result set from a view, use an <code class="ph codeph">ORDER BY</code> + clause in the <code class="ph codeph">SELECT</code> statement that queries the view. You can still make a simple <span class="q">"top + 10"</span> report by combining the <code class="ph codeph">ORDER BY</code> and <code class="ph codeph">LIMIT</code> clauses in the same + view definition: +<pre class="pre codeblock"><code>[localhost:21000] > create table unsorted (x bigint); +[localhost:21000] > insert into unsorted values (1), (9), (3), (7), (5), (8), (4), (6), (2); +[localhost:21000] > create view sorted_view as select x from unsorted order by x; +[localhost:21000] > select x from sorted_view; -- ORDER BY clause in view has no effect. ++---+ +| x | ++---+ +| 1 | +| 9 | +| 3 | +| 7 | +| 5 | +| 8 | +| 4 | +| 6 | +| 2 | ++---+ +[localhost:21000] > select x from sorted_view order by x; -- View query requires ORDER BY at outermost level. ++---+ +| x | ++---+ +| 1 | +| 2 | +| 3 | +| 4 | +| 5 | +| 6 | +| 7 | +| 8 | +| 9 | ++---+ +[localhost:21000] > create view top_3_view as select x from unsorted order by x limit 3; +[localhost:21000] > select x from top_3_view; -- ORDER BY and LIMIT together in view definition are preserved. ++---+ +| x | ++---+ +| 1 | +| 2 | +| 3 | ++---+ +</code></pre> + </div> + + <p class="p"> + With the lifting of the requirement to include a <code class="ph codeph">LIMIT</code> clause in every <code class="ph codeph">ORDER + BY</code> query (in Impala 1.4 and higher): + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + Now the use of scratch disk space raises the possibility of an <span class="q">"out of disk space"</span> error on a + particular DataNode, as opposed to the previous possibility of an <span class="q">"out of memory"</span> error. Make sure + to keep at least 1 GB free on the filesystem used for temporary sorting work. + </p> + </li> + + <li class="li"> + <p class="p"> + The query options + <a class="xref" href="impala_default_order_by_limit.html#default_order_by_limit">DEFAULT_ORDER_BY_LIMIT</a> and + <a class="xref" href="impala_abort_on_default_limit_exceeded.html#abort_on_default_limit_exceeded">ABORT_ON_DEFAULT_LIMIT_EXCEEDED</a>, + which formerly controlled the behavior of <code class="ph codeph">ORDER BY</code> queries with no limit specified, are + now ignored. + </p> + </li> + </ul> + + <p class="p"> + In Impala 1.2.1 and higher, all <code class="ph codeph">NULL</code> values come at the end of the result set for + <code class="ph codeph">ORDER BY ... ASC</code> queries, and at the beginning of the result set for <code class="ph codeph">ORDER BY ... + DESC</code> queries. In effect, <code class="ph codeph">NULL</code> is considered greater than all other values for + sorting purposes. The original Impala behavior always put <code class="ph codeph">NULL</code> values at the end, even for + <code class="ph codeph">ORDER BY ... DESC</code> queries. The new behavior in Impala 1.2.1 makes Impala more compatible + with other popular database systems. In Impala 1.2.1 and higher, you can override or specify the sorting + behavior for <code class="ph codeph">NULL</code> by adding the clause <code class="ph codeph">NULLS FIRST</code> or <code class="ph codeph">NULLS + LAST</code> at the end of the <code class="ph codeph">ORDER BY</code> clause. + </p> +<pre class="pre codeblock"><code>[localhost:21000] > create table numbers (x int); +[localhost:21000] > insert into numbers values (1), (null), (2), (null), (3); +[localhost:21000] > select x from numbers order by x nulls first; ++------+ +| x | ++------+ +| NULL | +| NULL | +| 1 | +| 2 | +| 3 | ++------+ +[localhost:21000] > select x from numbers order by x desc nulls first; ++------+ +| x | ++------+ +| NULL | +| NULL | +| 3 | +| 2 | +| 1 | ++------+ +[localhost:21000] > select x from numbers order by x nulls last; ++------+ +| x | ++------+ +| 1 | +| 2 | +| 3 | +| NULL | +| NULL | ++------+ +[localhost:21000] > select x from numbers order by x desc nulls last; ++------+ +| x | ++------+ +| 3 | +| 2 | +| 1 | +| NULL | +| NULL | ++------+ +</code></pre> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + See <a class="xref" href="impala_select.html#select">SELECT Statement</a> for further examples of queries with the <code class="ph codeph">ORDER + BY</code> clause. + </p> + + <p class="p"> + Analytic functions use the <code class="ph codeph">ORDER BY</code> clause in a different context to define the sequence in + which rows are analyzed. See <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a> for details. + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file
