http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_struct.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_struct.html b/docs/build/html/topics/impala_struct.html new file mode 100644 index 0000000..c796fe9 --- /dev/null +++ b/docs/build/html/topics/impala_struct.html @@ -0,0 +1,500 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="struct"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>STRUCT Complex Type (Impala 2.3 or higher only)</title></head><body id="struct"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">STRUCT Complex Type (<span class="keyword">Impala 2.3</span> or higher only)</h1> + + + + <div class="body conbody"> + + <p class="p"> + A complex data type, representing multiple fields of a single item. Frequently used as the element type of an <code class="ph codeph">ARRAY</code> + or the <code class="ph codeph">VALUE</code> part of a <code class="ph codeph">MAP</code>. + </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + +<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> STRUCT < <var class="keyword varname">name</var> : <var class="keyword varname">type</var> [COMMENT '<var class="keyword varname">comment_string</var>'], ... > + +type ::= <var class="keyword varname">primitive_type</var> | <var class="keyword varname">complex_type</var> +</code></pre> + + <p class="p"> + The names and number of fields within the <code class="ph codeph">STRUCT</code> are fixed. Each field can be a different type. A field within a + <code class="ph codeph">STRUCT</code> can also be another <code class="ph codeph">STRUCT</code>, or an <code class="ph codeph">ARRAY</code> or a <code class="ph codeph">MAP</code>, allowing + you to create nested data structures with a maximum nesting depth of 100. + </p> + + <p class="p"> + A <code class="ph codeph">STRUCT</code> can be the top-level type for a column, or can itself be an item within an <code class="ph codeph">ARRAY</code> or the + value part of the key-value pair in a <code class="ph codeph">MAP</code>. + </p> + + <p class="p"> + When a <code class="ph codeph">STRUCT</code> is used as an <code class="ph codeph">ARRAY</code> element or a <code class="ph codeph">MAP</code> value, you use a join clause to + bring the <code class="ph codeph">ARRAY</code> or <code class="ph codeph">MAP</code> elements into the result set, and then refer to + <code class="ph codeph"><var class="keyword varname">array_name</var>.ITEM.<var class="keyword varname">field</var></code> or + <code class="ph codeph"><var class="keyword varname">map_name</var>.VALUE.<var class="keyword varname">field</var></code>. In the case of a <code class="ph codeph">STRUCT</code> directly inside + an <code class="ph codeph">ARRAY</code> or <code class="ph codeph">MAP</code>, you can omit the <code class="ph codeph">.ITEM</code> and <code class="ph codeph">.VALUE</code> pseudocolumns + and refer directly to <code class="ph codeph"><var class="keyword varname">array_name</var>.<var class="keyword varname">field</var></code> or + <code class="ph codeph"><var class="keyword varname">map_name</var>.<var class="keyword varname">field</var></code>. + </p> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + Because complex types are often used in combination, + for example an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> + elements, if you are unfamiliar with the Impala complex types, + start with <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for + background information and usage examples. + </p> + + <p class="p"> + A <code class="ph codeph">STRUCT</code> is similar conceptually to a table row: it contains a fixed number of named fields, each with a predefined + type. To combine two related tables, while using complex types to minimize repetition, the typical way to represent that data is as an + <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> elements. + </p> + + <p class="p"> + Because a <code class="ph codeph">STRUCT</code> has a fixed number of named fields, it typically does not make sense to have a + <code class="ph codeph">STRUCT</code> as the type of a table column. In such a case, you could just make each field of the <code class="ph codeph">STRUCT</code> + into a separate column of the table. The <code class="ph codeph">STRUCT</code> type is most useful as an item of an <code class="ph codeph">ARRAY</code> or the + value part of the key-value pair in a <code class="ph codeph">MAP</code>. A nested type column with a <code class="ph codeph">STRUCT</code> at the lowest level + lets you associate a variable number of row-like objects with each row of the table. + </p> + + <p class="p"> + The <code class="ph codeph">STRUCT</code> type is straightforward to reference within a query. You do not need to include the + <code class="ph codeph">STRUCT</code> column in a join clause or give it a table alias, as is required for the <code class="ph codeph">ARRAY</code> and + <code class="ph codeph">MAP</code> types. You refer to the individual fields using dot notation, such as + <code class="ph codeph"><var class="keyword varname">struct_column_name</var>.<var class="keyword varname">field_name</var></code>, without any pseudocolumn such as + <code class="ph codeph">ITEM</code> or <code class="ph codeph">VALUE</code>. + </p> + + <p class="p"> + You can pass a multi-part qualified name to <code class="ph codeph">DESCRIBE</code> + to specify an <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code> + column and visualize its structure as if it were a table. + For example, if table <code class="ph codeph">T1</code> contains an <code class="ph codeph">ARRAY</code> column + <code class="ph codeph">A1</code>, you could issue the statement <code class="ph codeph">DESCRIBE t1.a1</code>. + If table <code class="ph codeph">T1</code> contained a <code class="ph codeph">STRUCT</code> column <code class="ph codeph">S1</code>, + and a field <code class="ph codeph">F1</code> within the <code class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>, + you could issue the statement <code class="ph codeph">DESCRIBE t1.s1.f1</code>. + An <code class="ph codeph">ARRAY</code> is shown as a two-column table, with + <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns. + A <code class="ph codeph">STRUCT</code> is shown as a table with each field + representing a column in the table. + A <code class="ph codeph">MAP</code> is shown as a two-column table, with + <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> columns. + </p> + + <p class="p"> + <strong class="ph b">Internal details:</strong> + </p> + + <p class="p"> + Within the Parquet data file, the values for each <code class="ph codeph">STRUCT</code> field are stored adjacent to each other, so that they can be + encoded and compressed using all the Parquet techniques for storing sets of similar or repeated values. The adjacency applies even + when the <code class="ph codeph">STRUCT</code> values are part of an <code class="ph codeph">ARRAY</code> or <code class="ph codeph">MAP</code>. During a query, Impala avoids + unnecessary I/O by reading only the portions of the Parquet data file containing the requested <code class="ph codeph">STRUCT</code> fields. + </p> + + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span> + </p> + + <p class="p"> + <strong class="ph b">Restrictions:</strong> + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + Columns with this data type can only be used in tables or partitions with the Parquet file format. + </p> + </li> + <li class="li"> + <p class="p"> + Columns with this data type cannot be used as partition key columns in a partitioned table. + </p> + </li> + <li class="li"> + <p class="p"> + The <code class="ph codeph">COMPUTE STATS</code> statement does not produce any statistics for columns of this data type. + </p> + </li> + <li class="li"> + <p class="p" id="struct__d6e2889"> + The maximum length of the column definition for any complex type, including declarations for any nested types, + is 4000 characters. + </p> + </li> + <li class="li"> + <p class="p"> + See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types_limits">Limitations and Restrictions for Complex Types</a> for a full list of limitations + and associated guidelines about complex type columns. + </p> + </li> + </ul> + + <p class="p"> + <strong class="ph b">Kudu considerations:</strong> + </p> + <p class="p"> + Currently, the data types <code class="ph codeph">DECIMAL</code>, <code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>, + <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables. + </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + Many of the complex type examples refer to tables + such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code> + adapted from the tables used in the TPC-H benchmark. + See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a> + for the table definitions. + </div> + + <p class="p"> + The following example shows a table with various kinds of <code class="ph codeph">STRUCT</code> columns, both at the top level and nested within + other complex types. Practice the <code class="ph codeph">CREATE TABLE</code> and query notation for complex type columns using empty tables, until + you can visualize a complex data structure and construct corresponding SQL statements reliably. + </p> + +<pre class="pre codeblock"><code>CREATE TABLE struct_demo +( + id BIGINT, + name STRING, + +-- A STRUCT as a top-level column. Demonstrates how the table ID column +-- and the ID field within the STRUCT can coexist without a name conflict. + employee_info STRUCT < employer: STRING, id: BIGINT, address: STRING >, + +-- A STRUCT as the element type of an ARRAY. + places_lived ARRAY < STRUCT <street: STRING, city: STRING, country: STRING >>, + +-- A STRUCT as the value portion of the key-value pairs in a MAP. + memorable_moments MAP < STRING, STRUCT < year: INT, place: STRING, details: STRING >>, + +-- A STRUCT where one of the fields is another STRUCT. + current_address STRUCT < street_address: STRUCT <street_number: INT, street_name: STRING, street_type: STRING>, country: STRING, postal_code: STRING > +) +STORED AS PARQUET; + +</code></pre> + + <p class="p"> + The following example shows how to examine the structure of a table containing one or more <code class="ph codeph">STRUCT</code> columns by using + the <code class="ph codeph">DESCRIBE</code> statement. You can visualize each <code class="ph codeph">STRUCT</code> as its own table, with columns named the same + as each field of the <code class="ph codeph">STRUCT</code>. If the <code class="ph codeph">STRUCT</code> is nested inside another complex type, such as + <code class="ph codeph">ARRAY</code>, you can extend the qualified name passed to <code class="ph codeph">DESCRIBE</code> until the output shows just the + <code class="ph codeph">STRUCT</code> fields. + </p> + +<pre class="pre codeblock"><code>DESCRIBE struct_demo; ++-------------------+--------------------------+ +| name | type | ++-------------------+--------------------------+ +| id | bigint | +| name | string | +| employee_info | struct< | +| | employer:string, | +| | id:bigint, | +| | address:string | +| | > | +| places_lived | array<struct< | +| | street:string, | +| | city:string, | +| | country:string | +| | >> | +| memorable_moments | map<string,struct< | +| | year:int, | +| | place:string, | +| | details:string | +| | >> | +| current_address | struct< | +| | street_address:struct< | +| | street_number:int, | +| | street_name:string, | +| | street_type:string | +| | >, | +| | country:string, | +| | postal_code:string | +| | > | ++-------------------+--------------------------+ + +</code></pre> + + <p class="p"> + The top-level column <code class="ph codeph">EMPLOYEE_INFO</code> is a <code class="ph codeph">STRUCT</code>. Describing + <code class="ph codeph"><var class="keyword varname">table_name</var>.<var class="keyword varname">struct_name</var></code> displays the fields of the <code class="ph codeph">STRUCT</code> as if + they were columns of a table: + </p> + +<pre class="pre codeblock"><code>DESCRIBE struct_demo.employee_info; ++----------+--------+ +| name | type | ++----------+--------+ +| employer | string | +| id | bigint | +| address | string | ++----------+--------+ + +</code></pre> + + <p class="p"> + Because <code class="ph codeph">PLACES_LIVED</code> is a <code class="ph codeph">STRUCT</code> inside an <code class="ph codeph">ARRAY</code>, the initial + <code class="ph codeph">DESCRIBE</code> shows the structure of the <code class="ph codeph">ARRAY</code>: + </p> + +<pre class="pre codeblock"><code>DESCRIBE struct_demo.places_lived; ++------+------------------+ +| name | type | ++------+------------------+ +| item | struct< | +| | street:string, | +| | city:string, | +| | country:string | +| | > | +| pos | bigint | ++------+------------------+ + +</code></pre> + + <p class="p"> + Ask for the details of the <code class="ph codeph">ITEM</code> field of the <code class="ph codeph">ARRAY</code> to see just the layout of the + <code class="ph codeph">STRUCT</code>: + </p> + +<pre class="pre codeblock"><code>DESCRIBE struct_demo.places_lived.item; ++---------+--------+ +| name | type | ++---------+--------+ +| street | string | +| city | string | +| country | string | ++---------+--------+ + +</code></pre> + + <p class="p"> + Likewise, <code class="ph codeph">MEMORABLE_MOMENTS</code> has a <code class="ph codeph">STRUCT</code> inside a <code class="ph codeph">MAP</code>, which requires an extra + level of qualified name to see just the <code class="ph codeph">STRUCT</code> part: + </p> + +<pre class="pre codeblock"><code>DESCRIBE struct_demo.memorable_moments; ++-------+------------------+ +| name | type | ++-------+------------------+ +| key | string | +| value | struct< | +| | year:int, | +| | place:string, | +| | details:string | +| | > | ++-------+------------------+ + +</code></pre> + + <p class="p"> + For a <code class="ph codeph">MAP</code>, ask to see the <code class="ph codeph">VALUE</code> field to see the corresponding <code class="ph codeph">STRUCT</code> fields in a + table-like structure: + </p> + +<pre class="pre codeblock"><code>DESCRIBE struct_demo.memorable_moments.value; ++---------+--------+ +| name | type | ++---------+--------+ +| year | int | +| place | string | +| details | string | ++---------+--------+ + +</code></pre> + + <p class="p"> + For a <code class="ph codeph">STRUCT</code> inside a <code class="ph codeph">STRUCT</code>, we can see the fields of the outer <code class="ph codeph">STRUCT</code>: + </p> + +<pre class="pre codeblock"><code>DESCRIBE struct_demo.current_address; ++----------------+-----------------------+ +| name | type | ++----------------+-----------------------+ +| street_address | struct< | +| | street_number:int, | +| | street_name:string, | +| | street_type:string | +| | > | +| country | string | +| postal_code | string | ++----------------+-----------------------+ + +</code></pre> + + <p class="p"> + Then we can use a further qualified name to see just the fields of the inner <code class="ph codeph">STRUCT</code>: + </p> + +<pre class="pre codeblock"><code>DESCRIBE struct_demo.current_address.street_address; ++---------------+--------+ +| name | type | ++---------------+--------+ +| street_number | int | +| street_name | string | +| street_type | string | ++---------------+--------+ + +</code></pre> + + <p class="p"> + The following example shows how to examine the structure of a table containing one or more <code class="ph codeph">STRUCT</code> columns by using + the <code class="ph codeph">DESCRIBE</code> statement. You can visualize each <code class="ph codeph">STRUCT</code> as its own table, with columns named the same + as each field of the <code class="ph codeph">STRUCT</code>. If the <code class="ph codeph">STRUCT</code> is nested inside another complex type, such as + <code class="ph codeph">ARRAY</code>, you can extend the qualified name passed to <code class="ph codeph">DESCRIBE</code> until the output shows just the + <code class="ph codeph">STRUCT</code> fields. + </p> + + + +<pre class="pre codeblock"><code>DESCRIBE struct_demo; ++-------------------+--------------------------+---------+ +| name | type | comment | ++-------------------+--------------------------+---------+ +| id | bigint | | +| name | string | | +| employee_info | struct< | | +| | employer:string, | | +| | id:bigint, | | +| | address:string | | +| | > | | +| places_lived | array<struct< | | +| | street:string, | | +| | city:string, | | +| | country:string | | +| | >> | | +| memorable_moments | map<string,struct< | | +| | year:int, | | +| | place:string, | | +| | details:string | | +| | >> | | +| current_address | struct< | | +| | street_address:struct< | | +| | street_number:int, | | +| | street_name:string, | | +| | street_type:string | | +| | >, | | +| | country:string, | | +| | postal_code:string | | +| | > | | ++-------------------+--------------------------+---------+ + +SELECT id, employee_info.id FROM struct_demo; + +SELECT id, employee_info.id AS employee_id FROM struct_demo; + +SELECT id, employee_info.id AS employee_id, employee_info.employer + FROM struct_demo; + +SELECT id, name, street, city, country + FROM struct_demo, struct_demo.places_lived; + +SELECT id, name, places_lived.pos, places_lived.street, places_lived.city, places_lived.country + FROM struct_demo, struct_demo.places_lived; + +SELECT id, name, pl.pos, pl.street, pl.city, pl.country + FROM struct_demo, struct_demo.places_lived AS pl; + +SELECT id, name, places_lived.pos, places_lived.street, places_lived.city, places_lived.country + FROM struct_demo, struct_demo.places_lived; + +SELECT id, name, pos, street, city, country + FROM struct_demo, struct_demo.places_lived; + +SELECT id, name, memorable_moments.key, + memorable_moments.value.year, + memorable_moments.value.place, + memorable_moments.value.details +FROM struct_demo, struct_demo.memorable_moments +WHERE memorable_moments.key IN ('Birthday','Anniversary','Graduation'); + +SELECT id, name, mm.key, mm.value.year, mm.value.place, mm.value.details + FROM struct_demo, struct_demo.memorable_moments AS mm +WHERE mm.key IN ('Birthday','Anniversary','Graduation'); + +SELECT id, name, memorable_moments.key, memorable_moments.value.year, + memorable_moments.value.place, memorable_moments.value.details +FROM struct_demo, struct_demo.memorable_moments +WHERE key IN ('Birthday','Anniversary','Graduation'); + +SELECT id, name, key, value.year, value.place, value.details + FROM struct_demo, struct_demo.memorable_moments +WHERE key IN ('Birthday','Anniversary','Graduation'); + +SELECT id, name, key, year, place, details + FROM struct_demo, struct_demo.memorable_moments +WHERE key IN ('Birthday','Anniversary','Graduation'); + +SELECT id, name, + current_address.street_address.street_number, + current_address.street_address.street_name, + current_address.street_address.street_type, + current_address.country, + current_address.postal_code +FROM struct_demo; + +</code></pre> + + <p class="p"> + For example, this table uses a struct that encodes several data values for each phone number associated with a person. Each person can + have a variable-length array of associated phone numbers, and queries can refer to the category field to locate specific home, work, + mobile, and so on kinds of phone numbers. + </p> + +<pre class="pre codeblock"><code>CREATE TABLE contact_info_many_structs +( + id BIGINT, name STRING, + phone_numbers ARRAY < STRUCT <category:STRING, country_code:STRING, area_code:SMALLINT, full_number:STRING, mobile:BOOLEAN, carrier:STRING > > +) STORED AS PARQUET; + +</code></pre> + + <p class="p"> + Because structs are naturally suited to composite values where the fields have different data types, you might use them to decompose + things such as addresses: + </p> + +<pre class="pre codeblock"><code>CREATE TABLE contact_info_detailed_address +( + id BIGINT, name STRING, + address STRUCT < house_number:INT, street:STRING, street_type:STRING, apartment:STRING, city:STRING, region:STRING, country:STRING > +); + +</code></pre> + + <p class="p"> + In a big data context, splitting out data fields such as the number part of the address and the street name could let you do analysis + on each field independently. For example, which streets have the largest number range of addresses, what are the statistical + properties of the street names, which areas have a higher proportion of <span class="q">"Roads"</span>, <span class="q">"Courts"</span> or <span class="q">"Boulevards"</span>, and so on. + </p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a>, <a class="xref" href="impala_array.html#array">ARRAY Complex Type (Impala 2.3 or higher only)</a>, + + <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a> + </p> + + </div> + +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html> \ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_subqueries.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_subqueries.html b/docs/build/html/topics/impala_subqueries.html new file mode 100644 index 0000000..2be2880 --- /dev/null +++ b/docs/build/html/topics/impala_subqueries.html @@ -0,0 +1,316 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="subqueries"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Subqueries in Impala SELECT Statements</title></head><body id="subqueries"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Subqueries in Impala SELECT Statements</h1> + + + + <div class="body conbody"> + + <p class="p"> + + A <dfn class="term">subquery</dfn> is a query that is nested within another query. Subqueries let queries on one table + dynamically adapt based on the contents of another table. This technique provides great flexibility and + expressive power for SQL queries. + </p> + + <p class="p"> + A subquery can return a result set for use in the <code class="ph codeph">FROM</code> or <code class="ph codeph">WITH</code> clauses, or + with operators such as <code class="ph codeph">IN</code> or <code class="ph codeph">EXISTS</code>. + </p> + + <p class="p"> + A <dfn class="term">scalar subquery</dfn> produces a result set with a single row containing a single column, typically + produced by an aggregation function such as <code class="ph codeph">MAX()</code> or <code class="ph codeph">SUM()</code>. This single + result value can be substituted in scalar contexts such as arguments to comparison operators. If the result + set is empty, the value of the scalar subquery is <code class="ph codeph">NULL</code>. For example, the following query + finds the maximum value of <code class="ph codeph">T2.Y</code> and then substitutes that value into the + <code class="ph codeph">WHERE</code> clause of the outer block that queries <code class="ph codeph">T1</code>: + </p> + +<pre class="pre codeblock"><code>SELECT x FROM t1 WHERE x > (SELECT MAX(y) FROM t2); +</code></pre> + + <p class="p"> + <dfn class="term">Uncorrelated subqueries</dfn> do not refer to any tables from the outer block of the query. The same + value or set of values produced by the subquery is used when evaluating each row from the outer query block. + In this example, the subquery returns an arbitrary number of values from <code class="ph codeph">T2.Y</code>, and each + value of <code class="ph codeph">T1.X</code> is tested for membership in that same set of values: + </p> + +<pre class="pre codeblock"><code>SELECT x FROM t1 WHERE x IN (SELECT y FROM t2); +</code></pre> + + <p class="p"> + <dfn class="term">Correlated subqueries</dfn> compare one or more values from the outer query block to values referenced + in the <code class="ph codeph">WHERE</code> clause of the subquery. Each row evaluated by the outer <code class="ph codeph">WHERE</code> + clause can be evaluated using a different set of values. These kinds of subqueries are restricted in the + kinds of comparisons they can do between columns of the inner and outer tables. (See the following + <strong class="ph b">Restrictions</strong> item.) + </p> + + <p class="p"> + For example, the following query finds all the employees with salaries that are higher than average for their + department. The subquery potentially computes a different <code class="ph codeph">AVG()</code> value for each employee. + </p> + + + +<pre class="pre codeblock"><code>SELECT employee_name, employee_id FROM employees one WHERE + salary > (SELECT avg(salary) FROM employees two WHERE one.dept_id = two.dept_id); +</code></pre> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + + <p class="p"> + <strong class="ph b">Subquery in the <code class="ph codeph">FROM</code> clause:</strong> + </p> + +<pre class="pre codeblock"><code>SELECT <var class="keyword varname">select_list</var> FROM <var class="keyword varname">table_ref</var> [, <var class="keyword varname">table_ref</var> ...] + +<var class="keyword varname">table_ref</var> ::= <var class="keyword varname">table_name</var> | (<var class="keyword varname">select_statement</var>) +</code></pre> + + <p class="p"> + <strong class="ph b">Subqueries in <code class="ph codeph">WHERE</code> clause:</strong> + </p> + +<pre class="pre codeblock"><code>WHERE <var class="keyword varname">value</var> <var class="keyword varname">comparison_operator</var> (<var class="keyword varname">scalar_select_statement</var>) +WHERE <var class="keyword varname">value</var> [NOT] IN (<var class="keyword varname">select_statement</var>) +WHERE [NOT] EXISTS (<var class="keyword varname">correlated_select_statement</var>) +WHERE NOT EXISTS (<var class="keyword varname">correlated_select_statement</var>) +</code></pre> + + <p class="p"> + <code class="ph codeph">comparison_operator</code> is a numeric comparison such as <code class="ph codeph">=</code>, + <code class="ph codeph"><=</code>, <code class="ph codeph">!=</code>, and so on, or a string comparison operator such as + <code class="ph codeph">LIKE</code> or <code class="ph codeph">REGEXP</code>. + </p> + + <p class="p"> + Although you can use non-equality comparison operators such as <code class="ph codeph"><</code> or + <code class="ph codeph">>=</code>, the subquery must include at least one equality comparison between the columns of the + inner and outer query blocks. + </p> + + <p class="p"> + All syntax is available for both correlated and uncorrelated queries, except that the <code class="ph codeph">NOT + EXISTS</code> clause cannot be used with an uncorrelated subquery. + </p> + + <p class="p"> + Impala subqueries can be nested arbitrarily deep. + </p> + + <p class="p"> + <strong class="ph b">Standards compliance:</strong> Introduced in + <a class="xref" href="http://en.wikipedia.org/wiki/SQL:1999" target="_blank">SQL:1999</a>. + </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <p class="p"> + This example illustrates how subqueries can be used in the <code class="ph codeph">FROM</code> clause to organize the table + names, column names, and column values by producing intermediate result sets, especially for join queries. + </p> + +<pre class="pre codeblock"><code>SELECT avg(t1.x), max(t2.y) FROM + (SELECT id, cast(a AS DECIMAL(10,5)) AS x FROM raw_data WHERE a BETWEEN 0 AND 100) AS t1 + JOIN + (SELECT id, length(s) AS y FROM raw_data WHERE s LIKE 'A%') AS t2; + USING (id); +</code></pre> + + <p class="p"> + These examples show how a query can test for the existence of values in a separate table using the + <code class="ph codeph">EXISTS()</code> operator with a subquery. + + </p> + + <p class="p"> + The following examples show how a value can be compared against a set of values returned by a subquery. + </p> + +<pre class="pre codeblock"><code>SELECT count(x) FROM t1 WHERE EXISTS(SELECT 1 FROM t2 WHERE t1.x = t2.y * 10); + +SELECT x FROM t1 WHERE x IN (SELECT y FROM t2 WHERE state = 'CA'); +</code></pre> + + <p class="p"> + The following examples demonstrate scalar subqueries. When a subquery is known to return a single value, you + can substitute it where you would normally put a constant value. + </p> + +<pre class="pre codeblock"><code>SELECT x FROM t1 WHERE y = (SELECT max(z) FROM t2); +SELECT x FROM t1 WHERE y > (SELECT count(z) FROM t2); +</code></pre> + + + + + + + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + If the same table is referenced in both the outer and inner query blocks, construct a table alias in the + outer query block and use a fully qualified name to distinguish the inner and outer table references: + </p> + + + +<pre class="pre codeblock"><code>SELECT * FROM t1 one WHERE id IN (SELECT parent FROM t1 two WHERE t1.parent = t2.id); +</code></pre> + + <p class="p"> + <strong class="ph b">Internal details:</strong> + </p> + + <p class="p"> + Internally, subqueries involving <code class="ph codeph">IN</code>, <code class="ph codeph">NOT IN</code>, <code class="ph codeph">EXISTS</code>, or + <code class="ph codeph">NOT EXISTS</code> clauses are rewritten into join queries. Depending on the syntax, the subquery + might be rewritten to an outer join, semi join, cross join, or anti join. + </p> + + <p class="p"> + A query is processed differently depending on whether the subquery calls any aggregation functions. There are + correlated and uncorrelated forms, with and without calls to aggregation functions. Each of these four + categories is rewritten differently. + </p> + + <p class="p"> + <strong class="ph b">Column statistics considerations:</strong> + </p> + + <p class="p"> + Because queries that include correlated and uncorrelated subqueries in the <code class="ph codeph">WHERE</code> clause are + written into join queries, to achieve best performance, follow the same guidelines for running the + <code class="ph codeph">COMPUTE STATS</code> statement as you do for tables involved in regular join queries. Run the + <code class="ph codeph">COMPUTE STATS</code> statement for each associated tables after loading or substantially changing + the data in that table. See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for details. + </p> + + <p class="p"> + <strong class="ph b">Added in:</strong> Subqueries are substantially enhanced starting in Impala 2.0. Now, + they can be used in the <code class="ph codeph">WHERE</code> clause, in combination with clauses such as + <code class="ph codeph">EXISTS</code> and <code class="ph codeph">IN</code>, rather than just in the <code class="ph codeph">FROM</code> clause. + </p> + + <p class="p"> + <strong class="ph b">Restrictions:</strong> + </p> + + <p class="p"> + The initial Impala support for nested subqueries addresses the most common use cases. Some restrictions + remain: + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + Although you can use subqueries in a query involving <code class="ph codeph">UNION</code> or <code class="ph codeph">UNION ALL</code> + in Impala 2.1.0 and higher, currently you cannot construct a union of two subqueries (for example, in the + argument of an <code class="ph codeph">IN</code> or <code class="ph codeph">EXISTS</code> operator). + </p> + </li> + + <li class="li"> + <p class="p"> + Subqueries returning scalar values cannot be used with the operators <code class="ph codeph">ANY</code> or + <code class="ph codeph">ALL</code>. (Impala does not currently have a <code class="ph codeph">SOME</code> operator, but if it did, + the same restriction would apply.) + </p> + </li> + + <li class="li"> + <p class="p"> + For the <code class="ph codeph">EXISTS</code> and <code class="ph codeph">NOT EXISTS</code> clauses, any subquery comparing values + from the outer query block to another table must use at least one equality comparison, not exclusively + other kinds of comparisons such as less than, greater than, <code class="ph codeph">BETWEEN</code>, or + <code class="ph codeph">!=</code>. + </p> + </li> + + <li class="li"> + + <p class="p"> + Currently, a scalar subquery cannot be used as the first or second argument to the + <code class="ph codeph">BETWEEN</code> operator. + </p> + </li> + + <li class="li"> + <p class="p"> + A subquery cannot be used inside an <code class="ph codeph">OR</code> conjunction. Expressions inside a subquery, for + example in the <code class="ph codeph">WHERE</code> clause, can use <code class="ph codeph">OR</code> conjunctions; the restriction + only applies to parts of the query <span class="q">"above"</span> the subquery. + </p> + </li> + + <li class="li"> + <p class="p"> + Scalar subqueries are only supported in numeric contexts. You cannot use a scalar subquery as an argument + to the <code class="ph codeph">LIKE</code>, <code class="ph codeph">REGEXP</code>, or <code class="ph codeph">RLIKE</code> operators, or compare it + to a value of a non-numeric type such as <code class="ph codeph">TIMESTAMP</code> or <code class="ph codeph">BOOLEAN</code>. + </p> + </li> + + <li class="li"> + <p class="p"> + + You cannot use subqueries with the <code class="ph codeph">CASE</code> function to generate the comparison value, the + values to be compared against, or the return value. + </p> + </li> + + <li class="li"> + <p class="p"> + A subquery is not allowed in the filter condition for the <code class="ph codeph">HAVING</code> clause. (Strictly + speaking, a subquery cannot appear anywhere outside the <code class="ph codeph">WITH</code>, <code class="ph codeph">FROM</code>, and + <code class="ph codeph">WHERE</code> clauses.) + </p> + </li> + + <li class="li"> + <p class="p"> + You must use a fully qualified name + (<code class="ph codeph"><var class="keyword varname">table_name</var>.<var class="keyword varname">column_name</var></code> or + <code class="ph codeph"><var class="keyword varname">database_name</var>.<var class="keyword varname">table_name</var>.<var class="keyword varname">column_name</var></code>) + when referring to any column from the outer query block within a subquery. + </p> + </li> + </ul> + + <p class="p"> + <strong class="ph b">Complex type considerations:</strong> + </p> + + <p class="p"> + For the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>) + available in <span class="keyword">Impala 2.3</span> and higher, the join queries that <span class="q">"unpack"</span> complex type + columns often use correlated subqueries in the <code class="ph codeph">FROM</code> clause. + For example, if the first table in the join clause is <code class="ph codeph">CUSTOMER</code>, the second + join clause might have a subquery that selects from the column <code class="ph codeph">CUSTOMER.C_ORDERS</code>, + which is an <code class="ph codeph">ARRAY</code>. The subquery re-evaluates the <code class="ph codeph">ARRAY</code> elements + corresponding to each row from the <code class="ph codeph">CUSTOMER</code> table. + See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details and examples of + using subqueries with complex types. + </p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_operators.html#exists">EXISTS Operator</a>, <a class="xref" href="impala_operators.html#in">IN Operator</a> + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_sum.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_sum.html b/docs/build/html/topics/impala_sum.html new file mode 100644 index 0000000..95ac90e --- /dev/null +++ b/docs/build/html/topics/impala_sum.html @@ -0,0 +1,333 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="sum"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SUM Function</title></head><body id="sum"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">SUM Function</h1> + + + + <div class="body conbody"> + + <p class="p"> + + An aggregate function that returns the sum of a set of numbers. Its single argument can be numeric column, or + the numeric result of a function or expression applied to the column value. Rows with a <code class="ph codeph">NULL</code> + value for the specified column are ignored. If the table is empty, or all the values supplied to + <code class="ph codeph">MIN</code> are <code class="ph codeph">NULL</code>, <code class="ph codeph">SUM</code> returns <code class="ph codeph">NULL</code>. + </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + +<pre class="pre codeblock"><code>SUM([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]</code></pre> + + <p class="p"> + When the query contains a <code class="ph codeph">GROUP BY</code> clause, returns one value for each combination of + grouping values. + </p> + + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">BIGINT</code> for integer arguments, <code class="ph codeph">DOUBLE</code> for floating-point + arguments + </p> + + <p class="p"> + <strong class="ph b">Complex type considerations:</strong> + </p> + + <p class="p"> + To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>) + in an aggregation function, you unpack the individual elements using join notation in the query, + and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column. + See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala. + </p> + + <div class="p"> +The following example demonstrates calls to several aggregation functions +using values from a column containing nested complex types +(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items). +The array is unpacked inside the query using join notation. +The array elements are referenced using the <code class="ph codeph">ITEM</code> +pseudocolumn, and the structure fields inside the array elements +are referenced using dot notation. +Numeric values such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code> +are computed using the numeric <code class="ph codeph">R_NATIONKEY</code> field, and +the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> +values are computed from the string <code class="ph codeph">N_NAME</code> field. +<pre class="pre codeblock"><code>describe region; ++-------------+-------------------------+---------+ +| name | type | comment | ++-------------+-------------------------+---------+ +| r_regionkey | smallint | | +| r_name | string | | +| r_comment | string | | +| r_nations | array<struct< | | +| | n_nationkey:smallint, | | +| | n_name:string, | | +| | n_comment:string | | +| | >> | | ++-------------+-------------------------+---------+ + +select r_name, r_nations.item.n_nationkey + from region, region.r_nations as r_nations +order by r_name, r_nations.item.n_nationkey; ++-------------+------------------+ +| r_name | item.n_nationkey | ++-------------+------------------+ +| AFRICA | 0 | +| AFRICA | 5 | +| AFRICA | 14 | +| AFRICA | 15 | +| AFRICA | 16 | +| AMERICA | 1 | +| AMERICA | 2 | +| AMERICA | 3 | +| AMERICA | 17 | +| AMERICA | 24 | +| ASIA | 8 | +| ASIA | 9 | +| ASIA | 12 | +| ASIA | 18 | +| ASIA | 21 | +| EUROPE | 6 | +| EUROPE | 7 | +| EUROPE | 19 | +| EUROPE | 22 | +| EUROPE | 23 | +| MIDDLE EAST | 4 | +| MIDDLE EAST | 10 | +| MIDDLE EAST | 11 | +| MIDDLE EAST | 13 | +| MIDDLE EAST | 20 | ++-------------+------------------+ + +select + r_name, + count(r_nations.item.n_nationkey) as count, + sum(r_nations.item.n_nationkey) as sum, + avg(r_nations.item.n_nationkey) as avg, + min(r_nations.item.n_name) as minimum, + max(r_nations.item.n_name) as maximum, + ndv(r_nations.item.n_nationkey) as distinct_vals +from + region, region.r_nations as r_nations +group by r_name +order by r_name; ++-------------+-------+-----+------+-----------+----------------+---------------+ +| r_name | count | sum | avg | minimum | maximum | distinct_vals | ++-------------+-------+-----+------+-----------+----------------+---------------+ +| AFRICA | 5 | 50 | 10 | ALGERIA | MOZAMBIQUE | 5 | +| AMERICA | 5 | 47 | 9.4 | ARGENTINA | UNITED STATES | 5 | +| ASIA | 5 | 68 | 13.6 | CHINA | VIETNAM | 5 | +| EUROPE | 5 | 77 | 15.4 | FRANCE | UNITED KINGDOM | 5 | +| MIDDLE EAST | 5 | 58 | 11.6 | EGYPT | SAUDI ARABIA | 5 | ++-------------+-------+-----+------+-----------+----------------+---------------+ +</code></pre> +</div> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <p class="p"> + The following example shows how to use <code class="ph codeph">SUM()</code> to compute the total for all the values in the + table, a subset of values, or the sum for each combination of values in the <code class="ph codeph">GROUP BY</code> clause: + </p> + +<pre class="pre codeblock"><code>-- Total all the values for this column in the table. +select sum(c1) from t1; +-- Find the total for this column from a subset of the table. +select sum(c1) from t1 where month = 'January' and year = '2013'; +-- Find the total from a set of numeric function results. +select sum(length(s)) from t1; +-- Often used with functions that return predefined values to compute a score. +select sum(case when grade = 'A' then 1.0 when grade = 'B' then 0.75 else 0) as class_honors from test_scores; +-- Can also be used in combination with DISTINCT and/or GROUP BY. +-- Return more than one result. +select month, year, sum(purchase_price) from store_stats group by month, year; +-- Filter the input to eliminate duplicates before performing the calculation. +select sum(distinct x) from t1; +</code></pre> + + <div class="p"> + The following examples show how to use <code class="ph codeph">SUM()</code> in an analytic context. They use a table + containing integers from 1 to 10. Notice how the <code class="ph codeph">SUM()</code> is reported for each input value, as + opposed to the <code class="ph codeph">GROUP BY</code> clause which condenses the result set. +<pre class="pre codeblock"><code>select x, property, sum(x) <strong class="ph b">over (partition by property)</strong> as sum from int_t where property in ('odd','even'); ++----+----------+-----+ +| x | property | sum | ++----+----------+-----+ +| 2 | even | 30 | +| 4 | even | 30 | +| 6 | even | 30 | +| 8 | even | 30 | +| 10 | even | 30 | +| 1 | odd | 25 | +| 3 | odd | 25 | +| 5 | odd | 25 | +| 7 | odd | 25 | +| 9 | odd | 25 | ++----+----------+-----+ +</code></pre> + +Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving +set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">SUM()</code> in an analytic context +(that is, with an <code class="ph codeph">OVER()</code> clause) to produce a running total of all the even values, +then a running total of all the odd values. The basic <code class="ph codeph">ORDER BY x</code> clause implicitly +activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>, +which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>, +therefore all of these examples produce the same results: +<pre class="pre codeblock"><code>select x, property, + sum(x) over (partition by property <strong class="ph b">order by x</strong>) as 'cumulative total' + from int_t where property in ('odd','even'); ++----+----------+------------------+ +| x | property | cumulative total | ++----+----------+------------------+ +| 2 | even | 2 | +| 4 | even | 6 | +| 6 | even | 12 | +| 8 | even | 20 | +| 10 | even | 30 | +| 1 | odd | 1 | +| 3 | odd | 4 | +| 5 | odd | 9 | +| 7 | odd | 16 | +| 9 | odd | 25 | ++----+----------+------------------+ + +select x, property, + sum(x) over + ( + partition by property + <strong class="ph b">order by x</strong> + <strong class="ph b">range between unbounded preceding and current row</strong> + ) as 'cumulative total' +from int_t where property in ('odd','even'); ++----+----------+------------------+ +| x | property | cumulative total | ++----+----------+------------------+ +| 2 | even | 2 | +| 4 | even | 6 | +| 6 | even | 12 | +| 8 | even | 20 | +| 10 | even | 30 | +| 1 | odd | 1 | +| 3 | odd | 4 | +| 5 | odd | 9 | +| 7 | odd | 16 | +| 9 | odd | 25 | ++----+----------+------------------+ + +select x, property, + sum(x) over + ( + partition by property + <strong class="ph b">order by x</strong> + <strong class="ph b">rows between unbounded preceding and current row</strong> + ) as 'cumulative total' + from int_t where property in ('odd','even'); ++----+----------+------------------+ +| x | property | cumulative total | ++----+----------+------------------+ +| 2 | even | 2 | +| 4 | even | 6 | +| 6 | even | 12 | +| 8 | even | 20 | +| 10 | even | 30 | +| 1 | odd | 1 | +| 3 | odd | 4 | +| 5 | odd | 9 | +| 7 | odd | 16 | +| 9 | odd | 25 | ++----+----------+------------------+ +</code></pre> + +Changing the direction of the <code class="ph codeph">ORDER BY</code> clause causes the intermediate +results of the cumulative total to be calculated in a different order: + +<pre class="pre codeblock"><code>select sum(x) over (partition by property <strong class="ph b">order by x desc</strong>) as 'cumulative total' + from int_t where property in ('odd','even'); ++----+----------+------------------+ +| x | property | cumulative total | ++----+----------+------------------+ +| 10 | even | 10 | +| 8 | even | 18 | +| 6 | even | 24 | +| 4 | even | 28 | +| 2 | even | 30 | +| 9 | odd | 9 | +| 7 | odd | 16 | +| 5 | odd | 21 | +| 3 | odd | 24 | +| 1 | odd | 25 | ++----+----------+------------------+ +</code></pre> + +The following examples show how to construct a moving window, with a running total taking into account 1 row before +and 1 row after the current row, within the same partition (all the even values or all the odd values). +Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of +moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code> +clause: +<pre class="pre codeblock"><code>select x, property, + sum(x) over + ( + partition by property + <strong class="ph b">order by x</strong> + <strong class="ph b">rows between 1 preceding and 1 following</strong> + ) as 'moving total' + from int_t where property in ('odd','even'); ++----+----------+--------------+ +| x | property | moving total | ++----+----------+--------------+ +| 2 | even | 6 | +| 4 | even | 12 | +| 6 | even | 18 | +| 8 | even | 24 | +| 10 | even | 18 | +| 1 | odd | 4 | +| 3 | odd | 9 | +| 5 | odd | 15 | +| 7 | odd | 21 | +| 9 | odd | 16 | ++----+----------+--------------+ + +-- Doesn't work because of syntax restriction on RANGE clause. +select x, property, + sum(x) over + ( + partition by property + <strong class="ph b">order by x</strong> + <strong class="ph b">range between 1 preceding and 1 following</strong> + ) as 'moving total' +from int_t where property in ('odd','even'); +ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW. +</code></pre> + </div> + + <p class="p"> + <strong class="ph b">Restrictions:</strong> + </p> + + + + <p class="p"> + Due to the way arithmetic on <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns uses + high-performance hardware instructions, and distributed queries can perform these operations in different + order for each query, results can vary slightly for aggregate function calls such as <code class="ph codeph">SUM()</code> + and <code class="ph codeph">AVG()</code> for <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns, particularly on + large data sets where millions or billions of values are summed or averaged. For perfect consistency and + repeatability, use the <code class="ph codeph">DECIMAL</code> data type for such operations instead of + <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>. + </p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a> + </p> + + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_support_start_over.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_support_start_over.html b/docs/build/html/topics/impala_support_start_over.html new file mode 100644 index 0000000..f813773 --- /dev/null +++ b/docs/build/html/topics/impala_support_start_over.html @@ -0,0 +1,30 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="support_start_over"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SUPPORT_START_OVER Query Option</title></head><body id="support_start_over"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">SUPPORT_START_OVER Query Option</h1> + + + + <div class="body conbody"> + + <p class="p"> + + Leave this setting at its default value. + It is a read-only setting, tested by some client applications such as Hue. + </p> + <p class="p"> + If you accidentally change it through <span class="keyword cmdname">impala-shell</span>, + subsequent queries encounter errors until you undo the change + by issuing <code class="ph codeph">UNSET support_start_over</code>. + </p> + + <p class="p"> + <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>; + any other value interpreted as <code class="ph codeph">false</code> + </p> + <p class="p"> + <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_sync_ddl.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_sync_ddl.html b/docs/build/html/topics/impala_sync_ddl.html new file mode 100644 index 0000000..b19e266 --- /dev/null +++ b/docs/build/html/topics/impala_sync_ddl.html @@ -0,0 +1,55 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="sync_ddl"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SYNC_DDL Query Option</title></head><body id="sync_ddl"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">SYNC_DDL Query Option</h1> + + + + <div class="body conbody"> + + <p class="p"> + + When enabled, causes any DDL operation such as <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> + to return only when the changes have been propagated to all other Impala nodes in the cluster by the Impala + catalog service. That way, if you issue a subsequent <code class="ph codeph">CONNECT</code> statement in + <span class="keyword cmdname">impala-shell</span> to connect to a different node in the cluster, you can be sure that other + node will already recognize any added or changed tables. (The catalog service automatically broadcasts the + DDL changes to all nodes automatically, but without this option there could be a period of inconsistency if + you quickly switched to another node, such as by issuing a subsequent query through a load-balancing proxy.) + </p> + + <p class="p"> + Although <code class="ph codeph">INSERT</code> is classified as a DML statement, when the <code class="ph codeph">SYNC_DDL</code> option + is enabled, <code class="ph codeph">INSERT</code> statements also delay their completion until all the underlying data and + metadata changes are propagated to all Impala nodes. Internally, Impala inserts have similarities with DDL + statements in traditional database systems, because they create metadata needed to track HDFS block locations + for new files and they potentially add new partitions to partitioned tables. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + Because this option can introduce a delay after each write operation, if you are running a sequence of + <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">ALTER TABLE</code>, + <code class="ph codeph">INSERT</code>, and similar statements within a setup script, to minimize the overall delay you can + enable the <code class="ph codeph">SYNC_DDL</code> query option only near the end, before the final DDL statement. + </div> + + <p class="p"> + <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>; + any other value interpreted as <code class="ph codeph">false</code> + </p> + <p class="p"> + <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement) + </p> + + + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + <p class="p"> + <a class="xref" href="impala_ddl.html#ddl">DDL Statements</a> + </p> + + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file
