http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_describe.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_describe.html b/docs/build/html/topics/impala_describe.html new file mode 100644 index 0000000..963ef6e --- /dev/null +++ b/docs/build/html/topics/impala_describe.html @@ -0,0 +1,802 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="describe"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DESCRIBE Statement</title></head><body id="describe"><main role="main"><article role="article" aria-labelledby="describe__desc"> + + <h1 class="title topictitle1" id="describe__desc">DESCRIBE Statement</h1> + + + + <div class="body conbody"> + + <p class="p"> + + The <code class="ph codeph">DESCRIBE</code> statement displays metadata about a table, such as the column names and their + data types. + <span class="ph">In <span class="keyword">Impala 2.3</span> and higher, you can specify the name of a complex type column, which takes + the form of a dotted path. 
The path might include multiple components in the case of a nested type definition.</span> + <span class="ph">In <span class="keyword">Impala 2.5</span> and higher, the <code class="ph codeph">DESCRIBE DATABASE</code> form can display + information about a database.</span> + </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + +<pre class="pre codeblock"><code>DESCRIBE [DATABASE] [FORMATTED|EXTENDED] <var class="keyword varname">object_name</var> + +object_name ::= + [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>[.<var class="keyword varname">complex_col_name</var> ...] + | <var class="keyword varname">db_name</var> +</code></pre> + + <p class="p"> + You can use the abbreviation <code class="ph codeph">DESC</code> for the <code class="ph codeph">DESCRIBE</code> statement. + </p> + + <p class="p"> + The <code class="ph codeph">DESCRIBE FORMATTED</code> variation displays additional information, in a format familiar to + users of Apache Hive. The extra information includes low-level details such as whether the table is internal + or external, when it was created, the file format, the location of the data in HDFS, whether the object is a + table or a view, and (for views) the text of the query from the view definition. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + The <code class="ph codeph">Compressed</code> field is not a reliable indicator of whether the table contains compressed + data. It typically shows <code class="ph codeph">No</code>, because the compression settings only apply during the + session that loads data and are not stored persistently with the table metadata. 
+ </div> + +<p class="p"> + <strong class="ph b">Describing databases:</strong> +</p> + +<p class="p"> + By default, the <code class="ph codeph">DESCRIBE</code> output for a database includes the location + and the comment, which can be set by the <code class="ph codeph">LOCATION</code> and <code class="ph codeph">COMMENT</code> + clauses on the <code class="ph codeph">CREATE DATABASE</code> statement. +</p> + +<p class="p"> + The additional information displayed by the <code class="ph codeph">FORMATTED</code> or <code class="ph codeph">EXTENDED</code> + keyword includes the HDFS user ID that is considered the owner of the database, and any + optional database properties. The properties could be specified by the <code class="ph codeph">WITH DBPROPERTIES</code> + clause if the database is created using a Hive <code class="ph codeph">CREATE DATABASE</code> statement. + Impala currently does not set or do any special processing based on those properties. +</p> + +<p class="p"> +The following examples show the variations in syntax and output for +describing databases. This feature is available in <span class="keyword">Impala 2.5</span> +and higher. 
+</p> + +<pre class="pre codeblock"><code> +describe database default; ++---------+----------------------+-----------------------+ +| name | location | comment | ++---------+----------------------+-----------------------+ +| default | /user/hive/warehouse | Default Hive database | ++---------+----------------------+-----------------------+ + +describe database formatted default; ++---------+----------------------+-----------------------+ +| name | location | comment | ++---------+----------------------+-----------------------+ +| default | /user/hive/warehouse | Default Hive database | +| Owner: | | | +| | public | ROLE | ++---------+----------------------+-----------------------+ + +describe database extended default; ++---------+----------------------+-----------------------+ +| name | location | comment | ++---------+----------------------+-----------------------+ +| default | /user/hive/warehouse | Default Hive database | +| Owner: | | | +| | public | ROLE | ++---------+----------------------+-----------------------+ +</code></pre> + +<p class="p"> + <strong class="ph b">Describing tables:</strong> +</p> + +<p class="p"> + If the <code class="ph codeph">DATABASE</code> keyword is omitted, the default + for the <code class="ph codeph">DESCRIBE</code> statement is to refer to a table. +</p> + +<pre class="pre codeblock"><code> +-- By default, the table is assumed to be in the current database. +describe my_table; ++------+--------+---------+ +| name | type | comment | ++------+--------+---------+ +| x | int | | +| s | string | | ++------+--------+---------+ + +-- Use a fully qualified table name to specify a table in any database. +describe my_database.my_table; ++------+--------+---------+ +| name | type | comment | ++------+--------+---------+ +| x | int | | +| s | string | | ++------+--------+---------+ + +-- The formatted or extended output includes additional useful information. 
+-- The LOCATION field is especially useful to know for DDL statements and HDFS commands +-- during ETL jobs. (The LOCATION includes a full hdfs:// URL, omitted here for readability.) +describe formatted my_table; ++------------------------------+----------------------------------------------+----------------------+ +| name | type | comment | ++------------------------------+----------------------------------------------+----------------------+ +| # col_name | data_type | comment | +| | NULL | NULL | +| x | int | NULL | +| s | string | NULL | +| | NULL | NULL | +| # Detailed Table Information | NULL | NULL | +| Database: | my_database | NULL | +| Owner: | jrussell | NULL | +| CreateTime: | Fri Mar 18 15:58:00 PDT 2016 | NULL | +| LastAccessTime: | UNKNOWN | NULL | +| Protect Mode: | None | NULL | +| Retention: | 0 | NULL | +| Location: | /user/hive/warehouse/my_database.db/my_table | NULL | +| Table Type: | MANAGED_TABLE | NULL | +| Table Parameters: | NULL | NULL | +| | transient_lastDdlTime | 1458341880 | +| | NULL | NULL | +| # Storage Information | NULL | NULL | +| SerDe Library: | org. ... .LazySimpleSerDe | NULL | +| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL | +| OutputFormat: | org. ... .HiveIgnoreKeyTextOutputFormat | NULL | +| Compressed: | No | NULL | +| Num Buckets: | 0 | NULL | +| Bucket Columns: | [] | NULL | +| Sort Columns: | [] | NULL | ++------------------------------+----------------------------------------------+----------------------+ +</code></pre> + + <p class="p"> + <strong class="ph b">Complex type considerations:</strong> + </p> + + <p class="p"> + Because the column definitions for complex types can become long, particularly when such types are nested, + the <code class="ph codeph">DESCRIBE</code> statement uses special formatting for complex type columns to make the output readable. 
+ </p> + + <p class="p"> + For the <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code> types available in + <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">DESCRIBE</code> output is formatted to avoid + excessively long lines for multiple fields within a <code class="ph codeph">STRUCT</code>, or a nested sequence of + complex types. + </p> + + <p class="p"> + You can pass a multi-part qualified name to <code class="ph codeph">DESCRIBE</code> + to specify an <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code> + column and visualize its structure as if it were a table. + For example, if table <code class="ph codeph">T1</code> contains an <code class="ph codeph">ARRAY</code> column + <code class="ph codeph">A1</code>, you could issue the statement <code class="ph codeph">DESCRIBE t1.a1</code>. + If table <code class="ph codeph">T1</code> contained a <code class="ph codeph">STRUCT</code> column <code class="ph codeph">S1</code>, + and a field <code class="ph codeph">F1</code> within the <code class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>, + you could issue the statement <code class="ph codeph">DESCRIBE t1.s1.f1</code>. + An <code class="ph codeph">ARRAY</code> is shown as a two-column table, with + <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns. + A <code class="ph codeph">STRUCT</code> is shown as a table with each field + representing a column in the table. + A <code class="ph codeph">MAP</code> is shown as a two-column table, with + <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> columns. 
+ </p> + + <p class="p"> + For example, here is the <code class="ph codeph">DESCRIBE</code> output for a table containing a single top-level column + of each complex type: + </p> + +<pre class="pre codeblock"><code>create table t1 (x int, a array<int>, s struct<f1: string, f2: bigint>, m map<string,int>) stored as parquet; + +describe t1; ++------+-----------------+---------+ +| name | type | comment | ++------+-----------------+---------+ +| x | int | | +| a | array<int> | | +| s | struct< | | +| | f1:string, | | +| | f2:bigint | | +| | > | | +| m | map<string,int> | | ++------+-----------------+---------+ + +</code></pre> + + <p class="p"> + Here are examples showing how to <span class="q">"drill down"</span> into the layouts of complex types, including + using multi-part names to examine the definitions of nested types. + The <code class="ph codeph">< ></code> delimiters identify the columns with complex types; + these are the columns where you can descend another level to see the parts that make up + the complex type. + This technique helps you to understand the multi-part names you use as table references in queries + involving complex types, and the corresponding column names you refer to in the <code class="ph codeph">SELECT</code> list. + These tables are from the <span class="q">"nested TPC-H"</span> schema, shown in detail in + <a class="xref" href="impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>. + </p> + + <p class="p"> + The <code class="ph codeph">REGION</code> table contains an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> + elements: + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + The first <code class="ph codeph">DESCRIBE</code> specifies the table name, to display the definition + of each top-level column. 
+ </p> + </li> + <li class="li"> + <p class="p"> + The second <code class="ph codeph">DESCRIBE</code> specifies the name of a complex + column, <code class="ph codeph">REGION.R_NATIONS</code>, showing that when you include the name of an <code class="ph codeph">ARRAY</code> + column in a <code class="ph codeph">FROM</code> clause, that table reference acts like a two-column table with + columns <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>. + </p> + </li> + <li class="li"> + <p class="p"> + The final <code class="ph codeph">DESCRIBE</code> specifies the fully qualified name of the <code class="ph codeph">ITEM</code> field, + to display the layout of its underlying <code class="ph codeph">STRUCT</code> type in table format, with the fields + mapped to column names. + </p> + </li> + </ul> + +<pre class="pre codeblock"><code> +-- #1: The overall layout of the entire table. +describe region; ++-------------+-------------------------+---------+ +| name | type | comment | ++-------------+-------------------------+---------+ +| r_regionkey | smallint | | +| r_name | string | | +| r_comment | string | | +| r_nations | array<struct< | | +| | n_nationkey:smallint, | | +| | n_name:string, | | +| | n_comment:string | | +| | >> | | ++-------------+-------------------------+---------+ + +-- #2: The ARRAY column within the table. +describe region.r_nations; ++------+-------------------------+---------+ +| name | type | comment | ++------+-------------------------+---------+ +| item | struct< | | +| | n_nationkey:smallint, | | +| | n_name:string, | | +| | n_comment:string | | +| | > | | +| pos | bigint | | ++------+-------------------------+---------+ + +-- #3: The STRUCT that makes up each ARRAY element. +-- The fields of the STRUCT act like columns of a table. 
+describe region.r_nations.item; ++-------------+----------+---------+ +| name | type | comment | ++-------------+----------+---------+ +| n_nationkey | smallint | | +| n_name | string | | +| n_comment | string | | ++-------------+----------+---------+ + +</code></pre> + + <p class="p"> + The <code class="ph codeph">CUSTOMER</code> table contains an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> + elements, where one field in the <code class="ph codeph">STRUCT</code> is another <code class="ph codeph">ARRAY</code> of + <code class="ph codeph">STRUCT</code> elements: + </p> + <ul class="ul"> + <li class="li"> + <p class="p"> + Again, the initial <code class="ph codeph">DESCRIBE</code> specifies only the table name. + </p> + </li> + <li class="li"> + <p class="p"> + The second <code class="ph codeph">DESCRIBE</code> specifies the qualified name of the complex + column, <code class="ph codeph">CUSTOMER.C_ORDERS</code>, showing how an <code class="ph codeph">ARRAY</code> + is represented as a two-column table with columns <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>. + </p> + </li> + <li class="li"> + <p class="p"> + The third <code class="ph codeph">DESCRIBE</code> specifies the qualified name of the <code class="ph codeph">ITEM</code> + of the <code class="ph codeph">ARRAY</code> column, to see the structure of the nested <code class="ph codeph">ARRAY</code>. + Again, it has two parts, <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>. Because the + <code class="ph codeph">ARRAY</code> contains a <code class="ph codeph">STRUCT</code>, the layout of the <code class="ph codeph">STRUCT</code> + is shown. 
+ </p> + </li> + <li class="li"> + <p class="p"> + The fourth and fifth <code class="ph codeph">DESCRIBE</code> statements drill down into a <code class="ph codeph">STRUCT</code> field that + is itself a complex type, an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>. + The <code class="ph codeph">ITEM</code> portion of the qualified name is only required when the <code class="ph codeph">ARRAY</code> + elements are anonymous. The fields of the <code class="ph codeph">STRUCT</code> give names to any other complex types + nested inside the <code class="ph codeph">STRUCT</code>. Therefore, the <code class="ph codeph">DESCRIBE</code> parameters + <code class="ph codeph">CUSTOMER.C_ORDERS.ITEM.O_LINEITEMS</code> and <code class="ph codeph">CUSTOMER.C_ORDERS.O_LINEITEMS</code> + are equivalent. (For brevity, leave out the <code class="ph codeph">ITEM</code> portion of + a qualified name when it is not required.) + </p> + </li> + <li class="li"> + <p class="p"> + The final <code class="ph codeph">DESCRIBE</code> shows the layout of the deeply nested <code class="ph codeph">STRUCT</code> type. + Because there are no more complex types nested inside this <code class="ph codeph">STRUCT</code>, this is as far + as you can drill down into the layout for this table. + </p> + </li> + </ul> + +<pre class="pre codeblock"><code>-- #1: The overall layout of the entire table. +describe customer; ++--------------+------------------------------------+ +| name | type | ++--------------+------------------------------------+ +| c_custkey | bigint | +... more scalar columns ... 
+| c_orders | array<struct< | +| | o_orderkey:bigint, | +| | o_orderstatus:string, | +| | o_totalprice:decimal(12,2), | +| | o_orderdate:string, | +| | o_orderpriority:string, | +| | o_clerk:string, | +| | o_shippriority:int, | +| | o_comment:string, | +| | o_lineitems:array<struct< | +| | l_partkey:bigint, | +| | l_suppkey:bigint, | +| | l_linenumber:int, | +| | l_quantity:decimal(12,2), | +| | l_extendedprice:decimal(12,2), | +| | l_discount:decimal(12,2), | +| | l_tax:decimal(12,2), | +| | l_returnflag:string, | +| | l_linestatus:string, | +| | l_shipdate:string, | +| | l_commitdate:string, | +| | l_receiptdate:string, | +| | l_shipinstruct:string, | +| | l_shipmode:string, | +| | l_comment:string | +| | >> | +| | >> | ++--------------+------------------------------------+ + +-- #2: The ARRAY column within the table. +describe customer.c_orders; ++------+------------------------------------+ +| name | type | ++------+------------------------------------+ +| item | struct< | +| | o_orderkey:bigint, | +| | o_orderstatus:string, | +... more struct fields ... +| | o_lineitems:array<struct< | +| | l_partkey:bigint, | +| | l_suppkey:bigint, | +... more nested struct fields ... +| | l_comment:string | +| | >> | +| | > | +| pos | bigint | ++------+------------------------------------+ + +-- #3: The STRUCT that makes up each ARRAY element. +-- The fields of the STRUCT act like columns of a table. +describe customer.c_orders.item; ++-----------------+----------------------------------+ +| name | type | ++-----------------+----------------------------------+ +| o_orderkey | bigint | +| o_orderstatus | string | +| o_totalprice | decimal(12,2) | +| o_orderdate | string | +| o_orderpriority | string | +| o_clerk | string | +| o_shippriority | int | +| o_comment | string | +| o_lineitems | array<struct< | +| | l_partkey:bigint, | +| | l_suppkey:bigint, | +... more struct fields ... 
+| | l_comment:string | +| | >> | ++-----------------+----------------------------------+ + +-- #4: The ARRAY nested inside the STRUCT elements of the first ARRAY. +describe customer.c_orders.item.o_lineitems; ++------+----------------------------------+ +| name | type | ++------+----------------------------------+ +| item | struct< | +| | l_partkey:bigint, | +| | l_suppkey:bigint, | +... more struct fields ... +| | l_comment:string | +| | > | +| pos | bigint | ++------+----------------------------------+ + +-- #5: Shorter form of the previous DESCRIBE. Omits the .ITEM portion of the name +-- because O_LINEITEMS and other field names provide a way to refer to things +-- inside the ARRAY element. +describe customer.c_orders.o_lineitems; ++------+----------------------------------+ +| name | type | ++------+----------------------------------+ +| item | struct< | +| | l_partkey:bigint, | +| | l_suppkey:bigint, | +... more struct fields ... +| | l_comment:string | +| | > | +| pos | bigint | ++------+----------------------------------+ + +-- #6: The STRUCT representing ARRAY elements nested inside +-- another ARRAY of STRUCTs. The lack of any complex types +-- in this output means this is as far as DESCRIBE can +-- descend into the table layout. +describe customer.c_orders.o_lineitems.item; ++-----------------+---------------+ +| name | type | ++-----------------+---------------+ +| l_partkey | bigint | +| l_suppkey | bigint | +... more scalar columns ... +| l_comment | string | ++-----------------+---------------+ + +</code></pre> + +<p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + +<p class="p"> + After the <span class="keyword cmdname">impalad</span> daemons are restarted, the first query against a table can take longer + than subsequent queries, because the metadata for the table is loaded before the query is processed. This + one-time delay for each table can cause misleading results in benchmark tests or cause unnecessary concern. 
+ To <span class="q">"warm up"</span> the Impala metadata cache, you can issue a <code class="ph codeph">DESCRIBE</code> statement in advance + for each table you intend to access later. +</p> + +<p class="p"> + When you are dealing with data files stored in HDFS, it is sometimes important to know details such as the + path of the data files for an Impala table and the hostname of the namenode. You can get this information + from the <code class="ph codeph">DESCRIBE FORMATTED</code> output. You specify HDFS URIs or path specifications with + statements such as <code class="ph codeph">LOAD DATA</code> and the <code class="ph codeph">LOCATION</code> clause of <code class="ph codeph">CREATE + TABLE</code> or <code class="ph codeph">ALTER TABLE</code>. You might also use HDFS URIs or paths with Linux commands + such as <span class="keyword cmdname">hadoop</span> and <span class="keyword cmdname">hdfs</span> to copy, rename, and otherwise manage data files in HDFS. +</p> + +<p class="p"> + If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for + load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL + statement wait before returning, until the new or changed metadata has been received by all the Impala + nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details. + </p> + +<p class="p"> + Each table can also have associated table statistics and column statistics. To see these categories of + information, use the <code class="ph codeph">SHOW TABLE STATS <var class="keyword varname">table_name</var></code> and <code class="ph codeph">SHOW COLUMN + STATS <var class="keyword varname">table_name</var></code> statements. + + See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details. 
+</p> + +<div class="note important note_important"><span class="note__title importanttitle">Important:</span> + After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE + STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a + table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS + SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH + <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that + are very large, used in join queries, or both. + </div> + +<p class="p"> + <strong class="ph b">Examples:</strong> + </p> + +<p class="p"> + The following example shows the results of both a standard <code class="ph codeph">DESCRIBE</code> and <code class="ph codeph">DESCRIBE + FORMATTED</code> for different kinds of schema objects: +</p> + + <ul class="ul"> + <li class="li"> + <code class="ph codeph">DESCRIBE</code> for a table or a view returns the name, type, and comment for each of the + columns. For a view, if the column value is computed by an expression, the column name is automatically + generated as <code class="ph codeph">_c0</code>, <code class="ph codeph">_c1</code>, and so on depending on the ordinal number of the + column. + </li> + + <li class="li"> + A table created with no special format or storage clauses is designated as a <code class="ph codeph">MANAGED_TABLE</code> + (an <span class="q">"internal table"</span> in Impala terminology). Its data files are stored in an HDFS directory under the + default Hive data directory. By default, it uses Text data format. + </li> + + <li class="li"> + A view is designated as <code class="ph codeph">VIRTUAL_VIEW</code> in <code class="ph codeph">DESCRIBE FORMATTED</code> output. 
Some + of its properties are <code class="ph codeph">NULL</code> or blank because they are inherited from the base table. The + text of the query that defines the view is part of the <code class="ph codeph">DESCRIBE FORMATTED</code> output. + </li> + + <li class="li"> + A table with additional clauses in the <code class="ph codeph">CREATE TABLE</code> statement has differences in + <code class="ph codeph">DESCRIBE FORMATTED</code> output. The output for <code class="ph codeph">T2</code> includes the + <code class="ph codeph">EXTERNAL_TABLE</code> keyword because of the <code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax, and + different <code class="ph codeph">InputFormat</code> and <code class="ph codeph">OutputFormat</code> fields to reflect the Parquet file + format. + </li> + </ul> + +<pre class="pre codeblock"><code>[localhost:21000] > create table t1 (x int, y int, s string); +Query: create table t1 (x int, y int, s string) +[localhost:21000] > describe t1; +Query: describe t1 +Query finished, fetching results ... ++------+--------+---------+ +| name | type | comment | ++------+--------+---------+ +| x | int | | +| y | int | | +| s | string | | ++------+--------+---------+ +Returned 3 row(s) in 0.13s +[localhost:21000] > describe formatted t1; +Query: describe formatted t1 +Query finished, fetching results ... 
++------------------------------+--------------------------------------------+------------+ +| name | type | comment | ++------------------------------+--------------------------------------------+------------+ +| # col_name | data_type | comment | +| | NULL | NULL | +| x | int | None | +| y | int | None | +| s | string | None | +| | NULL | NULL | +| # Detailed Table Information | NULL | NULL | +| Database: | describe_formatted | NULL | +| Owner: | doc_demo | NULL | +| CreateTime: | Mon Jul 22 17:03:16 EDT 2013 | NULL | +| LastAccessTime: | UNKNOWN | NULL | +| Protect Mode: | None | NULL | +| Retention: | 0 | NULL | +| Location: | hdfs://127.0.0.1:8020/user/hive/warehouse/ | | +| | describe_formatted.db/t1 | NULL | +| Table Type: | MANAGED_TABLE | NULL | +| Table Parameters: | NULL | NULL | +| | transient_lastDdlTime | 1374526996 | +| | NULL | NULL | +| # Storage Information | NULL | NULL | +| SerDe Library: | org.apache.hadoop.hive.serde2.lazy. | | +| | LazySimpleSerDe | NULL | +| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL | +| OutputFormat: | org.apache.hadoop.hive.ql.io. | | +| | HiveIgnoreKeyTextOutputFormat | NULL | +| Compressed: | No | NULL | +| Num Buckets: | 0 | NULL | +| Bucket Columns: | [] | NULL | +| Sort Columns: | [] | NULL | ++------------------------------+--------------------------------------------+------------+ +Returned 26 row(s) in 0.03s +[localhost:21000] > create view v1 as select x, upper(s) from t1; +Query: create view v1 as select x, upper(s) from t1 +[localhost:21000] > describe v1; +Query: describe v1 +Query finished, fetching results ... ++------+--------+---------+ +| name | type | comment | ++------+--------+---------+ +| x | int | | +| _c1 | string | | ++------+--------+---------+ +Returned 2 row(s) in 0.10s +[localhost:21000] > describe formatted v1; +Query: describe formatted v1 +Query finished, fetching results ... 
++------------------------------+------------------------------+----------------------+ +| name | type | comment | ++------------------------------+------------------------------+----------------------+ +| # col_name | data_type | comment | +| | NULL | NULL | +| x | int | None | +| _c1 | string | None | +| | NULL | NULL | +| # Detailed Table Information | NULL | NULL | +| Database: | describe_formatted | NULL | +| Owner: | doc_demo | NULL | +| CreateTime: | Mon Jul 22 16:56:38 EDT 2013 | NULL | +| LastAccessTime: | UNKNOWN | NULL | +| Protect Mode: | None | NULL | +| Retention: | 0 | NULL | +| Table Type: | VIRTUAL_VIEW | NULL | +| Table Parameters: | NULL | NULL | +| | transient_lastDdlTime | 1374526598 | +| | NULL | NULL | +| # Storage Information | NULL | NULL | +| SerDe Library: | null | NULL | +| InputFormat: | null | NULL | +| OutputFormat: | null | NULL | +| Compressed: | No | NULL | +| Num Buckets: | 0 | NULL | +| Bucket Columns: | [] | NULL | +| Sort Columns: | [] | NULL | +| | NULL | NULL | +| # View Information | NULL | NULL | +| View Original Text: | SELECT x, upper(s) FROM t1 | NULL | +| View Expanded Text: | SELECT x, upper(s) FROM t1 | NULL | ++------------------------------+------------------------------+----------------------+ +Returned 28 row(s) in 0.03s +[localhost:21000] > create external table t2 (x int, y int, s string) stored as parquet location '/user/doc_demo/sample_data'; +[localhost:21000] > describe formatted t2; +Query: describe formatted t2 +Query finished, fetching results ... 
++------------------------------+----------------------------------------------------+------------+ +| name | type | comment | ++------------------------------+----------------------------------------------------+------------+ +| # col_name | data_type | comment | +| | NULL | NULL | +| x | int | None | +| y | int | None | +| s | string | None | +| | NULL | NULL | +| # Detailed Table Information | NULL | NULL | +| Database: | describe_formatted | NULL | +| Owner: | doc_demo | NULL | +| CreateTime: | Mon Jul 22 17:01:47 EDT 2013 | NULL | +| LastAccessTime: | UNKNOWN | NULL | +| Protect Mode: | None | NULL | +| Retention: | 0 | NULL | +| Location: | hdfs://127.0.0.1:8020/user/doc_demo/sample_data | NULL | +| Table Type: | EXTERNAL_TABLE | NULL | +| Table Parameters: | NULL | NULL | +| | EXTERNAL | TRUE | +| | transient_lastDdlTime | 1374526907 | +| | NULL | NULL | +| # Storage Information | NULL | NULL | +| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL | +| InputFormat: | org.apache.impala.hive.serde.ParquetInputFormat | NULL | +| OutputFormat: | org.apache.impala.hive.serde.ParquetOutputFormat | NULL | +| Compressed: | No | NULL | +| Num Buckets: | 0 | NULL | +| Bucket Columns: | [] | NULL | +| Sort Columns: | [] | NULL | ++------------------------------+----------------------------------------------------+------------+ +Returned 27 row(s) in 0.17s</code></pre> + + <p class="p"> + <strong class="ph b">Cancellation:</strong> Cannot be cancelled. + </p> + + <p class="p"> + <strong class="ph b">HDFS permissions:</strong> + </p> + <p class="p"> + The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under, + typically the <code class="ph codeph">impala</code> user, must have read and execute + permissions for all directories that are part of the table. + (A table could span multiple different HDFS directories if it is partitioned. 
+ The directories could be widely scattered because a partition can reside + in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.) + </p> + + <p class="p"> + <strong class="ph b">Kudu considerations:</strong> + </p> + + <p class="p"> + The information displayed for Kudu tables includes additional attributes + that apply only to Kudu tables: + </p> + <ul class="ul"> + <li class="li"> + Whether or not the column is part of the primary key. Every Kudu table + has a <code class="ph codeph">true</code> value here for at least one column. There + could be multiple <code class="ph codeph">true</code> values, for tables with + composite primary keys. + </li> + <li class="li"> + Whether or not the column is nullable. Specified by the <code class="ph codeph">NULL</code> + or <code class="ph codeph">NOT NULL</code> attributes on the <code class="ph codeph">CREATE TABLE</code> statement. + Columns that are part of the primary key are automatically non-nullable. + </li> + <li class="li"> + The default value, if any, for the column. Specified by the <code class="ph codeph">DEFAULT</code> + attribute on the <code class="ph codeph">CREATE TABLE</code> statement. If the default value is + <code class="ph codeph">NULL</code>, that is not indicated in this column. It is implied by + <code class="ph codeph">nullable</code> being true and no other default value specified. + </li> + <li class="li"> + The encoding used for values in the column. Specified by the <code class="ph codeph">ENCODING</code> + attribute on the <code class="ph codeph">CREATE TABLE</code> statement. + </li> + <li class="li"> + The compression used for values in the column. Specified by the <code class="ph codeph">COMPRESSION</code> + attribute on the <code class="ph codeph">CREATE TABLE</code> statement. + </li> + <li class="li"> + The block size (in bytes) used for the underlying Kudu storage layer for the column. 
+ Specified by the <code class="ph codeph">BLOCK_SIZE</code> attribute on the <code class="ph codeph">CREATE TABLE</code> + statement. + </li> + </ul> + + <p class="p"> + The following example shows <code class="ph codeph">DESCRIBE</code> output for a simple Kudu table, with + a single-column primary key and all column attributes left with their default values: + </p> + +<pre class="pre codeblock"><code> +describe million_rows; ++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+ +| name | type | comment | primary_key | nullable | default_value | encoding | compression | block_size | ++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+ +| id | string | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 | +| s | string | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 | ++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+ +</code></pre> + + <p class="p"> + The following example shows <code class="ph codeph">DESCRIBE</code> output for a Kudu table with a + two-column primary key, and Kudu-specific attributes applied to some columns: + </p> + +<pre class="pre codeblock"><code> +create table kudu_describe_example +( + c1 int, c2 int, + c3 string, c4 string not null, c5 string default 'n/a', c6 string default '', + c7 bigint not null, c8 bigint null default null, c9 bigint default -1 encoding bit_shuffle, + primary key(c1,c2) +) +partition by hash (c1, c2) partitions 10 stored as kudu; + +describe kudu_describe_example; ++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+ +| name | type | comment | primary_key | nullable | default_value | encoding | compression | block_size | 
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+ +| c1 | int | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 | +| c2 | int | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 | +| c3 | string | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 | +| c4 | string | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 | +| c5 | string | | false | true | n/a | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 | +| c6 | string | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 | +| c7 | bigint | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 | +| c8 | bigint | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 | +| c9 | bigint | | false | true | -1 | BIT_SHUFFLE | DEFAULT_COMPRESSION | 0 | ++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+ +</code></pre> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>, + <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>, <a class="xref" href="impala_show.html#show_create_table">SHOW CREATE TABLE Statement</a> + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html> \ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_development.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_development.html b/docs/build/html/topics/impala_development.html new file mode 100644 index 0000000..f8e0ae5 --- /dev/null +++ b/docs/build/html/topics/impala_development.html @@ -0,0 +1,197 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_concepts.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro_dev"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Developing Impala Applications</title></head><body id="intro_dev"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Developing Impala Applications</h1> + + + + <div class="body conbody"> + + <p class="p"> + The core development language with Impala is SQL. You can also use Java or other languages to interact with + Impala through the standard JDBC and ODBC interfaces used by many business intelligence tools. For + specialized kinds of analysis, you can supplement the SQL built-in functions by writing + <a class="xref" href="impala_udf.html#udfs">user-defined functions (UDFs)</a> in C++ or Java. 
+ </p> + + <p class="p toc inpage"></p> + </div> + + <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_concepts.html">Impala Concepts and Architecture</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro_dev__intro_sql"> + + <h2 class="title topictitle2" id="ariaid-title2">Overview of the Impala SQL Dialect</h2> + + + <div class="body conbody"> + + <p class="p"> + The Impala SQL dialect is highly compatible with the SQL syntax used in the Apache Hive component (HiveQL). As + such, it is accessible to users who already have experience running SQL queries on the Hadoop + infrastructure. Currently, Impala SQL supports a subset of HiveQL statements, data types, and built-in + functions. Impala also includes additional built-in functions for common industry features, to simplify + porting SQL from non-Hadoop systems. + </p> + + <p class="p"> + For users coming to Impala from traditional database or data warehousing backgrounds, the following aspects of the SQL dialect + might seem familiar: + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + The <a class="xref" href="impala_select.html#select">SELECT statement</a> includes familiar clauses such as <code class="ph codeph">WHERE</code>, + <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">ORDER BY</code>, and <code class="ph codeph">WITH</code>.
+ You will find familiar notions such as + <a class="xref" href="impala_joins.html#joins">joins</a>, <a class="xref" href="impala_functions.html#builtins">built-in + functions</a> for processing strings, numbers, and dates, + <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">aggregate functions</a>, + <a class="xref" href="impala_subqueries.html#subqueries">subqueries</a>, and + <a class="xref" href="impala_operators.html#comparison_operators">comparison operators</a> + such as <code class="ph codeph">IN()</code> and <code class="ph codeph">BETWEEN</code>. + The <code class="ph codeph">SELECT</code> statement is the place where SQL standards compliance is most important. + </p> + </li> + + <li class="li"> + <p class="p"> + From the data warehousing world, you will recognize the notion of + <a class="xref" href="impala_partitioning.html#partitioning">partitioned tables</a>. + One or more columns serve as partition keys, and the data is physically arranged so that + queries that refer to the partition key columns in the <code class="ph codeph">WHERE</code> clause + can skip partitions that do not match the filter conditions. For example, if you have 10 + years' worth of data and use a clause such as <code class="ph codeph">WHERE year = 2015</code>, + <code class="ph codeph">WHERE year > 2010</code>, or <code class="ph codeph">WHERE year IN (2014, 2015)</code>, + Impala skips all the data for non-matching years, greatly reducing the amount of I/O + for the query. + </p> + </li> + + <li class="li"> + <p class="p"> + In Impala 1.2 and higher, <a class="xref" href="impala_udf.html#udfs">UDFs</a> let you apply custom comparison + and transformation logic during <code class="ph codeph">SELECT</code> and <code class="ph codeph">INSERT...SELECT</code> statements.
+ </p> + </li> + </ul> + + <p class="p"> + For users coming to Impala from traditional database or data warehousing backgrounds, the following aspects of the SQL dialect + might require some learning and practice for you to become proficient in the Hadoop environment: + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + Impala SQL is focused on queries and includes relatively little DML. There is no <code class="ph codeph">UPDATE</code> + or <code class="ph codeph">DELETE</code> statement. Stale data is typically discarded (by <code class="ph codeph">DROP TABLE</code> + or <code class="ph codeph">ALTER TABLE ... DROP PARTITION</code> statements) or replaced (by <code class="ph codeph">INSERT + OVERWRITE</code> statements). + </p> + </li> + + <li class="li"> + <p class="p"> + All data creation is done by <code class="ph codeph">INSERT</code> statements, which typically insert data in bulk by + querying from other tables. There are two variations: <code class="ph codeph">INSERT INTO</code>, which appends to the + existing data, and <code class="ph codeph">INSERT OVERWRITE</code>, which replaces the entire contents of a table or + partition (similar to <code class="ph codeph">TRUNCATE TABLE</code> followed by a new <code class="ph codeph">INSERT</code>). + Although there is an <code class="ph codeph">INSERT ... VALUES</code> syntax to create a small number of values in + a single statement, it is far more efficient to use the <code class="ph codeph">INSERT ... SELECT</code> syntax to copy + and transform large amounts of data from one table to another in a single operation. + </p> + </li> + + <li class="li"> + <p class="p"> + You often construct Impala table definitions and data files in some other environment, and then attach + Impala so that it can run real-time queries. The same data files and table metadata are shared with other + components of the Hadoop ecosystem.
In particular, Impala can access tables created by Hive or data + inserted by Hive, and Hive can access tables and data produced by Impala. Many other Hadoop components + can write files in formats such as Parquet and Avro, that can then be queried by Impala. + </p> + </li> + + <li class="li"> + <p class="p"> + Because Hadoop and Impala are focused on data warehouse-style operations on large data sets, Impala SQL + includes some idioms that you might find in the import utilities for traditional database systems. For + example, you can create a table that reads comma-separated or tab-separated text files, specifying the + separator in the <code class="ph codeph">CREATE TABLE</code> statement. You can create <strong class="ph b">external tables</strong> that read + existing data files but do not move or transform them. + </p> + </li> + + <li class="li"> + <p class="p"> + Because Impala reads large quantities of data that might not be perfectly tidy and predictable, it does + not require length constraints on string data types. For example, you can define a database column as + <code class="ph codeph">STRING</code> with unlimited length, rather than <code class="ph codeph">CHAR(1)</code> or + <code class="ph codeph">VARCHAR(64)</code>. 
<span class="ph">(Although in Impala 2.0 and later, you can also use + length-constrained <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> types.)</span> + </p> + </li> + + </ul> + + <p class="p"> + <strong class="ph b">Related information:</strong> <a class="xref" href="impala_langref.html#langref">Impala SQL Language Reference</a>, especially + <a class="xref" href="impala_langref_sql.html#langref_sql">Impala SQL Statements</a> and <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a> + </p> + </div> + </article> + + + + + + + + + + <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro_dev__intro_apis"> + + <h2 class="title topictitle2" id="ariaid-title3">Overview of Impala Programming Interfaces</h2> + + + <div class="body conbody"> + + <p class="p"> + You can connect and submit requests to the Impala daemons through: + </p> + + <ul class="ul"> + <li class="li"> + The <code class="ph codeph"><a class="xref" href="impala_impala_shell.html#impala_shell">impala-shell</a></code> interactive + command interpreter. + </li> + + <li class="li"> + The <a class="xref" href="http://gethue.com/" target="_blank">Hue</a> web-based user interface. + </li> + + <li class="li"> + <a class="xref" href="impala_jdbc.html#impala_jdbc">JDBC</a>. + </li> + + <li class="li"> + <a class="xref" href="impala_odbc.html#impala_odbc">ODBC</a>. + </li> + </ul> + + <p class="p"> + With these options, you can use Impala in heterogeneous environments, with JDBC or ODBC applications + running on non-Linux platforms. You can also use Impala in combination with various Business Intelligence + tools that use the JDBC and ODBC interfaces. + </p> + + <p class="p"> + Each <code class="ph codeph">impalad</code> daemon process, running on separate nodes in a cluster, listens on + <a class="xref" href="impala_ports.html#ports">several ports</a> for incoming requests.
Requests from + <code class="ph codeph">impala-shell</code> and Hue are routed to the <code class="ph codeph">impalad</code> daemons through the same + port. The <code class="ph codeph">impalad</code> daemons listen on separate ports for JDBC and ODBC requests. + </p> + </div> + </article> +</article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_disable_codegen.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_disable_codegen.html b/docs/build/html/topics/impala_disable_codegen.html new file mode 100644 index 0000000..f8766b7 --- /dev/null +++ b/docs/build/html/topics/impala_disable_codegen.html @@ -0,0 +1,36 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_codegen"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_CODEGEN Query Option</title></head><body id="disable_codegen"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">DISABLE_CODEGEN Query Option</h1> + + + + <div class="body conbody"> + + <p class="p"> + + This is a debug option, intended for diagnosing and working around issues that cause crashes. 
If a query + fails with an <span class="q">"illegal instruction"</span> or other hardware-specific message, try setting + <code class="ph codeph">DISABLE_CODEGEN=true</code> and running the query again. If the query succeeds only when the + <code class="ph codeph">DISABLE_CODEGEN</code> option is turned on, submit the problem to <span class="keyword">the appropriate support channel</span> and include that + detail in the problem report. Do not otherwise run with this setting turned on, because it results in lower + overall performance. + </p> + + <p class="p"> + Because the code generation phase adds a small amount of overhead for each query, you might turn on the + <code class="ph codeph">DISABLE_CODEGEN</code> option to achieve maximum throughput when running many short-lived queries + against small tables. + </p> + + <p class="p"> + <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>; + any other value interpreted as <code class="ph codeph">false</code> + </p> + <p class="p"> + <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement) + </p> + + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_disable_row_runtime_filtering.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_disable_row_runtime_filtering.html b/docs/build/html/topics/impala_disable_row_runtime_filtering.html new file mode 100644 index 0000000..11ccb80 --- /dev/null +++ 
b/docs/build/html/topics/impala_disable_row_runtime_filtering.html @@ -0,0 +1,72 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_row_runtime_filtering"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</title></head><body id="disable_row_runtime_filtering"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">DISABLE_ROW_RUNTIME_FILTERING Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1> + + + + <div class="body conbody"> + + <p class="p"> + + The <code class="ph codeph">DISABLE_ROW_RUNTIME_FILTERING</code> query option + reduces the scope of the runtime filtering feature. Queries still dynamically prune + partitions, but do not apply the filtering logic to individual rows within partitions. + </p> + + <p class="p"> + Only applies to queries against Parquet tables. For other file formats, Impala + only prunes at the level of partitions, not individual rows. 
+ </p> + + <p class="p"> + <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>; + any other value interpreted as <code class="ph codeph">false</code> + </p> + <p class="p"> + <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> + </p> + + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span> + </p> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + Impala automatically evaluates whether the per-row filters are + effective at reducing the amount of intermediate data. Therefore, + this option is typically only needed for the rare case where Impala + cannot accurately determine how effective the per-row filtering is + for a query. + </p> + + <p class="p"> + Because the runtime filtering feature applies mainly to resource-intensive + and long-running queries, only adjust this query option when tuning long-running queries + that involve some combination of large partitioned tables and joins of large tables. + </p> + + <p class="p"> + Because this setting only improves query performance in very specific + circumstances, depending on the query characteristics and data distribution, + only use it when you determine through benchmarking that it improves + performance of specific expensive queries. + Consider setting this query option immediately before the expensive query and + unsetting it immediately afterward.
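+ </p>
+
+ <p class="p">
+ As a minimal sketch of that set-then-unset pattern, the following <span class="keyword cmdname">impala-shell</span>
+ session turns the option on for a single query and then restores the default. The table names
+ <code class="ph codeph">big_sales</code> and <code class="ph codeph">big_customers</code>, their columns, and the query itself
+ are hypothetical, chosen only for illustration:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical tables and columns; substitute the expensive query you benchmarked.
+set disable_row_runtime_filtering=true;
+
+select c.region, count(*)
+  from big_sales s join big_customers c on s.customer_id = c.id
+  group by c.region;
+
+-- Restore the default so that later queries in the session use per-row filtering again.
+set disable_row_runtime_filtering=false;
+</code></pre>
+
+ <p class="p">
+ Setting the option back to <code class="ph codeph">false</code>, its default value, keeps the change from
+ affecting other queries issued in the same session.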
+ </p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + <p class="p"> + <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>, + <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a> + + </p> + + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_disable_streaming_preaggregations.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_disable_streaming_preaggregations.html b/docs/build/html/topics/impala_disable_streaming_preaggregations.html new file mode 100644 index 0000000..98ea640 --- /dev/null +++ b/docs/build/html/topics/impala_disable_streaming_preaggregations.html @@ -0,0 +1,50 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_streaming_preaggregations"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_STREAMING_PREAGGREGATIONS Query Option (Impala 
2.5 or higher only)</title></head><body id="disable_streaming_preaggregations"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">DISABLE_STREAMING_PREAGGREGATIONS Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1> + + + + <div class="body conbody"> + + <p class="p"> + + Turns off the <span class="q">"streaming preaggregation"</span> optimization that is available in <span class="keyword">Impala 2.5</span> + and higher. This optimization reduces unnecessary work performed by queries that perform aggregation + operations on columns with few or no duplicate values, for example <code class="ph codeph">DISTINCT <var class="keyword varname">id_column</var></code> + or <code class="ph codeph">GROUP BY <var class="keyword varname">unique_column</var></code>. If the optimization causes regressions in + existing queries that use aggregation functions, you can turn it off as needed by setting this query option. + </p> + + <p class="p"> + <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>; + any other value interpreted as <code class="ph codeph">false</code> + </p> + <p class="p"> + <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement) + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + In <span class="keyword">Impala 2.5.0</span>, only the value 1 enables the option, and the value + <code class="ph codeph">true</code> is not recognized. This limitation is + tracked by the issue + <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3334" target="_blank">IMPALA-3334</a>, + which shows the releases where the problem is fixed. 
+ </div> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + <p class="p"> + Typically, queries that would require enabling this option involve very large numbers of + aggregated values, such as a billion or more distinct keys being processed on each + worker node. + </p> + + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span> + </p> + + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_disable_unsafe_spills.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_disable_unsafe_spills.html b/docs/build/html/topics/impala_disable_unsafe_spills.html new file mode 100644 index 0000000..01bc8fd --- /dev/null +++ b/docs/build/html/topics/impala_disable_unsafe_spills.html @@ -0,0 +1,50 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_unsafe_spills"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_UNSAFE_SPILLS Query Option (Impala 2.0 or higher only)</title></head><body 
id="disable_unsafe_spills"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">DISABLE_UNSAFE_SPILLS Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1> + + + + <div class="body conbody"> + + <p class="p"> + + Enable this option if you prefer to have queries fail when they exceed the Impala memory limit, rather than + write temporary data to disk. + </p> + + <p class="p"> + Queries that <span class="q">"spill"</span> to disk typically complete successfully, when in earlier Impala releases they would have failed. + However, queries with exorbitant memory requirements due to missing statistics or inefficient join clauses could + become so slow as a result that you would rather have them cancelled automatically and reduce the memory + usage through standard Impala tuning techniques. + </p> + + <p class="p"> + This option prevents only <span class="q">"unsafe"</span> spill operations, meaning that one or more tables are missing + statistics or the query does not include a hint to set the most efficient mechanism for a join or + <code class="ph codeph">INSERT ... SELECT</code> into a partitioned table. These are the tables most likely to result in + suboptimal execution plans that could cause unnecessary spilling. Therefore, leaving this option enabled is a + good way to find tables on which to run the <code class="ph codeph">COMPUTE STATS</code> statement. + </p> + + <p class="p"> + See <a class="xref" href="impala_scalability.html#spill_to_disk">SQL Operations that Spill to Disk</a> for information about the <span class="q">"spill to disk"</span> + feature for queries processing large result sets with joins, <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">GROUP + BY</code>, <code class="ph codeph">DISTINCT</code>, aggregation functions, or analytic functions. 
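+ </p>
+
+ <p class="p">
+ One way to act on the advice above is sketched below, assuming two hypothetical tables,
+ <code class="ph codeph">web_logs</code> and <code class="ph codeph">page_views</code>, that have no statistics yet.
+ The table names, columns, and query are illustrative only:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical tables and columns, used only to illustrate the workflow.
+set disable_unsafe_spills=true;
+
+-- If this query fails at the memory limit instead of spilling to disk,
+-- treat the failure as a cue to check the statistics for the tables involved.
+select l.url, count(*)
+  from web_logs l join page_views v on l.page_id = v.page_id
+  group by l.url;
+
+-- Gather the missing statistics, then confirm that they are present.
+compute stats web_logs;
+compute stats page_views;
+show table stats web_logs;
+</code></pre>
+
+ <p class="p">
+ Rerunning the query after <code class="ph codeph">COMPUTE STATS</code> finishes shows whether the missing
+ statistics accounted for the excessive memory use.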
+ </p> + + <p class="p"> + <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>; + any other value interpreted as <code class="ph codeph">false</code> + </p> + <p class="p"> + <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement) + </p> + + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span> + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_disk_space.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_disk_space.html b/docs/build/html/topics/impala_disk_space.html new file mode 100644 index 0000000..0b102e5 --- /dev/null +++ b/docs/build/html/topics/impala_disk_space.html @@ -0,0 +1,133 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disk_space"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Managing Disk 
Space for Impala Data</title></head><body id="disk_space"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Managing Disk Space for Impala Data</h1> + + + + <div class="body conbody"> + + <p class="p"> + Although Impala typically works with many large files in an HDFS storage system with plenty of capacity, + there are times when you might perform some file cleanup to reclaim space, or advise developers on techniques + to minimize space consumption and file duplication. + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + Use compact binary file formats where practical. Numeric and time-based data in particular can be stored + in more compact form in binary data files. Depending on the file format, various compression and encoding + features can reduce file size even further. You can specify the <code class="ph codeph">STORED AS</code> clause as part + of the <code class="ph codeph">CREATE TABLE</code> statement, or <code class="ph codeph">ALTER TABLE</code> with the <code class="ph codeph">SET + FILEFORMAT</code> clause for an existing table or partition within a partitioned table. See + <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details about file formats, especially + <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>. See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> and + <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for syntax details. 
+ </p> + </li> + + <li class="li"> + <p class="p"> + You manage underlying data files differently depending on whether the corresponding Impala table is + defined as an <a class="xref" href="impala_tables.html#internal_tables">internal</a> or + <a class="xref" href="impala_tables.html#external_tables">external</a> table: + </p> + <ul class="ul"> + <li class="li"> + Use the <code class="ph codeph">DESCRIBE FORMATTED</code> statement to check if a particular table is internal + (managed by Impala) or external, and to see the physical location of the data files in HDFS. See + <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a> for details. + </li> + + <li class="li"> + For Impala-managed (<span class="q">"internal"</span>) tables, use <code class="ph codeph">DROP TABLE</code> statements to remove + data files. See <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> for details. + </li> + + <li class="li"> + For tables not managed by Impala (<span class="q">"external"</span> tables), use appropriate HDFS-related commands such + as <code class="ph codeph">hadoop fs</code>, <code class="ph codeph">hdfs dfs</code>, or <code class="ph codeph">distcp</code>, to create, move, + copy, or delete files within HDFS directories that are accessible by the <code class="ph codeph">impala</code> user. + Issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> statement after adding or removing any + files from the data directory of an external table. See <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> for + details. + </li> + + <li class="li"> + Use external tables to reference HDFS data files in their original location. With this technique, you + avoid copying the files, and you can map more than one Impala table to the same set of data files. When + you drop the Impala table, the data files are left undisturbed. 
See + <a class="xref" href="impala_tables.html#external_tables">External Tables</a> for details. + </li> + + <li class="li"> + Use the <code class="ph codeph">LOAD DATA</code> statement to move HDFS files into the data directory for an Impala + table from inside Impala, without the need to specify the HDFS path of the destination directory. This + technique works for both internal and external tables. See + <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a> for details. + </li> + </ul> + </li> + + <li class="li"> + <p class="p"> + Make sure that the HDFS trashcan is configured correctly. When you remove files from HDFS, the space + might not be reclaimed for use by other files until sometime later, when the trashcan is emptied. See + <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> for details. See + <a class="xref" href="impala_prereqs.html#prereqs_account">User Account Requirements</a> for permissions needed for the HDFS trashcan to operate + correctly. + </p> + </li> + + <li class="li"> + <p class="p"> + Drop all tables in a database before dropping the database itself. See + <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a> for details. + </p> + </li> + + <li class="li"> + <p class="p"> + Clean up temporary files after failed <code class="ph codeph">INSERT</code> statements. If an <code class="ph codeph">INSERT</code> + statement encounters an error, and you see a directory named <span class="ph filepath">.impala_insert_staging</span> + or <span class="ph filepath">_impala_insert_staging</span> left behind in the data directory for the table, it might + contain temporary data files taking up space in HDFS. You might be able to salvage these data files, for + example if they are complete but could not be moved into place due to a permission error. 
Or, you might + delete those files through commands such as <code class="ph codeph">hadoop fs</code> or <code class="ph codeph">hdfs dfs</code>, to + reclaim space before retrying the <code class="ph codeph">INSERT</code>. Issue <code class="ph codeph">DESCRIBE FORMATTED + <var class="keyword varname">table_name</var></code> to see the HDFS path where you can check for temporary files. + </p> + </li> + + <li class="li"> + <p class="p"> + By default, intermediate files used during large sort, join, aggregation, or analytic function operations + are stored in the directory <span class="ph filepath">/tmp/impala-scratch</span>. These files are removed when the + operation finishes. (Multiple concurrent queries can perform operations that use the <span class="q">"spill to disk"</span> + technique, without any name conflicts for these temporary files.) You can specify a different location by + starting the <span class="keyword cmdname">impalad</span> daemon with the + <code class="ph codeph">--scratch_dirs="<var class="keyword varname">path_to_directory</var>"</code> configuration option. + You can specify a single directory, or a comma-separated list of directories. The scratch directories must + be on the local filesystem, not in HDFS. You might specify different directory paths for different hosts, + depending on the capacity and speed + of the available storage devices. In <span class="keyword">Impala 2.3</span> or higher, + Impala successfully starts (with a warning written to the log) if it cannot create or read and write files + in one of the scratch directories. If there is less than 1 GB free on the filesystem where that directory resides, + Impala still runs, but writes a warning message to its log. If Impala encounters an error reading or writing + files in a scratch directory during a query, Impala logs the error and the query fails. 
+ </p> + </li> + + <li class="li"> + <p class="p"> + If you use the Amazon Simple Storage Service (S3) as a place to offload + data to reduce the volume of local storage, Impala 2.2.0 and higher + can query the data directly from S3. + See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details. + </p> + </li> + </ul> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav></article></main></body></html> \ No newline at end of file
