http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_tables.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_tables.html b/docs/build/html/topics/impala_tables.html new file mode 100644 index 0000000..f636f20 --- /dev/null +++ b/docs/build/html/topics/impala_tables.html @@ -0,0 +1,446 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="tables"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Overview of Impala Tables</title></head><body id="tables"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Overview of Impala Tables</h1> + + + + <div class="body conbody"> + + <p class="p"></p> + + <p class="p"> + Tables are the primary containers for data in Impala. They have the familiar row and column layout similar to + other database systems, plus some features such as partitioning often associated with higher-end data + warehouse systems. + </p> + + <p class="p"> + Logically, each table has a structure based on the definition of its columns, partitions, and other + properties. + </p> + + <p class="p"> + Physically, each table that uses HDFS storage is associated with a directory in HDFS. 
The table data consists of all the data files + underneath that directory: + </p> + + <ul class="ul"> + <li class="li"> + <a class="xref" href="impala_tables.html#internal_tables">Internal tables</a> are managed by Impala, and use directories + inside the designated Impala work area. + </li> + + <li class="li"> + <a class="xref" href="impala_tables.html#external_tables">External tables</a> use arbitrary HDFS directories, where + the data files are typically shared between different Hadoop components. + </li> + + <li class="li"> + Large-scale data is usually handled by partitioned tables, where the data files are divided among different + HDFS subdirectories. + </li> + </ul> + + <p class="p"> + Impala tables can also represent data that is stored in HBase, or in the Amazon S3 filesystem (<span class="keyword">Impala 2.2</span> or higher), + or on Isilon storage devices (<span class="keyword">Impala 2.2.3</span> or higher). See <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a>, + <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a>, and <a class="xref" href="impala_isilon.html#impala_isilon">Using Impala with Isilon Storage</a> + for details about those special kinds of tables. + </p> + + <p class="p"> + Impala queries ignore files with extensions commonly used for temporary work files by Hadoop tools. Any + files with extensions <code class="ph codeph">.tmp</code> or <code class="ph codeph">.copying</code> are not considered part of the + Impala table. The suffix matching is case-insensitive, so for example Impala ignores both + <code class="ph codeph">.copying</code> and <code class="ph codeph">.COPYING</code> suffixes. 
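+      </p>
+
+      <p class="p">
+        The partitioning mentioned above is declared when a table is created. For example, the following
+        statement (the table and column names are illustrative) divides the data files into one HDFS
+        subdirectory for each value of the <code class="ph codeph">YEAR</code> column:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- Illustrative example: each value of the YEAR column corresponds
+-- to a separate HDFS subdirectory of data files.
+CREATE TABLE sales (id BIGINT, amount DOUBLE)
+  PARTITIONED BY (year INT)
+  STORED AS PARQUET;
+</code></pre>
+
+      <p class="p">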
+ </p> + + <p class="p toc inpage"></p> + + <p class="p"> + <strong class="ph b">Related statements:</strong> <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>, + <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>, <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> + <a class="xref" href="impala_insert.html#insert">INSERT Statement</a>, <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a>, + <a class="xref" href="impala_select.html#select">SELECT Statement</a> + </p> + </div> + + <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="tables__internal_tables"> + + <h2 class="title topictitle2" id="ariaid-title2">Internal Tables</h2> + + <div class="body conbody"> + + <p class="p"> + + The default kind of table produced by the <code class="ph codeph">CREATE TABLE</code> statement is known as an internal + table. (Its counterpart is the external table, produced by the <code class="ph codeph">CREATE EXTERNAL TABLE</code> + syntax.) + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + Impala creates a directory in HDFS to hold the data files. + </p> + </li> + + <li class="li"> + <p class="p"> + You can create data in internal tables by issuing <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> + statements. + </p> + </li> + + <li class="li"> + <p class="p"> + If you add or replace data using HDFS operations, issue the <code class="ph codeph">REFRESH</code> command in + <span class="keyword cmdname">impala-shell</span> so that Impala recognizes the changes in data files, block locations, + and so on. 
+ </p> + </li> + + <li class="li"> + <p class="p"> + When you issue a <code class="ph codeph">DROP TABLE</code> statement, Impala physically removes all the data files + from the directory. + </p> + </li> + + <li class="li"> + <p class="p"> + To see whether a table is internal or external, and its associated HDFS location, issue the statement + <code class="ph codeph">DESCRIBE FORMATTED <var class="keyword varname">table_name</var></code>. The <code class="ph codeph">Table Type</code> field + displays <code class="ph codeph">MANAGED_TABLE</code> for internal tables and <code class="ph codeph">EXTERNAL_TABLE</code> for + external tables. The <code class="ph codeph">Location</code> field displays the path of the table directory as an HDFS + URI. + </p> + </li> + + <li class="li"> + <p class="p"> + When you issue an <code class="ph codeph">ALTER TABLE</code> statement to rename an internal table, all data files + are moved into the new HDFS directory for the table. The files are moved even if they were formerly in + a directory outside the Impala data directory, for example in an internal table with a + <code class="ph codeph">LOCATION</code> attribute pointing to an outside HDFS directory. + </p> + </li> + </ul> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <div class="p"> + You can switch a table from internal to external, or from external to internal, by using the <code class="ph codeph">ALTER + TABLE</code> statement: +<pre class="pre codeblock"><code> +-- Switch a table from internal to external. +ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES('EXTERNAL'='TRUE'); + +-- Switch a table from external to internal. 
+ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES('EXTERNAL'='FALSE'); +</code></pre> + </div> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_tables.html#external_tables">External Tables</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>, + <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>, <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>, + <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a> + </p> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="tables__external_tables"> + + <h2 class="title topictitle2" id="ariaid-title3">External Tables</h2> + + <div class="body conbody"> + + <p class="p"> + + The syntax <code class="ph codeph">CREATE EXTERNAL TABLE</code> sets up an Impala table that points at existing data + files, potentially in HDFS locations outside the normal Impala data directories. This operation saves the + expense of importing the data into a new table when you already have the data files in a known location in + HDFS, in the desired file format. + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + You can use Impala to query the data in this table. + </p> + </li> + + <li class="li"> + <p class="p"> + You can create data in external tables by issuing <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> + statements. + </p> + </li> + + <li class="li"> + <p class="p"> + If you add or replace data using HDFS operations, issue the <code class="ph codeph">REFRESH</code> command in + <span class="keyword cmdname">impala-shell</span> so that Impala recognizes the changes in data files, block locations, + and so on. 
+ </p> + </li> + + <li class="li"> + <p class="p"> + When you issue a <code class="ph codeph">DROP TABLE</code> statement in Impala, that removes the connection that + Impala has with the associated data files, but does not physically remove the underlying data. You can + continue to use the data files with other Hadoop components and HDFS operations. + </p> + </li> + + <li class="li"> + <p class="p"> + To see whether a table is internal or external, and its associated HDFS location, issue the statement + <code class="ph codeph">DESCRIBE FORMATTED <var class="keyword varname">table_name</var></code>. The <code class="ph codeph">Table Type</code> field + displays <code class="ph codeph">MANAGED_TABLE</code> for internal tables and <code class="ph codeph">EXTERNAL_TABLE</code> for + external tables. The <code class="ph codeph">Location</code> field displays the path of the table directory as an HDFS + URI. + </p> + </li> + + <li class="li"> + <p class="p"> + When you issue an <code class="ph codeph">ALTER TABLE</code> statement to rename an external table, all data files + are left in their original locations. + </p> + </li> + + <li class="li"> + <p class="p"> + You can point multiple external tables at the same HDFS directory by using the same + <code class="ph codeph">LOCATION</code> attribute for each one. The tables could have different column definitions, + as long as the number and types of columns are compatible with the schema evolution considerations for + the underlying file type. For example, for text data files, one table might define a certain column as + a <code class="ph codeph">STRING</code> while another defines the same column as a <code class="ph codeph">BIGINT</code>. 
+ </p> + </li> + </ul> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <div class="p"> + You can switch a table from internal to external, or from external to internal, by using the <code class="ph codeph">ALTER + TABLE</code> statement: +<pre class="pre codeblock"><code> +-- Switch a table from internal to external. +ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES('EXTERNAL'='TRUE'); + +-- Switch a table from external to internal. +ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES('EXTERNAL'='FALSE'); +</code></pre> + </div> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_tables.html#internal_tables">Internal Tables</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>, + <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>, <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>, + <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a> + </p> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="tables__table_file_formats"> + <h2 class="title topictitle2" id="ariaid-title4">File Formats</h2> + + <div class="body conbody"> + <p class="p"> + Each table has an associated file format, which determines how Impala interprets the + associated data files. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details. + </p> + <p class="p"> + You set the file format during the <code class="ph codeph">CREATE TABLE</code> statement, + or change it later using the <code class="ph codeph">ALTER TABLE</code> statement. 
+ Partitioned tables can have a different file format for individual partitions, + allowing you to change the file format used in your ETL process for new data + without going back and reconverting all the existing data in the same table. + </p> + <p class="p"> + Any <code class="ph codeph">INSERT</code> statements produce new data files with the current file format of the table. + For existing data files, changing the file format of the table does not automatically do any data conversion. + You must use <code class="ph codeph">TRUNCATE TABLE</code> or <code class="ph codeph">INSERT OVERWRITE</code> to remove any previous data + files that use the old file format. + Then you use the <code class="ph codeph">LOAD DATA</code> statement, <code class="ph codeph">INSERT ... SELECT</code>, or other mechanism + to put data files of the correct format into the table. + </p> + <p class="p"> + The default file format, text, is the most flexible and easy to produce when you are just getting started with + Impala. The Parquet file format offers the highest query performance and uses compression to reduce storage + requirements; therefore, where practical, use Parquet for Impala tables with substantial amounts of data. + <span class="ph">Also, the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>) + available in <span class="keyword">Impala 2.3</span> and higher are currently only supported with the Parquet file type.</span> + Based on your existing ETL workflow, you might use other file formats such as Avro, possibly doing a final + conversion step to Parquet to take advantage of its performance for analytic queries. 
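+      </p>
+
+      <p class="p">
+        For example, you might create a table with the default text format, then switch it to Parquet for
+        newly added data. The table name here is illustrative:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- Create a table that uses the default text format.
+CREATE TABLE t1 (id BIGINT, s STRING);
+
+-- Switch the table to the Parquet file format for new data.
+-- Existing data files are not converted automatically.
+ALTER TABLE t1 SET FILEFORMAT PARQUET;
+</code></pre>
+
+      <p class="p">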
+ </p> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="tables__kudu_tables"> + <h2 class="title topictitle2" id="ariaid-title5">Kudu Tables</h2> + + + <div class="body conbody"> + <p class="p"> + Tables stored in Apache Kudu are treated specially, because Kudu manages its data independently of HDFS files. + Some information about the table is stored in the metastore database for use by Impala. Other table metadata is + managed internally by Kudu. + </p> + + <p class="p"> + When you create a Kudu table through Impala, it is assigned an internal Kudu table name of the form + <code class="ph codeph">impala::<var class="keyword varname">db_name</var>.<var class="keyword varname">table_name</var></code>. You can see the Kudu-assigned name + in the output of <code class="ph codeph">DESCRIBE FORMATTED</code>, in the <code class="ph codeph">kudu.table_name</code> field of the table properties. + The Kudu-assigned name remains the same even if you use <code class="ph codeph">ALTER TABLE</code> to rename the Impala table + or move it to a different Impala database. If you issue the statement + <code class="ph codeph">ALTER TABLE <var class="keyword varname">impala_name</var> SET TBLPROPERTIES('kudu.table_name' = '<var class="keyword varname">different_kudu_table_name</var>')</code>, + the effect is different depending on whether the Impala table was created with a regular <code class="ph codeph">CREATE TABLE</code> + statement (that is, if it is an internal or managed table), or if it was created with a + <code class="ph codeph">CREATE EXTERNAL TABLE</code> statement (and therefore is an external table). Changing the <code class="ph codeph">kudu.table_name</code> + property of an internal table physically renames the underlying Kudu table to match the new name. 
+ Changing the <code class="ph codeph">kudu.table_name</code> property of an external table switches which underlying Kudu table + the Impala table refers to; the underlying Kudu table must already exist. + </p> + + <p class="p"> + The following example shows what happens with both internal and external Kudu tables as the <code class="ph codeph">kudu.table_name</code> + property is changed. In practice, external tables are typically used to access underlying Kudu tables that were created + outside of Impala, that is, through the Kudu API. + </p> + +<pre class="pre codeblock"><code> +-- This is an internal table that we will create and then rename. +create table old_name (id bigint primary key, s string) + partition by hash(id) partitions 2 stored as kudu; + +-- Initially, the name OLD_NAME is the same on the Impala and Kudu sides. +describe formatted old_name; +... +| Location: | hdfs://host.example.com:8020/path/user.db/old_name +| Table Type: | MANAGED_TABLE | NULL +| Table Parameters: | NULL | NULL +| | DO_NOT_UPDATE_STATS | true +| | kudu.master_addresses | vd0342.example.com +| | kudu.table_name | impala::user.old_name + +-- ALTER TABLE RENAME TO changes the Impala name but not the underlying Kudu name. +alter table old_name rename to new_name; + +describe formatted new_name; +| Location: | hdfs://host.example.com:8020/path/user.db/new_name +| Table Type: | MANAGED_TABLE | NULL +| Table Parameters: | NULL | NULL +| | DO_NOT_UPDATE_STATS | true +| | kudu.master_addresses | vd0342.example.com +| | kudu.table_name | impala::user.old_name + +-- Setting TBLPROPERTIES changes the underlying Kudu name. 
+alter table new_name + set tblproperties('kudu.table_name' = 'impala::user.new_name'); + +describe formatted new_name; +| Location: | hdfs://host.example.com:8020/path/user.db/new_name +| Table Type: | MANAGED_TABLE | NULL +| Table Parameters: | NULL | NULL +| | DO_NOT_UPDATE_STATS | true +| | kudu.master_addresses | vd0342.example.com +| | kudu.table_name | impala::user.new_name + +-- Put some data in the table to demonstrate how external tables can map to +-- different underlying Kudu tables. +insert into new_name values (0, 'zero'), (1, 'one'), (2, 'two'); + +-- This external table points to the same underlying Kudu table, NEW_NAME, +-- as we created above. No need to declare columns or other table aspects. +create external table kudu_table_alias stored as kudu + tblproperties('kudu.table_name' = 'impala::user.new_name'); + +-- The external table can fetch data from the NEW_NAME table that already +-- existed and already had data. +select * from kudu_table_alias limit 100; ++----+------+ +| id | s | ++----+------+ +| 1 | one | +| 0 | zero | +| 2 | two | ++----+------+ + +-- We cannot re-point the external table at a different underlying Kudu table +-- unless that other underlying Kudu table already exists. +alter table kudu_table_alias + set tblproperties('kudu.table_name' = 'impala::user.yet_another_name'); +ERROR: +TableLoadingException: Error opening Kudu table 'impala::user.yet_another_name', + Kudu error: The table does not exist: table_name: "impala::user.yet_another_name" + +-- Once the underlying Kudu table exists, we can re-point the external table to it. +create table yet_another_name (id bigint primary key, x int, y int, s string) + partition by hash(id) partitions 2 stored as kudu; + +alter table kudu_table_alias + set tblproperties('kudu.table_name' = 'impala::user.yet_another_name'); + +-- Now no data is returned because this other table is empty. 
+select * from kudu_table_alias limit 100; + +-- The Impala table automatically recognizes the table schema of the new table, +-- for example the extra X and Y columns not present in the original table. +describe kudu_table_alias; ++------+--------+---------+-------------+----------+... +| name | type | comment | primary_key | nullable |... ++------+--------+---------+-------------+----------+... +| id | bigint | | true | false |... +| x | int | | false | true |... +| y | int | | false | true |... +| s | string | | false | true |... ++------+--------+---------+-------------+----------+... +</code></pre> + + <p class="p"> + The <code class="ph codeph">SHOW TABLE STATS</code> output for a Kudu table shows Kudu-specific details about the layout of the table. + Instead of information about the number and sizes of files, the information is divided by the Kudu tablets. + For each tablet, the output includes the fields + <code class="ph codeph"># Rows</code> (although this number is not currently computed), <code class="ph codeph">Start Key</code>, <code class="ph codeph">Stop Key</code>, <code class="ph codeph">Leader Replica</code>, and <code class="ph codeph"># Replicas</code>. + The output of <code class="ph codeph">SHOW COLUMN STATS</code>, illustrating the distribution of values within each column, is the same for Kudu tables + as for HDFS-backed tables. + </p> + + <div class="p"> + The distinction between internal and external tables has some special + details for Kudu tables. Tables created entirely through Impala are + internal tables. The table name as represented within Kudu includes + notation such as an <code class="ph codeph">impala::</code> prefix and the Impala + database name. External Kudu tables are those created by a non-Impala + mechanism, such as a user application calling the Kudu APIs. 
For + these tables, the <code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax lets + you establish a mapping from Impala to the existing Kudu table: +<pre class="pre codeblock"><code> +CREATE EXTERNAL TABLE impala_name STORED AS KUDU + TBLPROPERTIES('kudu.table_name' = 'original_kudu_name'); +</code></pre> + External Kudu tables differ in one important way from other external + tables: adding or dropping a column or range partition changes the + data in the underlying Kudu table, in contrast to an HDFS-backed + external table where existing data files are left untouched. + </div> + </div> + </article> + +</article></main></body></html> \ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_timeouts.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_timeouts.html b/docs/build/html/topics/impala_timeouts.html new file mode 100644 index 0000000..2005c7d --- /dev/null +++ b/docs/build/html/topics/impala_timeouts.html @@ -0,0 +1,168 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="timeouts"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Setting Timeout Periods for Daemons, Queries, and Sessions</title></head><body id="timeouts"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Setting Timeout Periods for Daemons, Queries, and Sessions</h1> + + + + + + <div class="body conbody"> + + <p class="p"> + Depending on how busy your <span class="keyword"></span> cluster is, you might increase or decrease various timeout + values. Increase timeouts if Impala is cancelling operations prematurely, when the system + is responding slower than usual but the operations are still successful if given extra + time. Decrease timeouts if operations are idle or hanging for long periods, and the idle + or hung operations are consuming resources and reducing concurrency. 
+ </p> + + <p class="p toc inpage"></p> + + </div> + + <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="timeouts__statestore_timeout"> + + <h2 class="title topictitle2" id="ariaid-title2">Increasing the Statestore Timeout</h2> + + <div class="body conbody"> + + <p class="p"> + If you have an extensive Impala schema, for example with hundreds of databases, tens of + thousands of tables, and so on, you might encounter timeout errors during startup as the + Impala catalog service broadcasts metadata to all the Impala nodes using the statestore + service. To avoid such timeout errors on startup, increase the statestore timeout value + from its default of 10 seconds. Specify the timeout value using the + <code class="ph codeph">-statestore_subscriber_timeout_seconds</code> option for the statestore + service, using the configuration instructions in + <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>. The symptom of this problem is + messages in the <code class="ph codeph">impalad</code> log such as: + </p> + +<pre class="pre codeblock"><code>Connection with state-store lost +Trying to re-register with state-store</code></pre> + + <p class="p"> + See <a class="xref" href="impala_scalability.html#statestore_scalability">Scalability Considerations for the Impala Statestore</a> for more details about + statestore operation and settings on clusters with a large number of Impala-related + objects such as tables and partitions. 
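+      </p>
+
+      <p class="p">
+        For example, to raise the timeout from the default 10 seconds to 100 seconds, you would include a
+        flag such as the following among the startup options (the value of 100 is illustrative; choose a
+        value based on the size of your schema):
+      </p>
+
+<pre class="pre codeblock"><code>
+-statestore_subscriber_timeout_seconds=100
+</code></pre>
+
+      <p class="p">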
+ </p> + + </div> + + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="timeouts__impalad_timeout"> + + <h2 class="title topictitle2" id="ariaid-title3">Setting the Idle Query and Idle Session Timeouts for impalad</h2> + + <div class="body conbody"> + + <p class="p"> + To keep long-running queries or idle sessions from tying up cluster resources, you can + set timeout intervals for both individual queries, and entire sessions. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + The timeout clock for queries and sessions only starts ticking when the query or session is idle. + For queries, this means the query has results ready but is waiting for a client to fetch the data. A + query can run for an arbitrary time without triggering a timeout, because the query is computing results + rather than sitting idle waiting for the results to be fetched. The timeout period is intended to prevent + unclosed queries from consuming resources and taking up slots in the admission count of running queries, + potentially preventing other queries from starting. + </p> + <p class="p"> + For sessions, this means that no query has been submitted for some period of time. + </p> + </div> + + <p class="p"> + Specify the following startup options for the <span class="keyword cmdname">impalad</span> daemon: + </p> + + <ul class="ul"> + <li class="li"> + The <code class="ph codeph">--idle_query_timeout</code> option specifies the time in seconds after + which an idle query is cancelled. This could be a query whose results were all fetched + but was never closed, or one whose results were partially fetched and then the client + program stopped requesting further results. This condition is most likely to occur in + a client program using the JDBC or ODBC interfaces, rather than in the interactive + <span class="keyword cmdname">impala-shell</span> interpreter. 
Once the query is cancelled, the client + program cannot retrieve any further results. + </li> + + <li class="li"> + The <code class="ph codeph">--idle_session_timeout</code> option specifies the time in seconds after + which an idle session is expired. A session is idle when no activity is occurring for + any of the queries in that session, and the session has not started any new queries. + Once a session is expired, you cannot issue any new query requests to it. The session + remains open, but the only operation you can perform is to close it. The default value + of 0 means that sessions never expire. + </li> + </ul> + + <p class="p"> + For instructions on changing <span class="keyword cmdname">impalad</span> startup options, see + <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>. + </p> + + <p class="p"> + You can reduce the idle query timeout by using the <code class="ph codeph">QUERY_TIMEOUT_S</code> + query option. Any value specified for the <code class="ph codeph">--idle_query_timeout</code> startup + option serves as an upper limit for the <code class="ph codeph">QUERY_TIMEOUT_S</code> query option. + See <a class="xref" href="impala_query_timeout_s.html#query_timeout_s">QUERY_TIMEOUT_S Query Option (Impala 2.0 or higher only)</a> for details. + </p> + + </div> + + </article> + <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="timeouts__concept_rfy_jl1_rx"> + <h2 class="title topictitle2" id="ariaid-title4">Setting Timeout and Retries for Thrift Connections to the Backend + Client</h2> + <div class="body conbody"> + <p class="p">Impala connections to the backend client are subject to failure in + cases when the network is momentarily overloaded. 
To avoid failed + queries due to transient network problems, you can configure the number + of Thrift connection retries using the following option: </p> + <ul class="ul" id="concept_rfy_jl1_rx__ul_bj3_ql1_rx"> + <li class="li">The <code class="ph codeph">--backend_client_connection_num_retries</code> option + specifies the number of times Impala will try connecting to the + backend client after the first connection attempt fails. By default, + <span class="keyword cmdname">impalad</span> will attempt three re-connections before + it returns a failure. </li> + </ul> + <p class="p">You can configure timeouts for sending and receiving data from the + backend client. Therefore, if for some reason a query hangs, instead of + waiting indefinitely for a response, Impala will terminate the + connection after a configurable timeout.</p> + <ul class="ul" id="concept_rfy_jl1_rx__ul_vm2_2v1_rx"> + <li class="li">The <code class="ph codeph">--backend_client_rpc_timeout_ms</code> option can be + used to specify the number of milliseconds Impala should wait for a + response from the backend client before it terminates the connection + and signals a failure. The default value for this property is 300000 + milliseconds, or 5 minutes. </li> + </ul> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="timeouts__cancel_query"> + + <h2 class="title topictitle2" id="ariaid-title5">Cancelling a Query</h2> + + <div class="body conbody"> + + <p class="p"> + Sometimes, an Impala query might run for an unexpectedly long time, tying up resources + in the cluster. You can cancel the query explicitly, independent of the timeout period, + by going into the web UI for the <span class="keyword cmdname">impalad</span> host (on port 25000 by + default), and using the link on the <code class="ph codeph">/queries</code> tab to cancel the running + query. 
You can also cancel a query from the client application that submitted it; for example, by pressing <code class="ph codeph">^C</code> in <span class="keyword cmdname">impala-shell</span>. + </p> + + </div> + + </article> + +</article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_timestamp.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_timestamp.html b/docs/build/html/topics/impala_timestamp.html new file mode 100644 index 0000000..02f86fc --- /dev/null +++ b/docs/build/html/topics/impala_timestamp.html @@ -0,0 +1,514 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="timestamp"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>TIMESTAMP Data Type</title></head><body id="timestamp"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">TIMESTAMP Data Type</h1> + + + + <div class="body conbody"> + + <p class="p"> + A data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements, representing a + point in time. 
+ </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + + <p class="p"> + In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement: + </p> + +<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> TIMESTAMP</code></pre> + + <p class="p"> + <strong class="ph b">Range:</strong> Allowed date values range from 1400-01-01 to 9999-12-31; this range is different from the Hive + <code class="ph codeph">TIMESTAMP</code> type. Internally, the resolution of the time portion of a + <code class="ph codeph">TIMESTAMP</code> value is in nanoseconds. + </p> + + <p class="p"> + <strong class="ph b">INTERVAL expressions:</strong> + </p> + + <p class="p"> + You can perform date arithmetic by adding or subtracting a specified number of time units, using the + <code class="ph codeph">INTERVAL</code> keyword and the <code class="ph codeph">+</code> and <code class="ph codeph">-</code> operators or + <code class="ph codeph">date_add()</code> and <code class="ph codeph">date_sub()</code> functions. You can specify units as + <code class="ph codeph">YEAR[S]</code>, <code class="ph codeph">MONTH[S]</code>, <code class="ph codeph">WEEK[S]</code>, <code class="ph codeph">DAY[S]</code>, + <code class="ph codeph">HOUR[S]</code>, <code class="ph codeph">MINUTE[S]</code>, <code class="ph codeph">SECOND[S]</code>, + <code class="ph codeph">MILLISECOND[S]</code>, <code class="ph codeph">MICROSECOND[S]</code>, and <code class="ph codeph">NANOSECOND[S]</code>. You can + only specify one time unit in each interval expression, for example <code class="ph codeph">INTERVAL 3 DAYS</code> or + <code class="ph codeph">INTERVAL 25 HOURS</code>, but you can produce any granularity by adding together successive + <code class="ph codeph">INTERVAL</code> values, such as <code class="ph codeph"><var class="keyword varname">timestamp_value</var> + INTERVAL 3 WEEKS - + INTERVAL 1 DAY + INTERVAL 10 MICROSECONDS</code>. 
+    </p>
+
+    <p class="p">
+      For example:
+    </p>
+
+<pre class="pre codeblock"><code>select now() + interval 1 day;
+select date_sub(now(), interval 5 minutes);
+insert into auction_details
+  select auction_id, auction_start_time, auction_start_time + interval 2 days + interval 12 hours
+  from new_auctions;</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Time zones:</strong>
+    </p>
+
+    <p class="p">
+      By default, Impala does not store timestamps using the local timezone, to avoid undesired results from
+      unexpected time zone issues. Timestamps are stored and interpreted relative to UTC, both when written to or
+      read from data files and when converted to or from Unix time values through functions such as
+      <code class="ph codeph">from_unixtime()</code> or <code class="ph codeph">unix_timestamp()</code>. To convert such a
+      <code class="ph codeph">TIMESTAMP</code> value to one that represents the date and time in a specific time zone, convert
+      the original value with the <code class="ph codeph">from_utc_timestamp()</code> function.
+    </p>
+
+    <p class="p">
+      Because Impala does not assume that <code class="ph codeph">TIMESTAMP</code> values are in any particular time zone, you
+      must be conscious of the time zone aspects of data that you query, insert, or convert.
+    </p>
+
+    <p class="p">
+      For consistency with Unix system calls, the <code class="ph codeph">TIMESTAMP</code> returned by the <code class="ph codeph">now()</code>
+      function represents the local time in the system time zone, rather than in UTC. To store values relative to
+      the current time in a portable way, convert any <code class="ph codeph">now()</code> return values using the
+      <code class="ph codeph">to_utc_timestamp()</code> function first. The following example shows that the current
+      time in California (where this Impala cluster is located) is shortly after 2 PM. 
If that value was written to a data + file, and shipped off to a distant server to be analyzed alongside other data from far-flung locations, the + dates and times would not match up precisely because of time zone differences. Therefore, the + <code class="ph codeph">to_utc_timestamp()</code> function converts it using a common reference point, the UTC time zone + (descended from the old Greenwich Mean Time standard). The <code class="ph codeph">'PDT'</code> argument indicates that the + original value is from the Pacific time zone with Daylight Saving Time in effect. When servers in all + geographic locations run the same transformation on any local date and time values (with the appropriate time + zone argument), the stored data uses a consistent representation. Impala queries can use functions such as + <code class="ph codeph">EXTRACT()</code>, <code class="ph codeph">MIN()</code>, <code class="ph codeph">AVG()</code>, and so on to do time-series + analysis on those timestamps. + </p> + +<pre class="pre codeblock"><code>[localhost:21000] > select now(); ++-------------------------------+ +| now() | ++-------------------------------+ +| 2015-04-09 14:07:46.580465000 | ++-------------------------------+ +[localhost:21000] > select to_utc_timestamp(now(), 'PDT'); ++--------------------------------+ +| to_utc_timestamp(now(), 'pdt') | ++--------------------------------+ +| 2015-04-09 21:08:07.664547000 | ++--------------------------------+ +</code></pre> + + <p class="p"> + The converse function, <code class="ph codeph">from_utc_timestamp()</code>, lets you take stored <code class="ph codeph">TIMESTAMP</code> + data or calculated results and convert back to local date and time for processing on the application side. + The following example shows how you might represent some future date (such as the ending date and time of an + auction) in UTC, and then convert back to local time when convenient for reporting or other processing. 
The + final query in the example tests whether this arbitrary UTC date and time has passed yet, by converting it + back to the local time zone and comparing it against the current date and time. + </p> + +<pre class="pre codeblock"><code>[localhost:21000] > select to_utc_timestamp(now() + interval 2 weeks, 'PDT'); ++---------------------------------------------------+ +| to_utc_timestamp(now() + interval 2 weeks, 'pdt') | ++---------------------------------------------------+ +| 2015-04-23 21:08:34.152923000 | ++---------------------------------------------------+ +[localhost:21000] > select from_utc_timestamp('2015-04-23 21:08:34.152923000','PDT'); ++------------------------------------------------------------+ +| from_utc_timestamp('2015-04-23 21:08:34.152923000', 'pdt') | ++------------------------------------------------------------+ +| 2015-04-23 14:08:34.152923000 | ++------------------------------------------------------------+ +[localhost:21000] > select from_utc_timestamp('2015-04-23 21:08:34.152923000','PDT') < now(); ++--------------------------------------------------------------------+ +| from_utc_timestamp('2015-04-23 21:08:34.152923000', 'pdt') < now() | ++--------------------------------------------------------------------+ +| false | ++--------------------------------------------------------------------+ +</code></pre> + + <p class="p"> + If you have data files written by Hive, those <code class="ph codeph">TIMESTAMP</code> values represent the local timezone + of the host where the data was written, potentially leading to inconsistent results when processed by Impala. + To avoid compatibility problems or having to code workarounds, you can specify one or both of these + <span class="keyword cmdname">impalad</span> startup flags: <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> + <code class="ph codeph">-convert_legacy_hive_parquet_utc_timestamps=true</code>. 
Although
+      <code class="ph codeph">-convert_legacy_hive_parquet_utc_timestamps</code> is turned off by default to avoid performance overhead,
+      turn it on where practical when processing <code class="ph codeph">TIMESTAMP</code> columns in Parquet files written by Hive, to avoid unexpected behavior.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions</code> setting affects conversions from
+      <code class="ph codeph">TIMESTAMP</code> to <code class="ph codeph">BIGINT</code>, or from <code class="ph codeph">BIGINT</code>
+      to <code class="ph codeph">TIMESTAMP</code>. By default, Impala treats all <code class="ph codeph">TIMESTAMP</code> values as UTC,
+      to simplify analysis of time-series data from different geographic regions. When you enable the
+      <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions</code> setting, these operations
+      treat the input values as if they are in the local time zone of the host doing the processing.
+      See <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for the list of functions
+      affected by the <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions</code> setting.
+    </p>
+
+    <p class="p">
+      The following sequence of examples shows how the interpretation of <code class="ph codeph">TIMESTAMP</code> values in
+      Parquet tables is affected by the <code class="ph codeph">-convert_legacy_hive_parquet_utc_timestamps</code>
+      setting. 
+ </p> + + <p class="p"> + Regardless of the <code class="ph codeph">-convert_legacy_hive_parquet_utc_timestamps</code> setting, + <code class="ph codeph">TIMESTAMP</code> columns in text tables can be written and read interchangeably by Impala and Hive: + </p> + +<pre class="pre codeblock"><code>Impala DDL and queries for text table: + +[localhost:21000] > create table t1 (x timestamp); +[localhost:21000] > insert into t1 values (now()), (now() + interval 1 day); +[localhost:21000] > select x from t1; ++-------------------------------+ +| x | ++-------------------------------+ +| 2015-04-07 15:43:02.892403000 | +| 2015-04-08 15:43:02.892403000 | ++-------------------------------+ +[localhost:21000] > select to_utc_timestamp(x, 'PDT') from t1; ++-------------------------------+ +| to_utc_timestamp(x, 'pdt') | ++-------------------------------+ +| 2015-04-07 22:43:02.892403000 | +| 2015-04-08 22:43:02.892403000 | ++-------------------------------+ + +Hive query for text table: + +hive> select * from t1; +OK +2015-04-07 15:43:02.892403 +2015-04-08 15:43:02.892403 +Time taken: 1.245 seconds, Fetched: 2 row(s) +</code></pre> + + <p class="p"> + When the table uses Parquet format, Impala expects any time zone adjustment to be applied prior to writing, + while <code class="ph codeph">TIMESTAMP</code> values written by Hive are adjusted to be in the UTC time zone. When Hive + queries Parquet data files that it wrote, it adjusts the <code class="ph codeph">TIMESTAMP</code> values back to the local + time zone, while Impala does no conversion. Hive does no time zone conversion when it queries Impala-written + Parquet files. 
+ </p> + +<pre class="pre codeblock"><code>Impala DDL and queries for Parquet table: + +[localhost:21000] > create table p1 stored as parquet as select x from t1; ++-------------------+ +| summary | ++-------------------+ +| Inserted 2 row(s) | ++-------------------+ +[localhost:21000] > select x from p1; ++-------------------------------+ +| x | ++-------------------------------+ +| 2015-04-07 15:43:02.892403000 | +| 2015-04-08 15:43:02.892403000 | ++-------------------------------+ + +Hive DDL and queries for Parquet table: + +hive> create table h1 (x timestamp) stored as parquet; +OK +hive> insert into h1 select * from p1; +... +OK +Time taken: 35.573 seconds +hive> select x from p1; +OK +2015-04-07 15:43:02.892403 +2015-04-08 15:43:02.892403 +Time taken: 0.324 seconds, Fetched: 2 row(s) +hive> select x from h1; +OK +2015-04-07 15:43:02.892403 +2015-04-08 15:43:02.892403 +Time taken: 0.197 seconds, Fetched: 2 row(s) +</code></pre> + + <p class="p"> + The discrepancy arises when Impala queries the Hive-created Parquet table. The underlying values in the + <code class="ph codeph">TIMESTAMP</code> column are different from the ones written by Impala, even though they were copied + from one table to another by an <code class="ph codeph">INSERT ... SELECT</code> statement in Hive. Hive did an implicit + conversion from the local time zone to UTC as it wrote the values to Parquet. 
+ </p> + +<pre class="pre codeblock"><code>Impala query for TIMESTAMP values from Impala-written and Hive-written data: + +[localhost:21000] > select * from p1; ++-------------------------------+ +| x | ++-------------------------------+ +| 2015-04-07 15:43:02.892403000 | +| 2015-04-08 15:43:02.892403000 | ++-------------------------------+ +Fetched 2 row(s) in 0.29s +[localhost:21000] > select * from h1; ++-------------------------------+ +| x | ++-------------------------------+ +| 2015-04-07 22:43:02.892403000 | +| 2015-04-08 22:43:02.892403000 | ++-------------------------------+ +Fetched 2 row(s) in 0.41s + +Underlying integer values for Impala-written and Hive-written data: + +[localhost:21000] > select cast(x as bigint) from p1; ++-------------------+ +| cast(x as bigint) | ++-------------------+ +| 1428421382 | +| 1428507782 | ++-------------------+ +Fetched 2 row(s) in 0.38s +[localhost:21000] > select cast(x as bigint) from h1; ++-------------------+ +| cast(x as bigint) | ++-------------------+ +| 1428446582 | +| 1428532982 | ++-------------------+ +Fetched 2 row(s) in 0.20s +</code></pre> + + <p class="p"> + When the <code class="ph codeph">-convert_legacy_hive_parquet_utc_timestamps</code> setting is enabled, Impala recognizes + the Parquet data files written by Hive, and applies the same UTC-to-local-timezone conversion logic during + the query as Hive uses, making the contents of the Impala-written <code class="ph codeph">P1</code> table and the + Hive-written <code class="ph codeph">H1</code> table appear identical, whether represented as <code class="ph codeph">TIMESTAMP</code> + values or the underlying <code class="ph codeph">BIGINT</code> integers: + </p> + +<pre class="pre codeblock"><code>[localhost:21000] > select x from p1; ++-------------------------------+ +| x | ++-------------------------------+ +| 2015-04-07 15:43:02.892403000 | +| 2015-04-08 15:43:02.892403000 | ++-------------------------------+ +Fetched 2 row(s) in 0.37s 
+[localhost:21000] > select x from h1; ++-------------------------------+ +| x | ++-------------------------------+ +| 2015-04-07 15:43:02.892403000 | +| 2015-04-08 15:43:02.892403000 | ++-------------------------------+ +Fetched 2 row(s) in 0.19s +[localhost:21000] > select cast(x as bigint) from p1; ++-------------------+ +| cast(x as bigint) | ++-------------------+ +| 1428446582 | +| 1428532982 | ++-------------------+ +Fetched 2 row(s) in 0.29s +[localhost:21000] > select cast(x as bigint) from h1; ++-------------------+ +| cast(x as bigint) | ++-------------------+ +| 1428446582 | +| 1428532982 | ++-------------------+ +Fetched 2 row(s) in 0.22s +</code></pre> + + <p class="p"> + <strong class="ph b">Conversions:</strong> + </p> + + <p class="p"> + Impala automatically converts <code class="ph codeph">STRING</code> literals of the correct format into + <code class="ph codeph">TIMESTAMP</code> values. Timestamp values are accepted in the format + <code class="ph codeph">"yyyy-MM-dd HH:mm:ss.SSSSSS"</code>, and can consist of just the date, or just the time, with or + without the fractional second portion. For example, you can specify <code class="ph codeph">TIMESTAMP</code> values such as + <code class="ph codeph">'1966-07-30'</code>, <code class="ph codeph">'08:30:00'</code>, or <code class="ph codeph">'1985-09-25 17:45:30.005'</code>. + <span class="ph">Casting an integer or floating-point value <code class="ph codeph">N</code> to + <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch + date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone. 
+ If the setting <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> is in effect, + the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.</span> + </p> + + <p class="p"> + In Impala 1.3 and higher, the <code class="ph codeph">FROM_UNIXTIME()</code> and <code class="ph codeph">UNIX_TIMESTAMP()</code> + functions allow a wider range of format strings, with more flexibility in element order, repetition of letter + placeholders, and separator characters. In <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">UNIX_TIMESTAMP()</code> + function also allows a numeric timezone offset to be specified as part of the input string. + See <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for details. + </p> + + <p class="p"> + In Impala 2.2.0 and higher, built-in functions that accept or return integers representing <code class="ph codeph">TIMESTAMP</code> values + use the <code class="ph codeph">BIGINT</code> type for parameters and return values, rather than <code class="ph codeph">INT</code>. + This change lets the date and time functions avoid an overflow error that would otherwise occur + on January 19th, 2038 (known as the + <a class="xref" href="http://en.wikipedia.org/wiki/Year_2038_problem" target="_blank"><span class="q">"Year 2038 problem"</span> or <span class="q">"Y2K38 problem"</span></a>). + This change affects the <code class="ph codeph">from_unixtime()</code> and <code class="ph codeph">unix_timestamp()</code> functions. + You might need to change application code that interacts with these functions, change the types of + columns that store the return values, or add <code class="ph codeph">CAST()</code> calls to SQL statements that + call these functions. 
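+    </p>
+
+    <p class="p">
+      The following queries sketch the integer-to-<code class="ph codeph">TIMESTAMP</code> conversion described
+      earlier in this section, assuming the default behavior where such values are interpreted relative to UTC:
+    </p>
+
+<pre class="pre codeblock"><code>select cast(0 as timestamp);   -- Returns 1970-01-01 00:00:00, the start of the epoch.
+select cast(60 as timestamp);  -- Returns 1970-01-01 00:01:00, 60 seconds later.
+select unix_timestamp('1970-01-01 00:01:00');  -- Returns 60 as a BIGINT value.
+</code></pre>
+
+    <p class="p">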
+ </p> + + <p class="p"> + <strong class="ph b">Partitioning:</strong> + </p> + + <p class="p"> + Although you cannot use a <code class="ph codeph">TIMESTAMP</code> column as a partition key, you can extract the + individual years, months, days, hours, and so on and partition based on those columns. Because the partition + key column values are represented in HDFS directory names, rather than as fields in the data files + themselves, you can also keep the original <code class="ph codeph">TIMESTAMP</code> values if desired, without duplicating + data or wasting storage space. See <a class="xref" href="impala_partitioning.html#partition_key_columns">Partition Key Columns</a> for more + details on partitioning with date and time values. + </p> + +<pre class="pre codeblock"><code>[localhost:21000] > create table timeline (event string) partitioned by (happened timestamp); +ERROR: AnalysisException: Type 'TIMESTAMP' is not supported as partition-column type in column: happened +</code></pre> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + +<pre class="pre codeblock"><code>select cast('1966-07-30' as timestamp); +select cast('1985-09-25 17:45:30.005' as timestamp); +select cast('08:30:00' as timestamp); +select hour('1970-01-01 15:30:00'); -- Succeeds, returns 15. +select hour('1970-01-01 15:30'); -- Returns NULL because seconds field required. +select hour('1970-01-01 27:30:00'); -- Returns NULL because hour value out of range. +select dayofweek('2004-06-13'); -- Returns 1, representing Sunday. +select dayname('2004-06-13'); -- Returns 'Sunday'. +select date_add('2004-06-13', 365); -- Returns 2005-06-13 with zeros for hh:mm:ss fields. +select day('2004-06-13'); -- Returns 13. +select datediff('1989-12-31','1984-09-01'); -- How many days between these 2 dates? +select now(); -- Returns current date and time in local timezone. 
+ +create table dates_and_times (t timestamp); +insert into dates_and_times values + ('1966-07-30'), ('1985-09-25 17:45:30.005'), ('08:30:00'), (now()); +</code></pre> + + <p class="p"> + <strong class="ph b">NULL considerations:</strong> Casting any unrecognized <code class="ph codeph">STRING</code> value to this type produces a + <code class="ph codeph">NULL</code> value. + </p> + + <p class="p"> + <strong class="ph b">Partitioning:</strong> Because this type potentially has so many distinct values, it is often not a sensible + choice for a partition key column. For example, events 1 millisecond apart would be stored in different + partitions. Consider using the <code class="ph codeph">TRUNC()</code> function to condense the number of distinct values, + and partition on a new column with the truncated values. + </p> + + <p class="p"> + <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables. + </p> + + <p class="p"> + <strong class="ph b">Parquet considerations:</strong> This type is fully compatible with Parquet tables. + </p> + + <p class="p"> + <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables + using Parquet or other binary formats. + </p> + + + + <p class="p"> + <strong class="ph b">Internal details:</strong> Represented in memory as a 16-byte value. + </p> + + <p class="p"> + <strong class="ph b">Added in:</strong> Available in all versions of Impala. + </p> + + <p class="p"> + <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size + fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code> + statement. 
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Sqoop considerations:</strong>
+    </p>
+
+    <p class="p"> If you use Sqoop to
+      convert RDBMS data to Parquet, be careful when interpreting any
+      resulting values from <code class="ph codeph">DATE</code>, <code class="ph codeph">DATETIME</code>,
+      or <code class="ph codeph">TIMESTAMP</code> columns. The underlying values are
+      represented as the Parquet <code class="ph codeph">INT64</code> type, which appears
+      as <code class="ph codeph">BIGINT</code> in the Impala table. The Parquet
+      values represent the time in milliseconds, while Impala interprets
+      <code class="ph codeph">BIGINT</code> as the time in seconds. Therefore, if you have
+      a <code class="ph codeph">BIGINT</code> column in a Parquet table that was imported
+      this way from Sqoop, divide the values by 1000 when interpreting as the
+      <code class="ph codeph">TIMESTAMP</code> type.</p>
+
+    <p class="p">
+      <strong class="ph b">Restrictions:</strong>
+    </p>
+
+    <p class="p">
+      If you cast a <code class="ph codeph">STRING</code> with an unrecognized format to a <code class="ph codeph">TIMESTAMP</code>, the result
+      is <code class="ph codeph">NULL</code> rather than an error. Test your data pipeline to be sure any textual
+      date and time values are in a format that Impala <code class="ph codeph">TIMESTAMP</code> can recognize.
+    </p>
+
+    <p class="p">
+      Currently, Avro tables cannot contain <code class="ph codeph">TIMESTAMP</code> columns. If you need to store date and
+      time values in Avro tables, as a workaround you can use a <code class="ph codeph">STRING</code> representation of the
+      values, convert the values to <code class="ph codeph">BIGINT</code> with the <code class="ph codeph">UNIX_TIMESTAMP()</code> function,
+      or create separate numeric columns for individual date and time fields using the <code class="ph codeph">EXTRACT()</code>
+      function. 
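+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">STRING</code> workaround can be sketched as follows; the table and column names
+      here are hypothetical:
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical Avro table that stores timestamps as strings.
+create table avro_events (event_name string, happened string) stored as avro;
+-- Convert the STRING representation on the fly during queries.
+select event_name, cast(happened as timestamp) from avro_events;
+select event_name, unix_timestamp(happened) from avro_events;
+</code></pre>
+
+    <p class="p">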
+ </p> + + <p class="p"> + <strong class="ph b">Kudu considerations:</strong> + </p> + <p class="p"> + Currently, the data types <code class="ph codeph">DECIMAL</code>, <code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>, + <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables. + </p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <ul class="ul"> + <li class="li"> + + <a class="xref" href="impala_literals.html#timestamp_literals">Timestamp Literals</a>. + </li> + + <li class="li"> + To convert to or from different date formats, or perform date arithmetic, use the date and time functions + described in <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a>. In particular, the + <code class="ph codeph">from_unixtime()</code> function requires a case-sensitive format string such as + <code class="ph codeph">"yyyy-MM-dd HH:mm:ss.SSSS"</code>, matching one of the allowed variations of a + <code class="ph codeph">TIMESTAMP</code> value (date plus time, only date, only time, optional fractional seconds). + </li> + + <li class="li"> + See <a class="xref" href="impala_langref_unsupported.html#langref_hiveql_delta">SQL Differences Between Impala and Hive</a> for details about differences in + <code class="ph codeph">TIMESTAMP</code> handling between Impala and Hive. 
+ </li> + </ul> + + </div> + +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_tinyint.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_tinyint.html b/docs/build/html/topics/impala_tinyint.html new file mode 100644 index 0000000..9efc098 --- /dev/null +++ b/docs/build/html/topics/impala_tinyint.html @@ -0,0 +1,131 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="tinyint"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>TINYINT Data Type</title></head><body id="tinyint"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">TINYINT Data Type</h1> + + + + <div class="body conbody"> + + <p class="p"> + A 1-byte integer data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements. 
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Syntax:</strong>
+    </p>
+
+    <p class="p">
+      In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+    </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> TINYINT</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Range:</strong> -128 .. 127. There is no <code class="ph codeph">UNSIGNED</code> subtype.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Conversions:</strong> Impala automatically converts to a larger integer type (<code class="ph codeph">SMALLINT</code>,
+      <code class="ph codeph">INT</code>, or <code class="ph codeph">BIGINT</code>) or a floating-point type (<code class="ph codeph">FLOAT</code> or
+      <code class="ph codeph">DOUBLE</code>). Use <code class="ph codeph">CAST()</code> to convert to <code class="ph codeph">STRING</code> or
+      <code class="ph codeph">TIMESTAMP</code>.
+      <span class="ph">Casting an integer or floating-point value <code class="ph codeph">N</code> to
+      <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+      date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+      If the setting <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+      the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.</span>
+    </p>
+
+    <p class="p">
+      Impala does not return column overflows as <code class="ph codeph">NULL</code>, so that you can distinguish
+      <code class="ph codeph">NULL</code> data from overflow conditions, as you can with traditional
+      database systems. Impala returns the largest or smallest value in the range for the type. For example,
+      valid values for a <code class="ph codeph">tinyint</code> range from -128 to 127. 
In Impala, a <code class="ph codeph">tinyint</code> + with a value of -200 returns -128 rather than <code class="ph codeph">NULL</code>. A <code class="ph codeph">tinyint</code> with a + value of 200 returns 127. + </p> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + For a convenient and automated way to check the bounds of the <code class="ph codeph">TINYINT</code> type, call the + functions <code class="ph codeph">MIN_TINYINT()</code> and <code class="ph codeph">MAX_TINYINT()</code>. + </p> + + <p class="p"> + If an integer value is too large to be represented as a <code class="ph codeph">TINYINT</code>, use a + <code class="ph codeph">SMALLINT</code> instead. + </p> + + <p class="p"> + <strong class="ph b">NULL considerations:</strong> Casting any non-numeric value to this type produces a <code class="ph codeph">NULL</code> + value. + </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + +<pre class="pre codeblock"><code>CREATE TABLE t1 (x TINYINT); +SELECT CAST(100 AS TINYINT); +</code></pre> + + <p class="p"> + <strong class="ph b">Parquet considerations:</strong> + </p> + + + + <p class="p"> + Physically, Parquet files represent <code class="ph codeph">TINYINT</code> and <code class="ph codeph">SMALLINT</code> values as 32-bit + integers. Although Impala rejects attempts to insert out-of-range values into such columns, if you create a + new table with the <code class="ph codeph">CREATE TABLE ... LIKE PARQUET</code> syntax, any <code class="ph codeph">TINYINT</code> or + <code class="ph codeph">SMALLINT</code> columns in the original table turn into <code class="ph codeph">INT</code> columns in the new + table. + </p> + + + + <p class="p"> + <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables. 
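+    </p>
+
+    <p class="p">
+      As a quick check of the documented range, you can call the <code class="ph codeph">MIN_TINYINT()</code> and
+      <code class="ph codeph">MAX_TINYINT()</code> functions described in the usage notes above:
+    </p>
+
+<pre class="pre codeblock"><code>select min_tinyint(), max_tinyint();  -- Returns -128 and 127.
+</code></pre>
+
+    <p class="p">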
+ </p> + + <p class="p"> + <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables + using Parquet or other binary formats. + </p> + + + + <p class="p"> + <strong class="ph b">Internal details:</strong> Represented in memory as a 1-byte value. + </p> + + <p class="p"> + <strong class="ph b">Added in:</strong> Available in all versions of Impala. + </p> + + <p class="p"> + <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size + fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code> + statement. + </p> + + + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>, + <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, <a class="xref" href="impala_int.html#int">INT Data Type</a>, + <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 1.4 or higher only)</a>, + <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a> + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html> \ No newline at end of file
