http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_avro.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_avro.html b/docs/build/html/topics/impala_avro.html new file mode 100644 index 0000000..fd38294 --- /dev/null +++ b/docs/build/html/topics/impala_avro.html @@ -0,0 +1,565 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="avro"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the Avro File Format with Impala Tables</title></head><body id="avro"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Using the Avro File Format with Impala Tables</h1> + + + + <div class="body conbody"> + + <p class="p"> + + Impala supports using tables whose data files use the Avro file format. Impala can query Avro + tables, and in Impala 1.4.0 and higher can create them, but currently cannot insert data into them. For + insert operations, use Hive, then switch back to Impala to run queries. + </p> + + <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">Avro Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead"> + <tr class="row"> + <th class="entry nocellnorowborder" id="avro__entry__1"> + File Type + </th> + <th class="entry nocellnorowborder" id="avro__entry__2"> + Format + </th> + <th class="entry nocellnorowborder" id="avro__entry__3"> + Compression Codecs + </th> + <th class="entry nocellnorowborder" id="avro__entry__4"> + Impala Can CREATE? + </th> + <th class="entry nocellnorowborder" id="avro__entry__5"> + Impala Can INSERT? + </th> + </tr> + </thead><tbody class="tbody"> + <tr class="row"> + <td class="entry nocellnorowborder" headers="avro__entry__1 "> + <a class="xref" href="impala_avro.html#avro">Avro</a> + </td> + <td class="entry nocellnorowborder" headers="avro__entry__2 "> + Structured + </td> + <td class="entry nocellnorowborder" headers="avro__entry__3 "> + Snappy, gzip, deflate, bzip2 + </td> + <td class="entry nocellnorowborder" headers="avro__entry__4 "> + Yes, in Impala 1.4.0 and higher. Before that, create the table using Hive. + </td> + <td class="entry nocellnorowborder" headers="avro__entry__5 "> + No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use + <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala. + </td> + + </tr> + </tbody></table> + + <p class="p toc inpage"></p> + </div> + + <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="avro__avro_create_table"> + + <h2 class="title topictitle2" id="ariaid-title2">Creating Avro Tables</h2> + + <div class="body conbody"> + + <p class="p"> + To create a new table using the Avro file format, issue the <code class="ph codeph">CREATE TABLE</code> statement through + Impala with the <code class="ph codeph">STORED AS AVRO</code> clause, or through Hive. If you create the table through + Impala, you must include column definitions that match the fields specified in the Avro schema. With Hive, + you can omit the columns and just specify the Avro schema. + </p> + + <p class="p"> + In <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">CREATE TABLE</code> for Avro tables can include + SQL-style column definitions rather than specifying Avro notation through the <code class="ph codeph">TBLPROPERTIES</code> + clause. Impala issues warning messages if there are any mismatches between the types specified in the + SQL column definitions and the underlying types; for example, any <code class="ph codeph">TINYINT</code> or + <code class="ph codeph">SMALLINT</code> columns are treated as <code class="ph codeph">INT</code> in the underlying Avro files, + and therefore are displayed as <code class="ph codeph">INT</code> in any <code class="ph codeph">DESCRIBE</code> or + <code class="ph codeph">SHOW CREATE TABLE</code> output. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + Currently, Avro tables cannot contain <code class="ph codeph">TIMESTAMP</code> columns. If you need to store date and + time values in Avro tables, as a workaround you can use a <code class="ph codeph">STRING</code> representation of the + values, convert the values to <code class="ph codeph">BIGINT</code> with the <code class="ph codeph">UNIX_TIMESTAMP()</code> function, + or create separate numeric columns for individual date and time fields using the <code class="ph codeph">EXTRACT()</code> + function. + </p> + </div> + + + + <p class="p"> + The following examples demonstrate creating an Avro table in Impala, using either an inline column + specification or one taken from a JSON file stored in HDFS: + </p> + +<pre class="pre codeblock"><code> +[localhost:21000] > CREATE TABLE avro_only_sql_columns + > ( + > id INT, + > bool_col BOOLEAN, + > tinyint_col TINYINT, /* Gets promoted to INT */ + > smallint_col SMALLINT, /* Gets promoted to INT */ + > int_col INT, + > bigint_col BIGINT, + > float_col FLOAT, + > double_col DOUBLE, + > date_string_col STRING, + > string_col STRING + > ) + > STORED AS AVRO; + +[localhost:21000] > CREATE TABLE impala_avro_table + > (bool_col BOOLEAN, int_col INT, long_col BIGINT, float_col FLOAT, double_col DOUBLE, string_col STRING, nullable_int INT) + > STORED AS AVRO + > TBLPROPERTIES ('avro.schema.literal'='{ + > "name": "my_record", + > "type": "record", + > "fields": [ + > {"name":"bool_col", "type":"boolean"}, + > {"name":"int_col", "type":"int"}, + > {"name":"long_col", "type":"long"}, + > {"name":"float_col", "type":"float"}, + > {"name":"double_col", "type":"double"}, + > {"name":"string_col", "type":"string"}, + > {"name": "nullable_int", "type": ["null", "int"]}]}'); + +[localhost:21000] > CREATE TABLE avro_examples_of_all_types ( + > id INT, + > bool_col BOOLEAN, + > tinyint_col TINYINT, + > smallint_col SMALLINT, + > int_col INT, + > bigint_col BIGINT, + > float_col FLOAT, + > double_col DOUBLE, + > date_string_col STRING, + > string_col STRING + > ) + > STORED AS AVRO + > TBLPROPERTIES ('avro.schema.url'='hdfs://localhost:8020/avro_schemas/alltypes.json'); + +</code></pre> + + <p class="p"> + The following example demonstrates creating an Avro table in Hive: + </p> + +<pre class="pre codeblock"><code> +hive> CREATE TABLE hive_avro_table + > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' + > STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' + > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' + > TBLPROPERTIES ('avro.schema.literal'='{ + > "name": "my_record", + > "type": "record", + > "fields": [ + > {"name":"bool_col", "type":"boolean"}, + > {"name":"int_col", "type":"int"}, + > {"name":"long_col", "type":"long"}, + > {"name":"float_col", "type":"float"}, + > {"name":"double_col", "type":"double"}, + > {"name":"string_col", "type":"string"}, + > {"name": "nullable_int", "type": ["null", "int"]}]}'); + +</code></pre> + + <p class="p"> + Each field of the record becomes a column of the table. Note that any other information, such as the record + name, is ignored. + </p> + + + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + For nullable Avro columns, make sure to put the <code class="ph codeph">"null"</code> entry before the actual type name. + In Impala, all columns are nullable; Impala currently does not have a <code class="ph codeph">NOT NULL</code> clause. Any + non-nullable property is only enforced on the Avro side. + </div> + + <p class="p"> + Most column types map directly from Avro to Impala under the same names. These are the exceptions and + special cases to consider: + </p> + + <ul class="ul"> + <li class="li"> + The <code class="ph codeph">DECIMAL</code> type is defined in Avro as a <code class="ph codeph">BYTE</code> type with the + <code class="ph codeph">logicalType</code> property set to <code class="ph codeph">"decimal"</code> and a specified precision and + scale. + </li> + + <li class="li"> + The Avro <code class="ph codeph">long</code> type maps to <code class="ph codeph">BIGINT</code> in Impala. + </li> + </ul> + + <p class="p"> + If you create the table through Hive, switch back to <span class="keyword cmdname">impala-shell</span> and issue an + <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code> statement. Then you can run queries for + that table through <span class="keyword cmdname">impala-shell</span>. + </p> + + <div class="p"> + In rare instances, a mismatch could occur between the Avro schema and the column definitions in the + metastore database. In <span class="keyword">Impala 2.3</span> and higher, Impala checks for such inconsistencies during + a <code class="ph codeph">CREATE TABLE</code> statement and each time it loads the metadata for a table (for example, + after <code class="ph codeph">INVALIDATE METADATA</code>). Impala uses the following rules to determine how to treat + mismatching columns, a process known as <dfn class="term">schema reconciliation</dfn>: + <ul class="ul"> + <li class="li"> + If there is a mismatch in the number of columns, Impala uses the column + definitions from the Avro schema. + </li> + <li class="li"> + If there is a mismatch in column name or type, Impala uses the column definition from the Avro schema. + Because a <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> column in Impala maps to an Avro <code class="ph codeph">STRING</code>, + this case is not considered a mismatch and the column is preserved as <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> + in the reconciled schema. <span class="ph">Prior to <span class="keyword">Impala 2.7</span> the column + name and comment for such <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> columns was also taken from the SQL column definition. + In <span class="keyword">Impala 2.7</span> and higher, the column name and comment from the Avro schema file take precedence for such columns, + and only the <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> type is preserved from the SQL column definition.</span> + </li> + <li class="li"> + An Impala <code class="ph codeph">TIMESTAMP</code> column definition maps to an Avro <code class="ph codeph">STRING</code> and is presented as a <code class="ph codeph">STRING</code> + in the reconciled schema, because Avro has no binary <code class="ph codeph">TIMESTAMP</code> representation. + As a result, no Avro table can have a <code class="ph codeph">TIMESTAMP</code> column; this restriction is the same as + in earlier Impala releases. + </li> + </ul> + </div> + + <p class="p"> + <strong class="ph b">Complex type considerations:</strong> + Although you can create tables in this file format using + the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, + and <code class="ph codeph">MAP</code>) available in <span class="keyword">Impala 2.3</span> and higher, + currently, Impala can query these types only in Parquet tables. + <span class="ph"> + The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on RCFile tables that include complex types. + Such queries are allowed in <span class="keyword">Impala 2.6</span> and higher. + </span> + </p> + + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="avro__avro_map_table"> + + <h2 class="title topictitle2" id="ariaid-title3">Using a Hive-Created Avro Table in Impala</h2> + + <div class="body conbody"> + + <div class="p"> + If you have an Avro table created through Hive, you can use it in Impala as long as it contains only + Impala-compatible data types. It cannot contain: + <ul class="ul"> + <li class="li"> + Complex types: <code class="ph codeph">array</code>, <code class="ph codeph">map</code>, <code class="ph codeph">record</code>, + <code class="ph codeph">struct</code>, <code class="ph codeph">union</code> other than + <code class="ph codeph">[<var class="keyword varname">supported_type</var>,null]</code> or + <code class="ph codeph">[null,<var class="keyword varname">supported_type</var>]</code> + </li> + + <li class="li"> + The Avro-specific types <code class="ph codeph">enum</code>, <code class="ph codeph">bytes</code>, and <code class="ph codeph">fixed</code> + </li> + + <li class="li"> + Any scalar type other than those listed in <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> + </li> + </ul> + Because Impala and Hive share the same metastore database, Impala can directly access the table definitions + and data for tables that were created in Hive. + </div> + + <p class="p"> + If you create an Avro table in Hive, issue an <code class="ph codeph">INVALIDATE METADATA</code> the next time you + connect to Impala through <span class="keyword cmdname">impala-shell</span>. This is a one-time operation to make Impala + aware of the new table. You can issue the statement while connected to any Impala node, and the catalog + service broadcasts the change to all other Impala nodes. + </p> + + <p class="p"> + If you load new data into an Avro table through Hive, either through a Hive <code class="ph codeph">LOAD DATA</code> or + <code class="ph codeph">INSERT</code> statement, or by manually copying or moving files into the data directory for the + table, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> statement the next time you connect + to Impala through <span class="keyword cmdname">impala-shell</span>. You can issue the statement while connected to any + Impala node, and the catalog service broadcasts the change to all other Impala nodes. If you issue the + <code class="ph codeph">LOAD DATA</code> statement through Impala, you do not need a <code class="ph codeph">REFRESH</code> afterward. + </p> + + <p class="p"> + Impala only supports fields of type <code class="ph codeph">boolean</code>, <code class="ph codeph">int</code>, <code class="ph codeph">long</code>, + <code class="ph codeph">float</code>, <code class="ph codeph">double</code>, and <code class="ph codeph">string</code>, or unions of these types with + null; for example, <code class="ph codeph">["string", "null"]</code>. Unions with <code class="ph codeph">null</code> essentially + create a nullable type. + </p> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="avro__avro_json"> + + <h2 class="title topictitle2" id="ariaid-title4">Specifying the Avro Schema through JSON</h2> + + <div class="body conbody"> + + <p class="p"> + While you can embed a schema directly in your <code class="ph codeph">CREATE TABLE</code> statement, as shown above, + column width restrictions in the Hive metastore limit the length of schema you can specify. If you + encounter problems with long schema literals, try storing your schema as a <code class="ph codeph">JSON</code> file in + HDFS instead. Specify your schema in HDFS using table properties similar to the following: + </p> + +<pre class="pre codeblock"><code>tblproperties ('avro.schema.url'='hdfs//your-name-node:port/path/to/schema.json');</code></pre> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="avro__avro_load_data"> + + <h2 class="title topictitle2" id="ariaid-title5">Loading Data into an Avro Table</h2> + + + <div class="body conbody"> + + <p class="p"> + Currently, Impala cannot write Avro data files. Therefore, an Avro table cannot be used as the destination + of an Impala <code class="ph codeph">INSERT</code> statement or <code class="ph codeph">CREATE TABLE AS SELECT</code>. + </p> + + <p class="p"> + To copy data from another table, issue any <code class="ph codeph">INSERT</code> statements through Hive. For information + about loading data into Avro tables through Hive, see + <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/AvroSerDe" target="_blank">Avro + page on the Hive wiki</a>. + </p> + + <p class="p"> + If you already have data files in Avro format, you can also issue <code class="ph codeph">LOAD DATA</code> in either + Impala or Hive. Impala can move existing Avro data files into an Avro table, it just cannot create new + Avro data files. + </p> + + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="avro__avro_compression"> + + <h2 class="title topictitle2" id="ariaid-title6">Enabling Compression for Avro Tables</h2> + + + <div class="body conbody"> + + <p class="p"> + + To enable compression for Avro tables, specify settings in the Hive shell to enable compression and to + specify a codec, then issue a <code class="ph codeph">CREATE TABLE</code> statement as in the preceding examples. Impala + supports the <code class="ph codeph">snappy</code> and <code class="ph codeph">deflate</code> codecs for Avro tables. + </p> + + <p class="p"> + For example: + </p> + +<pre class="pre codeblock"><code>hive> set hive.exec.compress.output=true; +hive> set avro.output.codec=snappy;</code></pre> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="avro__avro_schema_evolution"> + + <h2 class="title topictitle2" id="ariaid-title7">How Impala Handles Avro Schema Evolution</h2> + + + <div class="body conbody"> + + <p class="p"> + Starting in Impala 1.1, Impala can deal with Avro data files that employ <dfn class="term">schema evolution</dfn>, + where different data files within the same table use slightly different type definitions. (You would + perform the schema evolution operation by issuing an <code class="ph codeph">ALTER TABLE</code> statement in the Hive + shell.) The old and new types for any changed columns must be compatible, for example a column might start + as an <code class="ph codeph">int</code> and later change to a <code class="ph codeph">bigint</code> or <code class="ph codeph">float</code>. + </p> + + <p class="p"> + As with any other tables where the definitions are changed or data is added outside of the current + <span class="keyword cmdname">impalad</span> node, ensure that Impala loads the latest metadata for the table if the Avro + schema is modified through Hive. Issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> or + <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code> statement. <code class="ph codeph">REFRESH</code> + reloads the metadata immediately, <code class="ph codeph">INVALIDATE METADATA</code> reloads the metadata the next time + the table is accessed. + </p> + + <p class="p"> + When Avro data files or columns are not consulted during a query, Impala does not check for consistency. + Thus, if you issue <code class="ph codeph">SELECT c1, c2 FROM t1</code>, Impala does not return any error if the column + <code class="ph codeph">c3</code> changed in an incompatible way. If a query retrieves data from some partitions but not + others, Impala does not check the data files for the unused partitions. + </p> + + <p class="p"> + In the Hive DDL statements, you can specify an <code class="ph codeph">avro.schema.literal</code> table property (if the + schema definition is short) or an <code class="ph codeph">avro.schema.url</code> property (if the schema definition is + long, or to allow convenient editing for the definition). + </p> + + <p class="p"> + For example, running the following SQL code in the Hive shell creates a table using the Avro file format + and puts some sample data into it: + </p> + +<pre class="pre codeblock"><code>CREATE TABLE avro_table (a string, b string) +ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' +STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' +OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' +TBLPROPERTIES ( + 'avro.schema.literal'='{ + "type": "record", + "name": "my_record", + "fields": [ + {"name": "a", "type": "int"}, + {"name": "b", "type": "string"} + ]}'); + +INSERT OVERWRITE TABLE avro_table SELECT 1, "avro" FROM functional.alltypes LIMIT 1; +</code></pre> + + <p class="p"> + Once the Avro table is created and contains data, you can query it through the + <span class="keyword cmdname">impala-shell</span> command: + </p> + +<pre class="pre codeblock"><code>[localhost:21000] > select * from avro_table; ++---+------+ +| a | b | ++---+------+ +| 1 | avro | ++---+------+ +</code></pre> + + <p class="p"> + Now in the Hive shell, you change the type of a column and add a new column with a default value: + </p> + +<pre class="pre codeblock"><code>-- Promote column "a" from INT to FLOAT (no need to update Avro schema) +ALTER TABLE avro_table CHANGE A A FLOAT; + +-- Add column "c" with default +ALTER TABLE avro_table ADD COLUMNS (c int); +ALTER TABLE avro_table SET TBLPROPERTIES ( + 'avro.schema.literal'='{ + "type": "record", + "name": "my_record", + "fields": [ + {"name": "a", "type": "int"}, + {"name": "b", "type": "string"}, + {"name": "c", "type": "int", "default": 10} + ]}'); +</code></pre> + + <p class="p"> + Once again in <span class="keyword cmdname">impala-shell</span>, you can query the Avro table based on its latest schema + definition. Because the table metadata was changed outside of Impala, you issue a <code class="ph codeph">REFRESH</code> + statement first so that Impala has up-to-date metadata for the table. + </p> + +<pre class="pre codeblock"><code>[localhost:21000] > refresh avro_table; +[localhost:21000] > select * from avro_table; ++---+------+----+ +| a | b | c | ++---+------+----+ +| 1 | avro | 10 | ++---+------+----+ +</code></pre> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="avro__avro_data_types"> + + <h2 class="title topictitle2" id="ariaid-title8">Data Type Considerations for Avro Tables</h2> + + <div class="body conbody"> + + <p class="p"> + The Avro format defines a set of data types whose names differ from the names of the corresponding Impala + data types. If you are preparing Avro files using other Hadoop components such as Pig or MapReduce, you + might need to work with the type names defined by Avro. The following figure lists the Avro-defined types + and the equivalent types in Impala. + </p> + +<pre class="pre codeblock"><code>Primitive Types (Avro -> Impala) +-------------------------------- +STRING -> STRING +STRING -> CHAR +STRING -> VARCHAR +INT -> INT +BOOLEAN -> BOOLEAN +LONG -> BIGINT +FLOAT -> FLOAT +DOUBLE -> DOUBLE + +Logical Types +------------- +BYTES + logicalType = "decimal" -> DECIMAL + +Avro Types with No Impala Equivalent +------------------------------------ +RECORD, MAP, ARRAY, UNION, ENUM, FIXED, NULL + +Impala Types with No Avro Equivalent +------------------------------------ +TIMESTAMP + +</code></pre> + + <p class="p"> + The Avro specification allows string values up to 2**64 bytes in length. + Impala queries for Avro tables use 32-bit integers to hold string lengths. + In <span class="keyword">Impala 2.5</span> and higher, Impala truncates <code class="ph codeph">CHAR</code> + and <code class="ph codeph">VARCHAR</code> values in Avro tables to (2**31)-1 bytes. + If a query encounters a <code class="ph codeph">STRING</code> value longer than (2**31)-1 + bytes in an Avro table, the query fails. In earlier releases, + encountering such long values in an Avro table could cause a crash. + </p> + + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="avro__avro_performance"> + + <h2 class="title topictitle2" id="ariaid-title9">Query Performance for Impala Avro Tables</h2> + + <div class="body conbody"> + + <p class="p"> + In general, expect query performance with Avro tables to be + faster than with tables using text data, but slower than with + Parquet tables. See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> + for information about using the Parquet file format for + high-performance analytic queries. + </p> + + <p class="p"> + In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3. + For Impala tables that use the file formats Parquet, RCFile, SequenceFile, + Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code> + in the <span class="ph filepath">core-site.xml</span> configuration file determines + how Impala divides the I/O work of reading the data files. This configuration + setting is specified in bytes. By default, this + value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files + as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access + Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code> + to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve + Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code> + to 268435456 (256 MB) to match the row group size produced by Impala. + </p> + + </div> + </article> + +</article></main></body></html> \ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_batch_size.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_batch_size.html b/docs/build/html/topics/impala_batch_size.html new file mode 100644 index 0000000..52ceff0 --- /dev/null +++ b/docs/build/html/topics/impala_batch_size.html @@ -0,0 +1,29 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="batch_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>BATCH_SIZE Query Option</title></head><body id="batch_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">BATCH_SIZE Query Option</h1> + + + + <div class="body conbody"> + + <p class="p"> + + Number of rows evaluated at a time by SQL operators. Unspecified or a size of 0 uses a predefined default + size. Using a large number improves responsiveness, especially for scan operations, at the cost of a higher memory footprint. + </p> + + <p class="p"> + This option is primarily for testing during Impala development, or for use under the direction of <span class="keyword">the appropriate support channel</span>. + </p> + + <p class="p"> + <strong class="ph b">Type:</strong> numeric + </p> + + <p class="p"> + <strong class="ph b">Default:</strong> 0 (meaning the predefined default of 1024) + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_bigint.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_bigint.html b/docs/build/html/topics/impala_bigint.html new file mode 100644 index 0000000..d0f9c2c --- /dev/null +++ b/docs/build/html/topics/impala_bigint.html @@ -0,0 +1,136 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="bigint"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>BIGINT Data Type</title></head><body id="bigint"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">BIGINT Data Type</h1> + + + + <div class="body conbody"> + + <p class="p"> + An 8-byte integer data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> + statements. + </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + + <p class="p"> + In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement: + </p> + +<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> BIGINT</code></pre> + + <p class="p"> + <strong class="ph b">Range:</strong> -9223372036854775808 .. 9223372036854775807. There is no <code class="ph codeph">UNSIGNED</code> subtype. + </p> + + <p class="p"> + <strong class="ph b">Conversions:</strong> Impala automatically converts to a floating-point type (<code class="ph codeph">FLOAT</code> or + <code class="ph codeph">DOUBLE</code>) automatically. Use <code class="ph codeph">CAST()</code> to convert to <code class="ph codeph">TINYINT</code>, + <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>, <code class="ph codeph">STRING</code>, or <code class="ph codeph">TIMESTAMP</code>. + <span class="ph">Casting an integer or floating-point value <code class="ph codeph">N</code> to + <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch + date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone. + If the setting <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> is in effect, + the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.</span> + </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + +<pre class="pre codeblock"><code>CREATE TABLE t1 (x BIGINT); +SELECT CAST(1000 AS BIGINT); +</code></pre> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + <code class="ph codeph">BIGINT</code> is a convenient type to use for column declarations because you can use any kind of + integer values in <code class="ph codeph">INSERT</code> statements and they are promoted to <code class="ph codeph">BIGINT</code> where + necessary. However, <code class="ph codeph">BIGINT</code> also requires the most bytes of any integer type on disk and in + memory, meaning your queries are not as efficient and scalable as possible if you overuse this type. + Therefore, prefer to use the smallest integer type with sufficient range to hold all input values, and + <code class="ph codeph">CAST()</code> when necessary to the appropriate type. + </p> + + <p class="p"> + For a convenient and automated way to check the bounds of the <code class="ph codeph">BIGINT</code> type, call the + functions <code class="ph codeph">MIN_BIGINT()</code> and <code class="ph codeph">MAX_BIGINT()</code>. + </p> + + <p class="p"> + If an integer value is too large to be represented as a <code class="ph codeph">BIGINT</code>, use a + <code class="ph codeph">DECIMAL</code> instead with sufficient digits of precision. + </p> + + <p class="p"> + <strong class="ph b">NULL considerations:</strong> Casting any non-numeric value to this type produces a <code class="ph codeph">NULL</code> + value. + </p> + + <p class="p"> + <strong class="ph b">Partitioning:</strong> Prefer to use this type for a partition key column. Impala can process the numeric + type more efficiently than a <code class="ph codeph">STRING</code> representation of the value. + </p> + + <p class="p"> + <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables. + </p> + + + + <p class="p"> + <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables + using Parquet or other binary formats. + </p> + + + + <p class="p"> + <strong class="ph b">Internal details:</strong> Represented in memory as an 8-byte value. + </p> + + <p class="p"> + <strong class="ph b">Added in:</strong> Available in all versions of Impala. + </p> + + <p class="p"> + <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size + fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code> + statement. + </p> + + <p class="p"> + <strong class="ph b">Sqoop considerations:</strong> + </p> + + <p class="p"> If you use Sqoop to + convert RDBMS data to Parquet, be careful with interpreting any + resulting values from <code class="ph codeph">DATE</code>, <code class="ph codeph">DATETIME</code>, + or <code class="ph codeph">TIMESTAMP</code> columns. The underlying values are + represented as the Parquet <code class="ph codeph">INT64</code> type, which is + represented as <code class="ph codeph">BIGINT</code> in the Impala table. The Parquet + values represent the time in milliseconds, while Impala interprets + <code class="ph codeph">BIGINT</code> as the time in seconds. Therefore, if you have + a <code class="ph codeph">BIGINT</code> column in a Parquet table that was imported + this way from Sqoop, divide the values by 1000 when interpreting as the + <code class="ph codeph">TIMESTAMP</code> type.</p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>, + <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, <a class="xref" href="impala_int.html#int">INT Data Type</a>, + <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 1.4 or higher only)</a>, + <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a> + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_bit_functions.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_bit_functions.html b/docs/build/html/topics/impala_bit_functions.html new file mode 100644 index 0000000..80d9f55 --- /dev/null +++ b/docs/build/html/topics/impala_bit_functions.html @@ -0,0 +1,848 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="bit_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Bit Functions</title></head><body id="bit_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Impala Bit Functions</h1> + + + + <div class="body conbody"> + + <p class="p"> + Bit manipulation functions perform bitwise operations involved in scientific processing or computer science algorithms. + For example, these functions include setting, clearing, or testing bits within an integer value, or changing the + positions of bits with or without wraparound. + </p> + + <p class="p"> + If a function takes two integer arguments that are required to be of the same type, the smaller argument is promoted + to the type of the larger one if required. For example, <code class="ph codeph">BITAND(1,4096)</code> treats both arguments as + <code class="ph codeph">SMALLINT</code>, because 1 can be represented as a <code class="ph codeph">TINYINT</code> but 4096 requires a <code class="ph codeph">SMALLINT</code>. + </p> + + <p class="p"> + Remember that all Impala integer values are signed. Therefore, when dealing with binary values where the most significant + bit is 1, the specified or returned values might be negative when represented in base 10. + </p> + + <p class="p"> + Whenever any argument is <code class="ph codeph">NULL</code>, either the input value, bit position, or number of shift or rotate positions, + the return value from any of these functions is also <code class="ph codeph">NULL</code> + </p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + The bit functions operate on all the integral data types: <a class="xref" href="impala_int.html#int">INT Data Type</a>, + <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, and + <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>. + </p> + + <p class="p"> + <strong class="ph b">Function reference:</strong> + </p> + + <p class="p"> + Impala supports the following bit functions: + </p> + + + + <dl class="dl"> + + + + <dt class="dt dlterm" id="bit_functions__bitand"> + <code class="ph codeph">bitand(integer_type a, same_type b)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns an integer value representing the bits that are set to 1 in both of the arguments. + If the arguments are of different sizes, the smaller is promoted to the type of the larger. + <p class="p"> + <strong class="ph b">Usage notes:</strong> The <code class="ph codeph">bitand()</code> function is equivalent to the <code class="ph codeph">&</code> binary operator. + </p> + <p class="p"> + <strong class="ph b">Return type:</strong> Same as the input value + </p> + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span> + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + <p class="p"> + The following examples show the results of ANDing integer values. + 255 contains all 1 bits in its lowermost 7 bits. + 32767 contains all 1 bits in its lowermost 15 bits. + + You can use the <code class="ph codeph">bin()</code> function to check the binary representation of any + integer value, although the result is always represented as a 64-bit value. + If necessary, the smaller argument is promoted to the + type of the larger one. + </p> +<pre class="pre codeblock"><code>select bitand(255, 32767); /* 0000000011111111 & 0111111111111111 */ ++--------------------+ +| bitand(255, 32767) | ++--------------------+ +| 255 | ++--------------------+ + +select bitand(32767, 1); /* 0111111111111111 & 0000000000000001 */ ++------------------+ +| bitand(32767, 1) | ++------------------+ +| 1 | ++------------------+ + +select bitand(32, 16); /* 00010000 & 00001000 */ ++----------------+ +| bitand(32, 16) | ++----------------+ +| 0 | ++----------------+ + +select bitand(12,5); /* 00001100 & 00000101 */ ++---------------+ +| bitand(12, 5) | ++---------------+ +| 4 | ++---------------+ + +select bitand(-1,15); /* 11111111 & 00001111 */ ++----------------+ +| bitand(-1, 15) | ++----------------+ +| 15 | ++----------------+ +</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="bit_functions__bitnot"> + <code class="ph codeph">bitnot(integer_type a)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Inverts all the bits of the input argument. + <p class="p"> + <strong class="ph b">Usage notes:</strong> The <code class="ph codeph">bitnot()</code> function is equivalent to the <code class="ph codeph">~</code> unary operator. + </p> + <p class="p"> + <strong class="ph b">Return type:</strong> Same as the input value + </p> + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span> + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + <p class="p"> + These examples illustrate what happens when you flip all the bits of an integer value. + The sign always changes. The decimal representation is one different between the positive and + negative values. + + </p> +<pre class="pre codeblock"><code>select bitnot(127); /* 01111111 -> 10000000 */ ++-------------+ +| bitnot(127) | ++-------------+ +| -128 | ++-------------+ + +select bitnot(16); /* 00010000 -> 11101111 */ ++------------+ +| bitnot(16) | ++------------+ +| -17 | ++------------+ + +select bitnot(0); /* 00000000 -> 11111111 */ ++-----------+ +| bitnot(0) | ++-----------+ +| -1 | ++-----------+ + +select bitnot(-128); /* 10000000 -> 01111111 */ ++--------------+ +| bitnot(-128) | ++--------------+ +| 127 | ++--------------+ +</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="bit_functions__bitor"> + <code class="ph codeph">bitor(integer_type a, same_type b)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns an integer value representing the bits that are set to 1 in either of the arguments. + If the arguments are of different sizes, the smaller is promoted to the type of the larger. + <p class="p"> + <strong class="ph b">Usage notes:</strong> The <code class="ph codeph">bitor()</code> function is equivalent to the <code class="ph codeph">|</code> binary operator. + </p> + <p class="p"> + <strong class="ph b">Return type:</strong> Same as the input value + </p> + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span> + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + <p class="p"> + The following examples show the results of ORing integer values. + </p> +<pre class="pre codeblock"><code>select bitor(1,4); /* 00000001 | 00000100 */ ++-------------+ +| bitor(1, 4) | ++-------------+ +| 5 | ++-------------+ + +select bitor(16,48); /* 00001000 | 00011000 */ ++---------------+ +| bitor(16, 48) | ++---------------+ +| 48 | ++---------------+ + +select bitor(0,7); /* 00000000 | 00000111 */ ++-------------+ +| bitor(0, 7) | ++-------------+ +| 7 | ++-------------+ +</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="bit_functions__bitxor"> + <code class="ph codeph">bitxor(integer_type a, same_type b)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns an integer value representing the bits that are set to 1 in one but not both of the arguments. + If the arguments are of different sizes, the smaller is promoted to the type of the larger. + <p class="p"> + <strong class="ph b">Usage notes:</strong> The <code class="ph codeph">bitxor()</code> function is equivalent to the <code class="ph codeph">^</code> binary operator. + </p> + <p class="p"> + <strong class="ph b">Return type:</strong> Same as the input value + </p> + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span> + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + <p class="p"> + The following examples show the results of XORing integer values. + XORing a non-zero value with zero returns the non-zero value. + XORing two identical values returns zero, because all the 1 bits from the first argument are also 1 bits in the second argument. + XORing different non-zero values turns off some bits and leaves others turned on, based on whether the same bit is set in both arguments. + </p> +<pre class="pre codeblock"><code>select bitxor(0,15); /* 00000000 ^ 00001111 */ ++---------------+ +| bitxor(0, 15) | ++---------------+ +| 15 | ++---------------+ + +select bitxor(7,7); /* 00000111 ^ 00000111 */ ++--------------+ +| bitxor(7, 7) | ++--------------+ +| 0 | ++--------------+ + +select bitxor(8,4); /* 00001000 ^ 00000100 */ ++--------------+ +| bitxor(8, 4) | ++--------------+ +| 12 | ++--------------+ + +select bitxor(3,7); /* 00000011 ^ 00000111 */ ++--------------+ +| bitxor(3, 7) | ++--------------+ +| 4 | ++--------------+ +</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="bit_functions__countset"> + <code class="ph codeph">countset(integer_type a [, int zero_or_one])</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> By default, returns the number of 1 bits in the specified integer value. + If the optional second argument is set to zero, it returns the number of 0 bits instead. + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + <p class="p"> + In discussions of information theory, this operation is referred to as the + <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Hamming_weight" target="_blank">population count</a>"</span> + or <span class="q">"popcount"</span>. + </p> + <p class="p"> + <strong class="ph b">Return type:</strong> Same as the input value + </p> + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span> + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + <p class="p"> + The following examples show how to count the number of 1 bits in an integer value. + </p> +<pre class="pre codeblock"><code>select countset(1); /* 00000001 */ ++-------------+ +| countset(1) | ++-------------+ +| 1 | ++-------------+ + +select countset(3); /* 00000011 */ ++-------------+ +| countset(3) | ++-------------+ +| 2 | ++-------------+ + +select countset(16); /* 00010000 */ ++--------------+ +| countset(16) | ++--------------+ +| 1 | ++--------------+ + +select countset(17); /* 00010001 */ ++--------------+ +| countset(17) | ++--------------+ +| 2 | ++--------------+ + +select countset(7,1); /* 00000111 = 3 1 bits; the function counts 1 bits by default */ ++----------------+ +| countset(7, 1) | ++----------------+ +| 3 | ++----------------+ + +select countset(7,0); /* 00000111 = 5 0 bits; third argument can only be 0 or 1 */ ++----------------+ +| countset(7, 0) | ++----------------+ +| 5 | ++----------------+ +</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="bit_functions__getbit"> + <code class="ph codeph">getbit(integer_type a, int position)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Returns a 0 or 1 representing the bit at a + specified position. The positions are numbered right to left, starting at zero. + The position argument cannot be negative. + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + <p class="p"> + When you use a literal input value, it is treated as an 8-bit, 16-bit, + and so on value, the smallest type that is appropriate. + The type of the input value limits the range of the positions. + Cast the input value to the appropriate type if you need to + ensure it is treated as a 64-bit, 32-bit, and so on value. + </p> + <p class="p"> + <strong class="ph b">Return type:</strong> Same as the input value + </p> + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span> + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + <p class="p"> + The following examples show how to test a specific bit within an integer value. + </p> +<pre class="pre codeblock"><code>select getbit(1,0); /* 00000001 */ ++--------------+ +| getbit(1, 0) | ++--------------+ +| 1 | ++--------------+ + +select getbit(16,1) /* 00010000 */ ++---------------+ +| getbit(16, 1) | ++---------------+ +| 0 | ++---------------+ + +select getbit(16,4) /* 00010000 */ ++---------------+ +| getbit(16, 4) | ++---------------+ +| 1 | ++---------------+ + +select getbit(16,5) /* 00010000 */ ++---------------+ +| getbit(16, 5) | ++---------------+ +| 0 | ++---------------+ + +select getbit(-1,3); /* 11111111 */ ++---------------+ +| getbit(-1, 3) | ++---------------+ +| 1 | ++---------------+ + +select getbit(-1,25); /* 11111111 */ +ERROR: Invalid bit position: 25 + +select getbit(cast(-1 as int),25); /* 11111111111111111111111111111111 */ ++-----------------------------+ +| getbit(cast(-1 as int), 25) | ++-----------------------------+ +| 1 | ++-----------------------------+ +</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="bit_functions__rotateleft"> + <code class="ph codeph">rotateleft(integer_type a, int positions)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Rotates an integer value left by a specified number of bits. + As the most significant bit is taken out of the original value, + if it is a 1 bit, it is <span class="q">"rotated"</span> back to the least significant bit. + Therefore, the final value has the same number of 1 bits as the original value, + just in different positions. + In computer science terms, this operation is a + <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Circular_shift" target="_blank">circular shift</a>"</span>. + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + <p class="p"> + Specifying a second argument of zero leaves the original value unchanged. + Rotating a -1 value by any number of positions still returns -1, + because the original value has all 1 bits and all the 1 bits are + preserved during rotation. + Similarly, rotating a 0 value by any number of positions still returns 0. + Rotating a value by the same number of bits as in the value returns the same value. + Because this is a circular operation, the number of positions is not limited + to the number of bits in the input value. + For example, rotating an 8-bit value by 1, 9, 17, and so on positions returns an + identical result in each case. + </p> + <p class="p"> + <strong class="ph b">Return type:</strong> Same as the input value + </p> + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span> + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> +<pre class="pre codeblock"><code>select rotateleft(1,4); /* 00000001 -> 00010000 */ ++------------------+ +| rotateleft(1, 4) | ++------------------+ +| 16 | ++------------------+ + +select rotateleft(-1,155); /* 11111111 -> 11111111 */ ++---------------------+ +| rotateleft(-1, 155) | ++---------------------+ +| -1 | ++---------------------+ + +select rotateleft(-128,1); /* 10000000 -> 00000001 */ ++---------------------+ +| rotateleft(-128, 1) | ++---------------------+ +| 1 | ++---------------------+ + +select rotateleft(-127,3); /* 10000001 -> 00001100 */ ++---------------------+ +| rotateleft(-127, 3) | ++---------------------+ +| 12 | ++---------------------+ + +</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="bit_functions__rotateright"> + <code class="ph codeph">rotateright(integer_type a, int positions)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Rotates an integer value right by a specified number of bits. + As the least significant bit is taken out of the original value, + if it is a 1 bit, it is <span class="q">"rotated"</span> back to the most significant bit. + Therefore, the final value has the same number of 1 bits as the original value, + just in different positions. + In computer science terms, this operation is a + <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Circular_shift" target="_blank">circular shift</a>"</span>. + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + <p class="p"> + Specifying a second argument of zero leaves the original value unchanged. + Rotating a -1 value by any number of positions still returns -1, + because the original value has all 1 bits and all the 1 bits are + preserved during rotation. + Similarly, rotating a 0 value by any number of positions still returns 0. + Rotating a value by the same number of bits as in the value returns the same value. + Because this is a circular operation, the number of positions is not limited + to the number of bits in the input value. + For example, rotating an 8-bit value by 1, 9, 17, and so on positions returns an + identical result in each case. + </p> + <p class="p"> + <strong class="ph b">Return type:</strong> Same as the input value + </p> + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span> + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> +<pre class="pre codeblock"><code>select rotateright(16,4); /* 00010000 -> 00000001 */ ++--------------------+ +| rotateright(16, 4) | ++--------------------+ +| 1 | ++--------------------+ + +select rotateright(-1,155); /* 11111111 -> 11111111 */ ++----------------------+ +| rotateright(-1, 155) | ++----------------------+ +| -1 | ++----------------------+ + +select rotateright(-128,1); /* 10000000 -> 01000000 */ ++----------------------+ +| rotateright(-128, 1) | ++----------------------+ +| 64 | ++----------------------+ + +select rotateright(-127,3); /* 10000001 -> 00110000 */ ++----------------------+ +| rotateright(-127, 3) | ++----------------------+ +| 48 | ++----------------------+ +</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="bit_functions__setbit"> + <code class="ph codeph">setbit(integer_type a, int position [, int zero_or_one])</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> By default, changes a bit at a specified position to a 1, if it is not already. + If the optional third argument is set to zero, the specified bit is set to 0 instead. + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + If the bit at the specified position was already 1 (by default) + or 0 (with a third argument of zero), the return value is + the same as the first argument. + The positions are numbered right to left, starting at zero. + (Therefore, the return value could be different from the first argument + even if the position argument is zero.) + The position argument cannot be negative. + <p class="p"> + When you use a literal input value, it is treated as an 8-bit, 16-bit, + and so on value, the smallest type that is appropriate. + The type of the input value limits the range of the positions. + Cast the input value to the appropriate type if you need to + ensure it is treated as a 64-bit, 32-bit, and so on value. + </p> + <p class="p"> + <strong class="ph b">Return type:</strong> Same as the input value + </p> + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span> + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> +<pre class="pre codeblock"><code>select setbit(0,0); /* 00000000 -> 00000001 */ ++--------------+ +| setbit(0, 0) | ++--------------+ +| 1 | ++--------------+ + +select setbit(0,3); /* 00000000 -> 00001000 */ ++--------------+ +| setbit(0, 3) | ++--------------+ +| 8 | ++--------------+ + +select setbit(7,3); /* 00000111 -> 00001111 */ ++--------------+ +| setbit(7, 3) | ++--------------+ +| 15 | ++--------------+ + +select setbit(15,3); /* 00001111 -> 00001111 */ ++---------------+ +| setbit(15, 3) | ++---------------+ +| 15 | ++---------------+ + +select setbit(0,32); /* By default, 0 is a TINYINT with only 8 bits. */ +ERROR: Invalid bit position: 32 + +select setbit(cast(0 as bigint),32); /* For BIGINT, the position can be 0..63. */ ++-------------------------------+ +| setbit(cast(0 as bigint), 32) | ++-------------------------------+ +| 4294967296 | ++-------------------------------+ + +select setbit(7,3,1); /* 00000111 -> 00001111; setting to 1 is the default */ ++-----------------+ +| setbit(7, 3, 1) | ++-----------------+ +| 15 | ++-----------------+ + +select setbit(7,2,0); /* 00000111 -> 00000011; third argument of 0 clears instead of sets */ ++-----------------+ +| setbit(7, 2, 0) | ++-----------------+ +| 3 | ++-----------------+ +</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="bit_functions__shiftleft"> + <code class="ph codeph">shiftleft(integer_type a, int positions)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Shifts an integer value left by a specified number of bits. + As the most significant bit is taken out of the original value, + it is discarded and the least significant bit becomes 0. + In computer science terms, this operation is a <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Logical_shift" target="_blank">logical shift</a>"</span>. + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + <p class="p"> + The final value has either the same number of 1 bits as the original value, or fewer. + Shifting an 8-bit value by 8 positions, a 16-bit value by 16 positions, and so on produces + a result of zero. + </p> + <p class="p"> + Specifying a second argument of zero leaves the original value unchanged. + Shifting any value by 0 returns the original value. + Shifting any value by 1 is the same as multiplying it by 2, + as long as the value is small enough; larger values eventually + become negative when shifted, as the sign bit is set. + Starting with the value 1 and shifting it left by N positions gives + the same result as 2 to the Nth power, or <code class="ph codeph">pow(2,<var class="keyword varname">N</var>)</code>. + </p> + <p class="p"> + <strong class="ph b">Return type:</strong> Same as the input value + </p> + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span> + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> +<pre class="pre codeblock"><code>select shiftleft(1,0); /* 00000001 -> 00000001 */ ++-----------------+ +| shiftleft(1, 0) | ++-----------------+ +| 1 | ++-----------------+ + +select shiftleft(1,3); /* 00000001 -> 00001000 */ ++-----------------+ +| shiftleft(1, 3) | ++-----------------+ +| 8 | ++-----------------+ + +select shiftleft(8,2); /* 00001000 -> 00100000 */ ++-----------------+ +| shiftleft(8, 2) | ++-----------------+ +| 32 | ++-----------------+ + +select shiftleft(127,1); /* 01111111 -> 11111110 */ ++-------------------+ +| shiftleft(127, 1) | ++-------------------+ +| -2 | ++-------------------+ + +select shiftleft(127,5); /* 01111111 -> 11100000 */ ++-------------------+ +| shiftleft(127, 5) | ++-------------------+ +| -32 | ++-------------------+ + +select shiftleft(-1,4); /* 11111111 -> 11110000 */ ++------------------+ +| shiftleft(-1, 4) | ++------------------+ +| -16 | ++------------------+ +</code></pre> + </dd> + + + + + + <dt class="dt dlterm" id="bit_functions__shiftright"> + <code class="ph codeph">shiftright(integer_type a, int positions)</code> + </dt> + + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Shifts an integer value right by a specified number of bits. + As the least significant bit is taken out of the original value, + it is discarded and the most significant bit becomes 0. + In computer science terms, this operation is a <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Logical_shift" target="_blank">logical shift</a>"</span>. + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + <p class="p"> + Therefore, the final value has either the same number of 1 bits as the original value, or fewer. + Shifting an 8-bit value by 8 positions, a 16-bit value by 16 positions, and so on produces + a result of zero. + </p> + <p class="p"> + Specifying a second argument of zero leaves the original value unchanged. + Shifting any value by 0 returns the original value. + Shifting any positive value right by 1 is the same as dividing it by 2. + Negative values become positive when shifted right. + </p> + <p class="p"> + <strong class="ph b">Return type:</strong> Same as the input value + </p> + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span> + </p> + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> +<pre class="pre codeblock"><code>select shiftright(16,0); /* 00010000 -> 00000000 */ ++-------------------+ +| shiftright(16, 0) | ++-------------------+ +| 16 | ++-------------------+ + +select shiftright(16,4); /* 00010000 -> 00000000 */ ++-------------------+ +| shiftright(16, 4) | ++-------------------+ +| 1 | ++-------------------+ + +select shiftright(16,5); /* 00010000 -> 00000000 */ ++-------------------+ +| shiftright(16, 5) | ++-------------------+ +| 0 | ++-------------------+ + +select shiftright(-1,1); /* 11111111 -> 01111111 */ ++-------------------+ +| shiftright(-1, 1) | ++-------------------+ +| 127 | ++-------------------+ + +select shiftright(-1,5); /* 11111111 -> 00000111 */ ++-------------------+ +| shiftright(-1, 5) | ++-------------------+ +| 7 | ++-------------------+ +</code></pre> + </dd> + + + + </dl> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_boolean.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_boolean.html b/docs/build/html/topics/impala_boolean.html new file mode 100644 index 0000000..51a91ba --- /dev/null +++ b/docs/build/html/topics/impala_boolean.html @@ -0,0 +1,170 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="boolean"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>BOOLEAN Data Type</title></head><body id="boolean"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">BOOLEAN Data Type</h1> + + + + <div class="body conbody"> + + <p class="p"> + A data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements, representing a + single true/false choice. + </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + + <p class="p"> + In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement: + </p> + +<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> BOOLEAN</code></pre> + + <p class="p"> + <strong class="ph b">Range:</strong> <code class="ph codeph">TRUE</code> or <code class="ph codeph">FALSE</code>. Do not use quotation marks around the + <code class="ph codeph">TRUE</code> and <code class="ph codeph">FALSE</code> literal values. You can write the literal values in + uppercase, lowercase, or mixed case. The values queried from a table are always returned in lowercase, + <code class="ph codeph">true</code> or <code class="ph codeph">false</code>. + </p> + + <p class="p"> + <strong class="ph b">Conversions:</strong> Impala does not automatically convert any other type to <code class="ph codeph">BOOLEAN</code>. All + conversions must use an explicit call to the <code class="ph codeph">CAST()</code> function. + </p> + + <p class="p"> + You can use <code class="ph codeph">CAST()</code> to convert + + any integer or floating-point type to + <code class="ph codeph">BOOLEAN</code>: a value of 0 represents <code class="ph codeph">false</code>, and any non-zero value is converted + to <code class="ph codeph">true</code>. + </p> + +<pre class="pre codeblock"><code>SELECT CAST(42 AS BOOLEAN) AS nonzero_int, CAST(99.44 AS BOOLEAN) AS nonzero_decimal, + CAST(000 AS BOOLEAN) AS zero_int, CAST(0.0 AS BOOLEAN) AS zero_decimal; ++-------------+-----------------+----------+--------------+ +| nonzero_int | nonzero_decimal | zero_int | zero_decimal | ++-------------+-----------------+----------+--------------+ +| true | true | false | false | ++-------------+-----------------+----------+--------------+ +</code></pre> + + <p class="p"> + When you cast the opposite way, from <code class="ph codeph">BOOLEAN</code> to a numeric type, + the result becomes either 1 or 0: + </p> + +<pre class="pre codeblock"><code>SELECT CAST(true AS INT) AS true_int, CAST(true AS DOUBLE) AS true_double, + CAST(false AS INT) AS false_int, CAST(false AS DOUBLE) AS false_double; ++----------+-------------+-----------+--------------+ +| true_int | true_double | false_int | false_double | ++----------+-------------+-----------+--------------+ +| 1 | 1 | 0 | 0 | ++----------+-------------+-----------+--------------+ +</code></pre> + + <p class="p"> + + You can cast <code class="ph codeph">DECIMAL</code> values to <code class="ph codeph">BOOLEAN</code>, with the same treatment of zero and + non-zero values as the other numeric types. You cannot cast a <code class="ph codeph">BOOLEAN</code> to a + <code class="ph codeph">DECIMAL</code>. + </p> + + <p class="p"> + You cannot cast a <code class="ph codeph">STRING</code> value to <code class="ph codeph">BOOLEAN</code>, although you can cast a + <code class="ph codeph">BOOLEAN</code> value to <code class="ph codeph">STRING</code>, returning <code class="ph codeph">'1'</code> for + <code class="ph codeph">true</code> values and <code class="ph codeph">'0'</code> for <code class="ph codeph">false</code> values. + </p> + + <p class="p"> + Although you can cast a <code class="ph codeph">TIMESTAMP</code> to a <code class="ph codeph">BOOLEAN</code> or a + <code class="ph codeph">BOOLEAN</code> to a <code class="ph codeph">TIMESTAMP</code>, the results are unlikely to be useful. Any non-zero + <code class="ph codeph">TIMESTAMP</code> (that is, any value other than <code class="ph codeph">1970-01-01 00:00:00</code>) becomes + <code class="ph codeph">TRUE</code> when converted to <code class="ph codeph">BOOLEAN</code>, while <code class="ph codeph">1970-01-01 00:00:00</code> + becomes <code class="ph codeph">FALSE</code>. A value of <code class="ph codeph">FALSE</code> becomes <code class="ph codeph">1970-01-01 + 00:00:00</code> when converted to <code class="ph codeph">BOOLEAN</code>, and <code class="ph codeph">TRUE</code> becomes one second + past this epoch date, that is, <code class="ph codeph">1970-01-01 00:00:01</code>. + </p> + + <p class="p"> + <strong class="ph b">NULL considerations:</strong> An expression of this type produces a <code class="ph codeph">NULL</code> value if any + argument of the expression is <code class="ph codeph">NULL</code>. + </p> + + <p class="p"> + <strong class="ph b">Partitioning:</strong> + </p> + + <p class="p"> + Do not use a <code class="ph codeph">BOOLEAN</code> column as a partition key. Although you can create such a table, + subsequent operations produce errors: + </p> + +<pre class="pre codeblock"><code>[localhost:21000] > create table truth_table (assertion string) partitioned by (truth boolean); +[localhost:21000] > insert into truth_table values ('Pigs can fly',false); +ERROR: AnalysisException: INSERT into table with BOOLEAN partition column (truth) is not supported: partitioning.truth_table +</code></pre> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + +<pre class="pre codeblock"><code>SELECT 1 < 2; +SELECT 2 = 5; +SELECT 100 < NULL, 100 > NULL; +CREATE TABLE assertions (claim STRING, really BOOLEAN); +INSERT INTO assertions VALUES + ("1 is less than 2", 1 < 2), + ("2 is the same as 5", 2 = 5), + ("Grass is green", true), + ("The moon is made of green cheese", false); +SELECT claim FROM assertions WHERE really = TRUE; +</code></pre> + + <p class="p"> + <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables. + </p> + + <p class="p"> + <strong class="ph b">Parquet considerations:</strong> This type is fully compatible with Parquet tables. + </p> + + <p class="p"> + <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables + using Parquet or other binary formats. + </p> + + + + + + + + <p class="p"> + <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size + fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code> + statement. + </p> + + + + <p class="p"> + <strong class="ph b">Kudu considerations:</strong> + </p> + <p class="p"> + Currently, the data types <code class="ph codeph">BOOLEAN</code>, <code class="ph codeph">FLOAT</code>, + and <code class="ph codeph">DOUBLE</code> cannot be used for primary key columns in Kudu tables. + </p> + + + + <p class="p"> + <strong class="ph b">Related information:</strong> <a class="xref" href="impala_literals.html#boolean_literals">Boolean Literals</a>, + <a class="xref" href="impala_operators.html#operators">SQL Operators</a>, + <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a> + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html> \ No newline at end of file
