http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_array.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_array.xml b/docs/topics/impala_array.xml new file mode 100644 index 0000000..1e60795 --- /dev/null +++ b/docs/topics/impala_array.xml @@ -0,0 +1,266 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="array"> + + <title>ARRAY Complex Type (CDH 5.5 or higher only)</title> + + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Data Types"/> + </metadata> + </prolog> + + <conbody> + + <p> + A complex data type that can represent an arbitrary number of ordered elements. + The elements can be scalars or another complex type (<codeph>ARRAY</codeph>, + <codeph>STRUCT</codeph>, or <codeph>MAP</codeph>). + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<!-- To do: make sure there is sufficient syntax info under the SELECT statement to understand how to query all the complex types. --> + +<codeblock><varname>column_name</varname> ARRAY < <varname>type</varname> > + +type ::= <varname>primitive_type</varname> | <varname>complex_type</varname> +</codeblock> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_combo"/> + + <p> + The elements of the array have no names. You refer to the value of the array item using the + <codeph>ITEM</codeph> pseudocolumn, or its position in the array with the <codeph>POS</codeph> + pseudocolumn. See <xref href="impala_complex_types.xml#item"/> for information about + these pseudocolumns. + </p> + +<!-- Array is a frequently used idiom; don't recommend MAP right up front, since that is more rarely used. STRUCT has all different considerations. + <p> + If it would be logical to have a fixed number of elements and give each one a name, consider using a + <codeph>MAP</codeph> (when all elements are of the same type) or a <codeph>STRUCT</codeph> (if different + elements have different types) instead of an <codeph>ARRAY</codeph>. + </p> +--> + + <p> + Each row can have a different number of elements (including none) in the array for that row. + </p> + +<!-- Since you don't use numeric indexes, this assertion and advice doesn't make sense. + <p> + If you attempt to refer to a non-existent array element, the result is <codeph>NULL</codeph>. Therefore, + when using operations such as addition or string concatenation involving array elements, you might use + conditional functions to substitute default values such as 0 or <codeph>""</codeph> in the place of missing + array elements. + </p> +--> + + <p> + When an array contains items of scalar types, you can use aggregation functions on the array elements without using join notation. For + example, you can find the <codeph>COUNT()</codeph>, <codeph>AVG()</codeph>, <codeph>SUM()</codeph>, and so on of numeric array + elements, or the <codeph>MAX()</codeph> and <codeph>MIN()</codeph> of any scalar array elements by referring to + <codeph><varname>table_name</varname>.<varname>array_column</varname></codeph> in the <codeph>FROM</codeph> clause of the query. When + you need to cross-reference values from the array with scalar values from the same row, such as by including a <codeph>GROUP + BY</codeph> clause to produce a separate aggregated result for each row, then the join clause is required. + </p> + + <p> + A common usage pattern with complex types is to have an array as the top-level type for the column: + an array of structs, an array of maps, or an array of arrays. + For example, you can model a denormalized table by creating a column that is an <codeph>ARRAY</codeph> + of <codeph>STRUCT</codeph> elements; each item in the array represents a row from a table that would + normally be used in a join query. This kind of data structure lets you essentially denormalize tables by + associating multiple rows from one table with the matching row in another table. + </p> + + <p> + You typically do not create more than one top-level <codeph>ARRAY</codeph> column, because if there is + some relationship between the elements of multiple arrays, it is convenient to model the data as + an array of another complex type element (either <codeph>STRUCT</codeph> or <codeph>MAP</codeph>). + </p> + + <p conref="../shared/impala_common.xml#common/complex_types_describe"/> + + <p conref="../shared/impala_common.xml#common/added_in_230"/> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + + <ul conref="../shared/impala_common.xml#common/complex_types_restrictions"> + <li/> + </ul> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <note conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/> + + <p> + The following example shows how to construct a table with various kinds of <codeph>ARRAY</codeph> columns, + both at the top level and nested within other complex types. + Whenever the <codeph>ARRAY</codeph> consists of a scalar value, such as in the <codeph>PETS</codeph> + column or the <codeph>CHILDREN</codeph> field, you can see that future expansion is limited. + For example, you could not easily evolve the schema to record the kind of pet or the child's birthday alongside the name. + Therefore, it is more common to use an <codeph>ARRAY</codeph> whose elements are of <codeph>STRUCT</codeph> type, + to associate multiple fields with each array element. + </p> + + <note> + Practice the <codeph>CREATE TABLE</codeph> and query notation for complex type columns + using empty tables, until you can visualize a complex data structure and construct corresponding SQL statements reliably. + </note> + +<!-- To do: verify and flesh out this example. --> + +<codeblock><![CDATA[CREATE TABLE array_demo +( + id BIGINT, + name STRING, +-- An ARRAY of scalar type as a top-level column. + pets ARRAY <STRING>, + +-- An ARRAY with elements of complex type (STRUCT). + places_lived ARRAY < STRUCT < + place: STRING, + start_year: INT + >>, + +-- An ARRAY as a field (CHILDREN) within a STRUCT. +-- (The STRUCT is inside another ARRAY, because it is rare +-- for a STRUCT to be a top-level column.) + marriages ARRAY < STRUCT < + spouse: STRING, + children: ARRAY <STRING> + >>, + +-- An ARRAY as the value part of a MAP. +-- The first MAP field (the key) would be a value such as +-- 'Parent' or 'Grandparent', and the corresponding array would +-- represent 2 parents, 4 grandparents, and so on. + ancestors MAP < STRING, ARRAY <STRING> > +) +STORED AS PARQUET; +]]> +</codeblock> + + <p> + The following example shows how to examine the structure of a table containing one or more <codeph>ARRAY</codeph> columns by using the + <codeph>DESCRIBE</codeph> statement. You can visualize each <codeph>ARRAY</codeph> as its own two-column table, with columns + <codeph>ITEM</codeph> and <codeph>POS</codeph>. + </p> + +<!-- To do: extend the examples to include MARRIAGES and ANCESTORS columns, or get rid of those columns. --> + +<codeblock><![CDATA[DESCRIBE array_demo; ++--------------+---------------------------+ +| name | type | ++--------------+---------------------------+ +| id | bigint | +| name | string | +| pets | array<string> | +| marriages | array<struct< | +| | spouse:string, | +| | children:array<string> | +| | >> | +| places_lived | array<struct< | +| | place:string, | +| | start_year:int | +| | >> | +| ancestors | map<string,array<string>> | ++--------------+---------------------------+ + +DESCRIBE array_demo.pets; ++------+--------+ +| name | type | ++------+--------+ +| item | string | +| pos | bigint | ++------+--------+ + +DESCRIBE array_demo.marriages; ++------+--------------------------+ +| name | type | ++------+--------------------------+ +| item | struct< | +| | spouse:string, | +| | children:array<string> | +| | > | +| pos | bigint | ++------+--------------------------+ + +DESCRIBE array_demo.places_lived; ++------+------------------+ +| name | type | ++------+------------------+ +| item | struct< | +| | place:string, | +| | start_year:int | +| | > | +| pos | bigint | ++------+------------------+ + +DESCRIBE array_demo.ancestors; ++-------+---------------+ +| name | type | ++-------+---------------+ +| key | string | +| value | array<string> | ++-------+---------------+ +]]> +</codeblock> + + <p> + The following example shows queries involving <codeph>ARRAY</codeph> columns containing elements of scalar or complex types. You + <q>unpack</q> each <codeph>ARRAY</codeph> column by referring to it in a join query, as if it were a separate table with + <codeph>ITEM</codeph> and <codeph>POS</codeph> columns. If the array element is a scalar type, you refer to its value using the + <codeph>ITEM</codeph> pseudocolumn. If the array element is a <codeph>STRUCT</codeph>, you refer to the <codeph>STRUCT</codeph> fields + using dot notation and the field names. If the array element is another <codeph>ARRAY</codeph> or a <codeph>MAP</codeph>, you use + another level of join to unpack the nested collection elements. + </p> + +<!-- To do: have some sample output to show for these queries. --> + +<codeblock><![CDATA[-- Array of scalar values. +-- Each array element represents a single string, plus we know its position in the array. +SELECT id, name, pets.pos, pets.item FROM array_demo, array_demo.pets; + +-- Array of structs. +-- Now each array element has named fields, possibly of different types. +-- You can consider an ARRAY of STRUCT to represent a table inside another table. +SELECT id, name, places_lived.pos, places_lived.item.place, places_lived.item.start_year +FROM array_demo, array_demo.places_lived; + +-- The .ITEM name is optional for array elements that are structs. +-- The following query is equivalent to the previous one, with .ITEM +-- removed from the column references. +SELECT id, name, places_lived.pos, places_lived.place, places_lived.start_year + FROM array_demo, array_demo.places_lived; + +-- To filter specific items from the array, do comparisons against the .POS or .ITEM +-- pseudocolumns, or names of struct fields, in the WHERE clause. +SELECT id, name, pets.item FROM array_demo, array_demo.pets + WHERE pets.pos in (0, 1, 3); + +SELECT id, name, pets.item FROM array_demo, array_demo.pets + WHERE pets.item LIKE 'Mr. %'; + +SELECT id, name, places_lived.pos, places_lived.place, places_lived.start_year + FROM array_demo, array_demo.places_lived +WHERE places_lived.place like '%California%'; +]]> +</codeblock> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_complex_types.xml#complex_types"/>, +<!-- <xref href="impala_array.xml#array"/>, --> + <xref href="impala_struct.xml#struct"/>, <xref href="impala_map.xml#map"/> + </p> + + </conbody> + +</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_avg.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_avg.xml b/docs/topics/impala_avg.xml new file mode 100644 index 0000000..26f5450 --- /dev/null +++ b/docs/topics/impala_avg.xml @@ -0,0 +1,223 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="avg"> + + <title>AVG Function</title> + <titlealts><navtitle>AVG</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Impala Functions"/> + <data name="Category" value="Analytic Functions"/> + <data name="Category" value="Aggregate Functions"/> + <data name="Category" value="Querying"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">avg() function</indexterm> + An aggregate function that returns the average value from a set of numbers or <codeph>TIMESTAMP</codeph> values. + Its single argument can be numeric column, or the numeric result of a function or expression applied to the + column value. Rows with a <codeph>NULL</codeph> value for the specified column are ignored. If the table is empty, + or all the values supplied to <codeph>AVG</codeph> are <codeph>NULL</codeph>, <codeph>AVG</codeph> returns + <codeph>NULL</codeph>. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>AVG([DISTINCT | ALL] <varname>expression</varname>) [OVER (<varname>analytic_clause</varname>)] +</codeblock> + + <p> + When the query contains a <codeph>GROUP BY</codeph> clause, returns one value for each combination of + grouping values. + </p> + + <p> + <b>Return type:</b> <codeph>DOUBLE</codeph> for numeric values; <codeph>TIMESTAMP</codeph> for + <codeph>TIMESTAMP</codeph> values + </p> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_aggregation_explanation"/> + + <p conref="../shared/impala_common.xml#common/complex_types_aggregation_example"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + +<codeblock>-- Average all the non-NULL values in a column. +insert overwrite avg_t values (2),(4),(6),(null),(null); +-- The average of the above values is 4: (2+4+6) / 3. The 2 NULL values are ignored. +select avg(x) from avg_t; +-- Average only certain values from the column. +select avg(x) from t1 where month = 'January' and year = '2013'; +-- Apply a calculation to the value of the column before averaging. +select avg(x/3) from t1; +-- Apply a function to the value of the column before averaging. +-- Here we are substituting a value of 0 for all NULLs in the column, +-- so that those rows do factor into the return value. +select avg(isnull(x,0)) from t1; +-- Apply some number-returning function to a string column and average the results. +-- If column s contains any NULLs, length(s) also returns NULL and those rows are ignored. +select avg(length(s)) from t1; +-- Can also be used in combination with DISTINCT and/or GROUP BY. +-- Return more than one result. +select month, year, avg(page_visits) from web_stats group by month, year; +-- Filter the input to eliminate duplicates before performing the calculation. +select avg(distinct x) from t1; +-- Filter the output after performing the calculation. +select avg(x) from t1 group by y having avg(x) between 1 and 20; +</codeblock> + + <p rev="2.0.0"> + The following examples show how to use <codeph>AVG()</codeph> in an analytic context. They use a table + containing integers from 1 to 10. Notice how the <codeph>AVG()</codeph> is reported for each input value, as + opposed to the <codeph>GROUP BY</codeph> clause which condenses the result set. +<codeblock>select x, property, avg(x) over (partition by property) as avg from int_t where property in ('odd','even'); ++----+----------+-----+ +| x | property | avg | ++----+----------+-----+ +| 2 | even | 6 | +| 4 | even | 6 | +| 6 | even | 6 | +| 8 | even | 6 | +| 10 | even | 6 | +| 1 | odd | 5 | +| 3 | odd | 5 | +| 5 | odd | 5 | +| 7 | odd | 5 | +| 9 | odd | 5 | ++----+----------+-----+ +</codeblock> + +Adding an <codeph>ORDER BY</codeph> clause lets you experiment with results that are cumulative or apply to a moving +set of rows (the <q>window</q>). The following examples use <codeph>AVG()</codeph> in an analytic context +(that is, with an <codeph>OVER()</codeph> clause) to produce a running average of all the even values, +then a running average of all the odd values. The basic <codeph>ORDER BY x</codeph> clause implicitly +activates a window clause of <codeph>RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</codeph>, +which is effectively the same as <codeph>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</codeph>, +therefore all of these examples produce the same results: +<codeblock>select x, property, + avg(x) over (partition by property <b>order by x</b>) as 'cumulative average' + from int_t where property in ('odd','even'); ++----+----------+--------------------+ +| x | property | cumulative average | ++----+----------+--------------------+ +| 2 | even | 2 | +| 4 | even | 3 | +| 6 | even | 4 | +| 8 | even | 5 | +| 10 | even | 6 | +| 1 | odd | 1 | +| 3 | odd | 2 | +| 5 | odd | 3 | +| 7 | odd | 4 | +| 9 | odd | 5 | ++----+----------+--------------------+ + +select x, property, + avg(x) over + ( + partition by property + <b>order by x</b> + <b>range between unbounded preceding and current row</b> + ) as 'cumulative average' +from int_t where property in ('odd','even'); ++----+----------+--------------------+ +| x | property | cumulative average | ++----+----------+--------------------+ +| 2 | even | 2 | +| 4 | even | 3 | +| 6 | even | 4 | +| 8 | even | 5 | +| 10 | even | 6 | +| 1 | odd | 1 | +| 3 | odd | 2 | +| 5 | odd | 3 | +| 7 | odd | 4 | +| 9 | odd | 5 | ++----+----------+--------------------+ + +select x, property, + avg(x) over + ( + partition by property + <b>order by x</b> + <b>rows between unbounded preceding and current row</b> + ) as 'cumulative average' + from int_t where property in ('odd','even'); ++----+----------+--------------------+ +| x | property | cumulative average | ++----+----------+--------------------+ +| 2 | even | 2 | +| 4 | even | 3 | +| 6 | even | 4 | +| 8 | even | 5 | +| 10 | even | 6 | +| 1 | odd | 1 | +| 3 | odd | 2 | +| 5 | odd | 3 | +| 7 | odd | 4 | +| 9 | odd | 5 | ++----+----------+--------------------+ +</codeblock> + +The following examples show how to construct a moving window, with a running average taking into account 1 row before +and 1 row after the current row, within the same partition (all the even values or all the odd values). +Because of a restriction in the Impala <codeph>RANGE</codeph> syntax, this type of +moving window is possible with the <codeph>ROWS BETWEEN</codeph> clause but not the <codeph>RANGE BETWEEN</codeph> +clause: +<codeblock>select x, property, + avg(x) over + ( + partition by property + <b>order by x</b> + <b>rows between 1 preceding and 1 following</b> + ) as 'moving average' + from int_t where property in ('odd','even'); ++----+----------+----------------+ +| x | property | moving average | ++----+----------+----------------+ +| 2 | even | 3 | +| 4 | even | 4 | +| 6 | even | 6 | +| 8 | even | 8 | +| 10 | even | 9 | +| 1 | odd | 2 | +| 3 | odd | 3 | +| 5 | odd | 5 | +| 7 | odd | 7 | +| 9 | odd | 8 | ++----+----------+----------------+ + +-- Doesn't work because of syntax restriction on RANGE clause. +select x, property, + avg(x) over + ( + partition by property + <b>order by x</b> + <b>range between 1 preceding and 1 following</b> + ) as 'moving average' +from int_t where property in ('odd','even'); +ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW. +</codeblock> + </p> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + +<!-- This conref appears under SUM(), AVG(), FLOAT, and DOUBLE topics. --> + + <p conref="../shared/impala_common.xml#common/sum_double"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_analytic_functions.xml#analytic_functions"/>, <xref href="impala_max.xml#max"/>, + <xref href="impala_min.xml#min"/> + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_batch_size.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_batch_size.xml b/docs/topics/impala_batch_size.xml new file mode 100644 index 0000000..13a4b18 --- /dev/null +++ b/docs/topics/impala_batch_size.xml @@ -0,0 +1,33 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="batch_size"> + + <title>BATCH_SIZE Query Option</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Query Options"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">BATCH_SIZE query option</indexterm> + Number of rows evaluated at a time by SQL operators. Unspecified or a size of 0 uses a predefined default + size. Using a large number improves responsiveness, especially for scan operations, at the cost of a higher memory footprint. + </p> + + <p> + This option is primarily for Cloudera testing, or for use under the direction of Cloudera Support. + </p> + + <p> + <b>Type:</b> numeric + </p> + + <p> + <b>Default:</b> 0 (meaning the predefined default of 1024) + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_bigint.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_bigint.xml b/docs/topics/impala_bigint.xml new file mode 100644 index 0000000..8f31bc6 --- /dev/null +++ b/docs/topics/impala_bigint.xml @@ -0,0 +1,100 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="bigint"> + + <title>BIGINT Data Type</title> + <titlealts><navtitle>BIGINT</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Data Types"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Schemas"/> + </metadata> + </prolog> + + <conbody> + + <p> + An 8-byte integer data type used in <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph> + statements. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + + <p> + In the column definition of a <codeph>CREATE TABLE</codeph> statement: + </p> + +<codeblock><varname>column_name</varname> BIGINT</codeblock> + + <p> + <b>Range:</b> -9223372036854775808 .. 9223372036854775807. There is no <codeph>UNSIGNED</codeph> subtype. + </p> + + <p> + <b>Conversions:</b> Impala automatically converts to a floating-point type (<codeph>FLOAT</codeph> or + <codeph>DOUBLE</codeph>) automatically. Use <codeph>CAST()</codeph> to convert to <codeph>TINYINT</codeph>, + <codeph>SMALLINT</codeph>, <codeph>INT</codeph>, <codeph>STRING</codeph>, or <codeph>TIMESTAMP</codeph>. + <ph conref="../shared/impala_common.xml#common/cast_int_to_timestamp"/> + </p> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + +<codeblock>CREATE TABLE t1 (x BIGINT); +SELECT CAST(1000 AS BIGINT); +</codeblock> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + <codeph>BIGINT</codeph> is a convenient type to use for column declarations because you can use any kind of + integer values in <codeph>INSERT</codeph> statements and they are promoted to <codeph>BIGINT</codeph> where + necessary. However, <codeph>BIGINT</codeph> also requires the most bytes of any integer type on disk and in + memory, meaning your queries are not as efficient and scalable as possible if you overuse this type. + Therefore, prefer to use the smallest integer type with sufficient range to hold all input values, and + <codeph>CAST()</codeph> when necessary to the appropriate type. + </p> + + <p> + For a convenient and automated way to check the bounds of the <codeph>BIGINT</codeph> type, call the + functions <codeph>MIN_BIGINT()</codeph> and <codeph>MAX_BIGINT()</codeph>. + </p> + + <p> + If an integer value is too large to be represented as a <codeph>BIGINT</codeph>, use a + <codeph>DECIMAL</codeph> instead with sufficient digits of precision. + </p> + + <p conref="../shared/impala_common.xml#common/null_bad_numeric_cast"/> + + <p conref="../shared/impala_common.xml#common/partitioning_good"/> + + <p conref="../shared/impala_common.xml#common/hbase_ok"/> + +<!-- <p conref="/Content/impala_common_xi44078.xml#common/parquet_blurb"/> --> + + <p conref="../shared/impala_common.xml#common/text_bulky"/> + +<!-- <p conref="/Content/impala_common_xi44078.xml#common/compatibility_blurb"/> --> + + <p conref="../shared/impala_common.xml#common/internals_8_bytes"/> + + <p conref="../shared/impala_common.xml#common/added_forever"/> + + <p conref="../shared/impala_common.xml#common/column_stats_constant"/> + +<!-- <p conref="/Content/impala_common_xi44078.xml#common/restrictions_blurb"/> --> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_literals.xml#numeric_literals"/>, <xref href="impala_tinyint.xml#tinyint"/>, + <xref href="impala_smallint.xml#smallint"/>, <xref href="impala_int.xml#int"/>, + <xref href="impala_bigint.xml#bigint"/>, <xref href="impala_decimal.xml#decimal"/>, + <xref href="impala_math_functions.xml#math_functions"/> + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_bit_functions.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_bit_functions.xml b/docs/topics/impala_bit_functions.xml new file mode 100644 index 0000000..77c7e5d --- /dev/null +++ b/docs/topics/impala_bit_functions.xml @@ -0,0 +1,798 @@ +<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="bit_functions" rev="2.3.0"> + + <title>Impala Bit Functions</title> + <titlealts><navtitle>Bit Functions</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Functions"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Querying"/> + </metadata> + </prolog> + + <conbody> + + <p> + Bit manipulation functions perform bitwise operations involved in scientific processing or computer science algorithms. + For example, these functions include setting, clearing, or testing bits within an integer value, or changing the + positions of bits with or without wraparound. + </p> + + <p> + If a function takes two integer arguments that are required to be of the same type, the smaller argument is promoted + to the type of the larger one if required. For example, <codeph>BITAND(1,4096)</codeph> treats both arguments as + <codeph>SMALLINT</codeph>, because 1 can be represented as a <codeph>TINYINT</codeph> but 4096 requires a <codeph>SMALLINT</codeph>. + </p> + + <p> + Remember that all Impala integer values are signed. Therefore, when dealing with binary values where the most significant + bit is 1, the specified or returned values might be negative when represented in base 10. + </p> + + <p> + Whenever any argument is <codeph>NULL</codeph>, either the input value, bit position, or number of shift or rotate positions, + the return value from any of these functions is also <codeph>NULL</codeph> + </p> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + The bit functions operate on all the integral data types: <xref href="impala_int.xml#int"/>, + <xref href="impala_bigint.xml#bigint"/>, <xref href="impala_smallint.xml#smallint"/>, and + <xref href="impala_tinyint.xml#tinyint"/>. + </p> + + <p> + <b>Function reference:</b> + </p> + + <p> + Impala supports the following bit functions: + </p> + +<!-- +bitand +bitnot +bitor +bitxor +countset +getbit +rotateleft +rotateright +setbit +shiftleft +shiftright +--> + +<!-- Include this conref for all the bit functions, all newly added in Impala 2.3.0. + <p conref="../shared/impala_common.xml#common/added_in_230"/> +--> + + <dl> + + <dlentry id="bitand"> + + <dt> + <codeph>bitand(integer_type a, same_type b)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">bitand() function</indexterm> + <b>Purpose:</b> Returns an integer value representing the bits that are set to 1 in both of the arguments. + If the arguments are of different sizes, the smaller is promoted to the type of the larger. + <p> + <b>Usage notes:</b> The <codeph>bitand()</codeph> function is equivalent to the <codeph>&</codeph> binary operator. + </p> + <p conref="../shared/impala_common.xml#common/return_type_same"/> + <p conref="../shared/impala_common.xml#common/added_in_230"/> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + The following examples show the results of ANDing integer values. + 255 contains all 1 bits in its lowermost 7 bits. + 32767 contains all 1 bits in its lowermost 15 bits. + <!-- + Negative numbers have a 1 in the sign bit and the value is the + <xref href="https://en.wikipedia.org/wiki/Two%27s_complement" scope="external" format="html">two's complement</xref> + of the positive equivalent. + --> + You can use the <codeph>bin()</codeph> function to check the binary representation of any + integer value, although the result is always represented as a 64-bit value. + If necessary, the smaller argument is promoted to the + type of the larger one. + </p> +<codeblock>select bitand(255, 32767); /* 0000000011111111 & 0111111111111111 */ ++--------------------+ +| bitand(255, 32767) | ++--------------------+ +| 255 | ++--------------------+ + +select bitand(32767, 1); /* 0111111111111111 & 0000000000000001 */ ++------------------+ +| bitand(32767, 1) | ++------------------+ +| 1 | ++------------------+ + +select bitand(32, 16); /* 00010000 & 00001000 */ ++----------------+ +| bitand(32, 16) | ++----------------+ +| 0 | ++----------------+ + +select bitand(12,5); /* 00001100 & 00000101 */ ++---------------+ +| bitand(12, 5) | ++---------------+ +| 4 | ++---------------+ + +select bitand(-1,15); /* 11111111 & 00001111 */ ++----------------+ +| bitand(-1, 15) | ++----------------+ +| 15 | ++----------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry id="bitnot"> + + <dt> + <codeph>bitnot(integer_type a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">bitnot() function</indexterm> + <b>Purpose:</b> Inverts all the bits of the input argument. + <p> + <b>Usage notes:</b> The <codeph>bitnot()</codeph> function is equivalent to the <codeph>~</codeph> unary operator. + </p> + <p conref="../shared/impala_common.xml#common/return_type_same"/> + <p conref="../shared/impala_common.xml#common/added_in_230"/> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + These examples illustrate what happens when you flip all the bits of an integer value. + The sign always changes. The decimal representation is one different between the positive and + negative values. + <!-- + because negative values are represented as the + <xref href="https://en.wikipedia.org/wiki/Two%27s_complement" scope="external" format="html">two's complement</xref> + of the corresponding positive value. + --> + </p> +<codeblock>select bitnot(127); /* 01111111 -> 10000000 */ ++-------------+ +| bitnot(127) | ++-------------+ +| -128 | ++-------------+ + +select bitnot(16); /* 00010000 -> 11101111 */ ++------------+ +| bitnot(16) | ++------------+ +| -17 | ++------------+ + +select bitnot(0); /* 00000000 -> 11111111 */ ++-----------+ +| bitnot(0) | ++-----------+ +| -1 | ++-----------+ + +select bitnot(-128); /* 10000000 -> 01111111 */ ++--------------+ +| bitnot(-128) | ++--------------+ +| 127 | ++--------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry id="bitor"> + + <dt> + <codeph>bitor(integer_type a, same_type b)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">bitor() function</indexterm> + <b>Purpose:</b> Returns an integer value representing the bits that are set to 1 in either of the arguments. + If the arguments are of different sizes, the smaller is promoted to the type of the larger. + <p> + <b>Usage notes:</b> The <codeph>bitor()</codeph> function is equivalent to the <codeph>|</codeph> binary operator. + </p> + <p conref="../shared/impala_common.xml#common/return_type_same"/> + <p conref="../shared/impala_common.xml#common/added_in_230"/> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + The following examples show the results of ORing integer values. + </p> +<codeblock>select bitor(1,4); /* 00000001 | 00000100 */ ++-------------+ +| bitor(1, 4) | ++-------------+ +| 5 | ++-------------+ + +select bitor(16,48); /* 00001000 | 00011000 */ ++---------------+ +| bitor(16, 48) | ++---------------+ +| 48 | ++---------------+ + +select bitor(0,7); /* 00000000 | 00000111 */ ++-------------+ +| bitor(0, 7) | ++-------------+ +| 7 | ++-------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry id="bitxor"> + + <dt> + <codeph>bitxor(integer_type a, same_type b)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">bitxor() function</indexterm> + <b>Purpose:</b> Returns an integer value representing the bits that are set to 1 in one but not both of the arguments. + If the arguments are of different sizes, the smaller is promoted to the type of the larger. + <p> + <b>Usage notes:</b> The <codeph>bitxor()</codeph> function is equivalent to the <codeph>^</codeph> binary operator. + </p> + <p conref="../shared/impala_common.xml#common/return_type_same"/> + <p conref="../shared/impala_common.xml#common/added_in_230"/> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + The following examples show the results of XORing integer values. + XORing a non-zero value with zero returns the non-zero value. + XORing two identical values returns zero, because all the 1 bits from the first argument are also 1 bits in the second argument. + XORing different non-zero values turns off some bits and leaves others turned on, based on whether the same bit is set in both arguments. + </p> +<codeblock>select bitxor(0,15); /* 00000000 ^ 00001111 */ ++---------------+ +| bitxor(0, 15) | ++---------------+ +| 15 | ++---------------+ + +select bitxor(7,7); /* 00000111 ^ 00000111 */ ++--------------+ +| bitxor(7, 7) | ++--------------+ +| 0 | ++--------------+ + +select bitxor(8,4); /* 00001000 ^ 00000100 */ ++--------------+ +| bitxor(8, 4) | ++--------------+ +| 12 | ++--------------+ + +select bitxor(3,7); /* 00000011 ^ 00000111 */ ++--------------+ +| bitxor(3, 7) | ++--------------+ +| 4 | ++--------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry id="countset"> + + <dt> + <codeph>countset(integer_type a [, int zero_or_one])</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">countset() function</indexterm> + <b>Purpose:</b> By default, returns the number of 1 bits in the specified integer value. + If the optional second argument is set to zero, it returns the number of 0 bits instead. + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + In discussions of information theory, this operation is referred to as the + <q><xref href="https://en.wikipedia.org/wiki/Hamming_weight" scope="external" format="html">population count</xref></q> + or <q>popcount</q>. + </p> + <p conref="../shared/impala_common.xml#common/return_type_same"/> + <p conref="../shared/impala_common.xml#common/added_in_230"/> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + The following examples show how to count the number of 1 bits in an integer value. + </p> +<codeblock>select countset(1); /* 00000001 */ ++-------------+ +| countset(1) | ++-------------+ +| 1 | ++-------------+ + +select countset(3); /* 00000011 */ ++-------------+ +| countset(3) | ++-------------+ +| 2 | ++-------------+ + +select countset(16); /* 00010000 */ ++--------------+ +| countset(16) | ++--------------+ +| 1 | ++--------------+ + +select countset(17); /* 00010001 */ ++--------------+ +| countset(17) | ++--------------+ +| 2 | ++--------------+ + +select countset(7,1); /* 00000111 = 3 1 bits; the function counts 1 bits by default */ ++----------------+ +| countset(7, 1) | ++----------------+ +| 3 | ++----------------+ + +select countset(7,0); /* 00000111 = 5 0 bits; third argument can only be 0 or 1 */ ++----------------+ +| countset(7, 0) | ++----------------+ +| 5 | ++----------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry id="getbit"> + + <dt> + <codeph>getbit(integer_type a, int position)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">getbit() function</indexterm> + <b>Purpose:</b> Returns a 0 or 1 representing the bit at a + specified position. The positions are numbered right to left, starting at zero. + The position argument cannot be negative. + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + When you use a literal input value, it is treated as an 8-bit, 16-bit, + and so on value, the smallest type that is appropriate. + The type of the input value limits the range of the positions. + Cast the input value to the appropriate type if you need to + ensure it is treated as a 64-bit, 32-bit, and so on value. + </p> + <p conref="../shared/impala_common.xml#common/return_type_same"/> + <p conref="../shared/impala_common.xml#common/added_in_230"/> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + The following examples show how to test a specific bit within an integer value. + </p> +<codeblock>select getbit(1,0); /* 00000001 */ ++--------------+ +| getbit(1, 0) | ++--------------+ +| 1 | ++--------------+ + +select getbit(16,1) /* 00010000 */ ++---------------+ +| getbit(16, 1) | ++---------------+ +| 0 | ++---------------+ + +select getbit(16,4) /* 00010000 */ ++---------------+ +| getbit(16, 4) | ++---------------+ +| 1 | ++---------------+ + +select getbit(16,5) /* 00010000 */ ++---------------+ +| getbit(16, 5) | ++---------------+ +| 0 | ++---------------+ + +select getbit(-1,3); /* 11111111 */ ++---------------+ +| getbit(-1, 3) | ++---------------+ +| 1 | ++---------------+ + +select getbit(-1,25); /* 11111111 */ +ERROR: Invalid bit position: 25 + +select getbit(cast(-1 as int),25); /* 11111111111111111111111111111111 */ ++-----------------------------+ +| getbit(cast(-1 as int), 25) | ++-----------------------------+ +| 1 | ++-----------------------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry id="rotateleft"> + + <dt> + <codeph>rotateleft(integer_type a, int positions)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">rotateleft() function</indexterm> + <b>Purpose:</b> Rotates an integer value left by a specified number of bits. + As the most significant bit is taken out of the original value, + if it is a 1 bit, it is <q>rotated</q> back to the least significant bit. + Therefore, the final value has the same number of 1 bits as the original value, + just in different positions. + In computer science terms, this operation is a + <q><xref href="https://en.wikipedia.org/wiki/Circular_shift" scope="external" format="html">circular shift</xref></q>. + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + Specifying a second argument of zero leaves the original value unchanged. + Rotating a -1 value by any number of positions still returns -1, + because the original value has all 1 bits and all the 1 bits are + preserved during rotation. + Similarly, rotating a 0 value by any number of positions still returns 0. + Rotating a value by the same number of bits as in the value returns the same value. + Because this is a circular operation, the number of positions is not limited + to the number of bits in the input value. + For example, rotating an 8-bit value by 1, 9, 17, and so on positions returns an + identical result in each case. + </p> + <p conref="../shared/impala_common.xml#common/return_type_same"/> + <p conref="../shared/impala_common.xml#common/added_in_230"/> + <p conref="../shared/impala_common.xml#common/example_blurb"/> +<codeblock>select rotateleft(1,4); /* 00000001 -> 00010000 */ ++------------------+ +| rotateleft(1, 4) | ++------------------+ +| 16 | ++------------------+ + +select rotateleft(-1,155); /* 11111111 -> 11111111 */ ++---------------------+ +| rotateleft(-1, 155) | ++---------------------+ +| -1 | ++---------------------+ + +select rotateleft(-128,1); /* 10000000 -> 00000001 */ ++---------------------+ +| rotateleft(-128, 1) | ++---------------------+ +| 1 | ++---------------------+ + +select rotateleft(-127,3); /* 10000001 -> 00001100 */ ++---------------------+ +| rotateleft(-127, 3) | ++---------------------+ +| 12 | ++---------------------+ + +</codeblock> + </dd> + + </dlentry> + + <dlentry id="rotateright"> + + <dt> + <codeph>rotateright(integer_type a, int positions)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">rotateright() function</indexterm> + <b>Purpose:</b> Rotates an integer value right by a specified number of bits. + As the least significant bit is taken out of the original value, + if it is a 1 bit, it is <q>rotated</q> back to the most significant bit. + Therefore, the final value has the same number of 1 bits as the original value, + just in different positions. + In computer science terms, this operation is a + <q><xref href="https://en.wikipedia.org/wiki/Circular_shift" scope="external" format="html">circular shift</xref></q>. + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + Specifying a second argument of zero leaves the original value unchanged. + Rotating a -1 value by any number of positions still returns -1, + because the original value has all 1 bits and all the 1 bits are + preserved during rotation. + Similarly, rotating a 0 value by any number of positions still returns 0. + Rotating a value by the same number of bits as in the value returns the same value. + Because this is a circular operation, the number of positions is not limited + to the number of bits in the input value. + For example, rotating an 8-bit value by 1, 9, 17, and so on positions returns an + identical result in each case. + </p> + <p conref="../shared/impala_common.xml#common/return_type_same"/> + <p conref="../shared/impala_common.xml#common/added_in_230"/> + <p conref="../shared/impala_common.xml#common/example_blurb"/> +<codeblock>select rotateright(16,4); /* 00010000 -> 00000001 */ ++--------------------+ +| rotateright(16, 4) | ++--------------------+ +| 1 | ++--------------------+ + +select rotateright(-1,155); /* 11111111 -> 11111111 */ ++----------------------+ +| rotateright(-1, 155) | ++----------------------+ +| -1 | ++----------------------+ + +select rotateright(-128,1); /* 10000000 -> 01000000 */ ++----------------------+ +| rotateright(-128, 1) | ++----------------------+ +| 64 | ++----------------------+ + +select rotateright(-127,3); /* 10000001 -> 00110000 */ ++----------------------+ +| rotateright(-127, 3) | ++----------------------+ +| 48 | ++----------------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry id="setbit"> + + <dt> + <codeph>setbit(integer_type a, int position [, int zero_or_one])</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">setbit() function</indexterm> + <b>Purpose:</b> By default, changes a bit at a specified position to a 1, if it is not already. + If the optional third argument is set to zero, the specified bit is set to 0 instead. + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + If the bit at the specified position was already 1 (by default) + or 0 (with a third argument of zero), the return value is + the same as the first argument. + The positions are numbered right to left, starting at zero. + (Therefore, the return value could be different from the first argument + even if the position argument is zero.) + The position argument cannot be negative. + <p> + When you use a literal input value, it is treated as an 8-bit, 16-bit, + and so on value, the smallest type that is appropriate. + The type of the input value limits the range of the positions. + Cast the input value to the appropriate type if you need to + ensure it is treated as a 64-bit, 32-bit, and so on value. + </p> + <p conref="../shared/impala_common.xml#common/return_type_same"/> + <p conref="../shared/impala_common.xml#common/added_in_230"/> + <p conref="../shared/impala_common.xml#common/example_blurb"/> +<codeblock>select setbit(0,0); /* 00000000 -> 00000001 */ ++--------------+ +| setbit(0, 0) | ++--------------+ +| 1 | ++--------------+ + +select setbit(0,3); /* 00000000 -> 00001000 */ ++--------------+ +| setbit(0, 3) | ++--------------+ +| 8 | ++--------------+ + +select setbit(7,3); /* 00000111 -> 00001111 */ ++--------------+ +| setbit(7, 3) | ++--------------+ +| 15 | ++--------------+ + +select setbit(15,3); /* 00001111 -> 00001111 */ ++---------------+ +| setbit(15, 3) | ++---------------+ +| 15 | ++---------------+ + +select setbit(0,32); /* By default, 0 is a TINYINT with only 8 bits. */ +ERROR: Invalid bit position: 32 + +select setbit(cast(0 as bigint),32); /* For BIGINT, the position can be 0..63. */ ++-------------------------------+ +| setbit(cast(0 as bigint), 32) | ++-------------------------------+ +| 4294967296 | ++-------------------------------+ + +select setbit(7,3,1); /* 00000111 -> 00001111; setting to 1 is the default */ ++-----------------+ +| setbit(7, 3, 1) | ++-----------------+ +| 15 | ++-----------------+ + +select setbit(7,2,0); /* 00000111 -> 00000011; third argument of 0 clears instead of sets */ ++-----------------+ +| setbit(7, 2, 0) | ++-----------------+ +| 3 | ++-----------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry id="shiftleft"> + + <dt> + <codeph>shiftleft(integer_type a, int positions)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">shiftleft() function</indexterm> + <b>Purpose:</b> Shifts an integer value left by a specified number of bits. + As the most significant bit is taken out of the original value, + it is discarded and the least significant bit becomes 0. + In computer science terms, this operation is a <q><xref href="https://en.wikipedia.org/wiki/Logical_shift" scope="external" format="html">logical shift</xref></q>. + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + The final value has either the same number of 1 bits as the original value, or fewer. + Shifting an 8-bit value by 8 positions, a 16-bit value by 16 positions, and so on produces + a result of zero. + </p> + <p> + Specifying a second argument of zero leaves the original value unchanged. + Shifting any value by 0 returns the original value. + Shifting any value by 1 is the same as multiplying it by 2, + as long as the value is small enough; larger values eventually + become negative when shifted, as the sign bit is set. + Starting with the value 1 and shifting it left by N positions gives + the same result as 2 to the Nth power, or <codeph>pow(2,<varname>N</varname>)</codeph>. + </p> + <p conref="../shared/impala_common.xml#common/return_type_same"/> + <p conref="../shared/impala_common.xml#common/added_in_230"/> + <p conref="../shared/impala_common.xml#common/example_blurb"/> +<codeblock>select shiftleft(1,0); /* 00000001 -> 00000001 */ ++-----------------+ +| shiftleft(1, 0) | ++-----------------+ +| 1 | ++-----------------+ + +select shiftleft(1,3); /* 00000001 -> 00001000 */ ++-----------------+ +| shiftleft(1, 3) | ++-----------------+ +| 8 | ++-----------------+ + +select shiftleft(8,2); /* 00001000 -> 00100000 */ ++-----------------+ +| shiftleft(8, 2) | ++-----------------+ +| 32 | ++-----------------+ + +select shiftleft(127,1); /* 01111111 -> 11111110 */ ++-------------------+ +| shiftleft(127, 1) | ++-------------------+ +| -2 | ++-------------------+ + +select shiftleft(127,5); /* 01111111 -> 11100000 */ ++-------------------+ +| shiftleft(127, 5) | ++-------------------+ +| -32 | ++-------------------+ + +select shiftleft(-1,4); /* 11111111 -> 11110000 */ ++------------------+ +| shiftleft(-1, 4) | ++------------------+ +| -16 | ++------------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry id="shiftright"> + + <dt> + <codeph>shiftright(integer_type a, int positions)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">shiftright() function</indexterm> + <b>Purpose:</b> Shifts an integer value right by a specified number of bits. + As the least significant bit is taken out of the original value, + it is discarded and the most significant bit becomes 0. + In computer science terms, this operation is a <q><xref href="https://en.wikipedia.org/wiki/Logical_shift" scope="external" format="html">logical shift</xref></q>. + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + Therefore, the final value has either the same number of 1 bits as the original value, or fewer. + Shifting an 8-bit value by 8 positions, a 16-bit value by 16 positions, and so on produces + a result of zero. + </p> + <p> + Specifying a second argument of zero leaves the original value unchanged. + Shifting any value by 0 returns the original value. + Shifting any positive value right by 1 is the same as dividing it by 2. + Negative values become positive when shifted right. + </p> + <p conref="../shared/impala_common.xml#common/return_type_same"/> + <p conref="../shared/impala_common.xml#common/added_in_230"/> + <p conref="../shared/impala_common.xml#common/example_blurb"/> +<codeblock>select shiftright(16,0); /* 00010000 -> 00000000 */ ++-------------------+ +| shiftright(16, 0) | ++-------------------+ +| 16 | ++-------------------+ + +select shiftright(16,4); /* 00010000 -> 00000000 */ ++-------------------+ +| shiftright(16, 4) | ++-------------------+ +| 1 | ++-------------------+ + +select shiftright(16,5); /* 00010000 -> 00000000 */ ++-------------------+ +| shiftright(16, 5) | ++-------------------+ +| 0 | ++-------------------+ + +select shiftright(-1,1); /* 11111111 -> 01111111 */ ++-------------------+ +| shiftright(-1, 1) | ++-------------------+ +| 127 | ++-------------------+ + +select shiftright(-1,5); /* 11111111 -> 00000111 */ ++-------------------+ +| shiftright(-1, 5) | ++-------------------+ +| 7 | ++-------------------+ +</codeblock> + </dd> + + </dlentry> + + </dl> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_boolean.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_boolean.xml b/docs/topics/impala_boolean.xml new file mode 100644 index 0000000..6a8e299 --- /dev/null +++ b/docs/topics/impala_boolean.xml @@ -0,0 +1,128 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="boolean"> + + <title>BOOLEAN Data Type</title> + <titlealts><navtitle>BOOLEAN</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Data Types"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Schemas"/> + </metadata> + </prolog> + + <conbody> + + <p> + A data type used in <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph> statements, representing a + single true/false choice. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + + <p> + In the column definition of a <codeph>CREATE TABLE</codeph> statement: + </p> + +<codeblock><varname>column_name</varname> BOOLEAN</codeblock> + + <p> + <b>Range:</b> <codeph>TRUE</codeph> or <codeph>FALSE</codeph>. Do not use quotation marks around the + <codeph>TRUE</codeph> and <codeph>FALSE</codeph> literal values. You can write the literal values in + uppercase, lowercase, or mixed case. The values queried from a table are always returned in lowercase, + <codeph>true</codeph> or <codeph>false</codeph>. + </p> + + <p> + <b>Conversions:</b> Impala does not automatically convert any other type to <codeph>BOOLEAN</codeph>. All + conversions must use an explicit call to the <codeph>CAST()</codeph> function. + </p> + + <p> + You can use <codeph>CAST()</codeph> to convert <codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>, + <codeph>INT</codeph>, <codeph>BIGINT</codeph>, <codeph>FLOAT</codeph>, or <codeph>DOUBLE</codeph> +<!-- any integer or floating-point type to --> + <codeph>BOOLEAN</codeph>: a value of 0 represents <codeph>false</codeph>, and any non-zero value is converted + to <codeph>true</codeph>. + </p> + + <p rev="1.4.0"> +<!-- BOOLEAN-to-DECIMAL casting requested in IMPALA-991. As of Sept. 2014, designated "won't fix". --> + You can cast <codeph>DECIMAL</codeph> values to <codeph>BOOLEAN</codeph>, with the same treatment of zero and + non-zero values as the other numeric types. You cannot cast a <codeph>BOOLEAN</codeph> to a + <codeph>DECIMAL</codeph>. + </p> + + <p> + You cannot cast a <codeph>STRING</codeph> value to <codeph>BOOLEAN</codeph>, although you can cast a + <codeph>BOOLEAN</codeph> value to <codeph>STRING</codeph>, returning <codeph>'1'</codeph> for + <codeph>true</codeph> values and <codeph>'0'</codeph> for <codeph>false</codeph> values. + </p> + + <p> + Although you can cast a <codeph>TIMESTAMP</codeph> to a <codeph>BOOLEAN</codeph> or a + <codeph>BOOLEAN</codeph> to a <codeph>TIMESTAMP</codeph>, the results are unlikely to be useful. Any non-zero + <codeph>TIMESTAMP</codeph> (that is, any value other than <codeph>1970-01-01 00:00:00</codeph>) becomes + <codeph>TRUE</codeph> when converted to <codeph>BOOLEAN</codeph>, while <codeph>1970-01-01 00:00:00</codeph> + becomes <codeph>FALSE</codeph>. A value of <codeph>FALSE</codeph> becomes <codeph>1970-01-01 + 00:00:00</codeph> when converted to <codeph>BOOLEAN</codeph>, and <codeph>TRUE</codeph> becomes one second + past this epoch date, that is, <codeph>1970-01-01 00:00:01</codeph>. + </p> + + <p conref="../shared/impala_common.xml#common/null_null_arguments"/> + + <p conref="../shared/impala_common.xml#common/partitioning_blurb"/> + + <p> + Do not use a <codeph>BOOLEAN</codeph> column as a partition key. Although you can create such a table, + subsequent operations produce errors: + </p> + +<codeblock>[localhost:21000] > create table truth_table (assertion string) partitioned by (truth boolean); +[localhost:21000] > insert into truth_table values ('Pigs can fly',false); +ERROR: AnalysisException: INSERT into table with BOOLEAN partition column (truth) is not supported: partitioning.truth_table +</codeblock> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + +<codeblock>SELECT 1 < 2; +SELECT 2 = 5; +SELECT 100 < NULL, 100 > NULL; +CREATE TABLE assertions (claim STRING, really BOOLEAN); +INSERT INTO assertions VALUES + ("1 is less than 2", 1 < 2), + ("2 is the same as 5", 2 = 5), + ("Grass is green", true), + ("The moon is made of green cheese", false); +SELECT claim FROM assertions WHERE really = TRUE; +</codeblock> + + <p conref="../shared/impala_common.xml#common/hbase_ok"/> + + <p conref="../shared/impala_common.xml#common/parquet_ok"/> + + <p conref="../shared/impala_common.xml#common/text_bulky"/> + +<!-- <p conref="/Content/impala_common_xi44078.xml#common/compatibility_blurb"/> --> + +<!-- <p conref="/Content/impala_common_xi44078.xml#common/internals_blurb"/> --> + +<!-- <p conref="/Content/impala_common_xi44078.xml#common/added_in_20"/> --> + + <p conref="../shared/impala_common.xml#common/column_stats_constant"/> + +<!-- <p conref="/Content/impala_common_xi44078.xml#common/restrictions_blurb"/> --> + +<!-- <p conref="/Content/impala_common_xi44078.xml#common/related_info"/> --> + + <p> + <b>Related information:</b> <xref href="impala_literals.xml#boolean_literals"/>, + <xref href="impala_operators.xml#operators"/>, + <xref href="impala_conditional_functions.xml#conditional_functions"/> + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_char.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_char.xml b/docs/topics/impala_char.xml new file mode 100644 index 0000000..68cabeb --- /dev/null +++ b/docs/topics/impala_char.xml @@ -0,0 +1,275 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="char" rev="2.0.0"> + + <title>CHAR Data Type (CDH 5.2 or higher only)</title> + <titlealts><navtitle>CHAR (CDH 5.2 or higher only)</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Data Types"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Schemas"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">CHAR data type</indexterm> + A fixed-length character type, padded with trailing spaces if necessary to achieve the specified length. If + values are longer than the specified length, Impala truncates any trailing characters. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + + <p> + In the column definition of a <codeph>CREATE TABLE</codeph> statement: + </p> + +<codeblock><varname>column_name</varname> CHAR(<varname>length</varname>)</codeblock> + + <p> + The maximum length you can specify is 255. + </p> + + <p> + <b>Semantics of trailing spaces:</b> + </p> + + <ul> + <li> + When you store a <codeph>CHAR</codeph> value shorter than the specified length in a table, queries return + the value padded with trailing spaces if necessary; the resulting value has the same length as specified in + the column definition. + </li> + + <li> + If you store a <codeph>CHAR</codeph> value containing trailing spaces in a table, those trailing spaces are + not stored in the data file. When the value is retrieved by a query, the result could have a different + number of trailing spaces. That is, the value includes however many spaces are needed to pad it to the + specified length of the column. + </li> + + <li> + If you compare two <codeph>CHAR</codeph> values that differ only in the number of trailing spaces, those + values are considered identical. + </li> + </ul> + + <p conref="../shared/impala_common.xml#common/partitioning_bad"/> + + <p conref="../shared/impala_common.xml#common/hbase_no"/> + + <p conref="../shared/impala_common.xml#common/parquet_blurb"/> + + <ul> + <li> + This type can be read from and written to Parquet files. + </li> + + <li> + There is no requirement for a particular level of Parquet. + </li> + + <li> + Parquet files generated by Impala and containing this type can be freely interchanged with other components + such as Hive and MapReduce. + </li> + + <li> + Any trailing spaces, whether implicitly or explicitly specified, are not written to the Parquet data files. + </li> + + <li> + Parquet data files might contain values that are longer than allowed by the + <codeph>CHAR(<varname>n</varname>)</codeph> length limit. Impala ignores any extra trailing characters when + it processes those values during a query. + </li> + </ul> + + <p conref="../shared/impala_common.xml#common/text_blurb"/> + + <p> + Text data files might contain values that are longer than allowed for a particular + <codeph>CHAR(<varname>n</varname>)</codeph> column. Any extra trailing characters are ignored when Impala + processes those values during a query. Text data files can also contain values that are shorter than the + defined length limit, and Impala pads them with trailing spaces up to the specified length. Any text data + files produced by Impala <codeph>INSERT</codeph> statements do not include any trailing blanks for + <codeph>CHAR</codeph> columns. + </p> + + <p conref="../shared/impala_common.xml#common/compatibility_blurb"/> + + <p> + This type is available using Impala 2.0 or higher under CDH 4, or with Impala on CDH 5.2 or higher. There are + no compatibility issues with other components when exchanging data files or running Impala on CDH 4. + </p> + + <p> + Some other database systems make the length specification optional. For Impala, the length is required. + </p> + +<!-- +<p> +The Impala maximum length is larger than for the <codeph>CHAR</codeph> data type in Hive. +If a Hive query encounters a <codeph>CHAR</codeph> value longer than 255 during processing, +it silently treats the value as length 255. +</p> +--> + + <p conref="../shared/impala_common.xml#common/internals_max_bytes"/> + + <p conref="../shared/impala_common.xml#common/added_in_20"/> + + <p conref="../shared/impala_common.xml#common/column_stats_constant"/> + +<!-- Seems like a logical design decision but don't think it's currently implemented like this. +<p> +Because both the maximum and average length are always known and always the same for +any given <codeph>CHAR(<varname>n</varname>)</codeph> column, those fields are always filled +in for <codeph>SHOW COLUMN STATS</codeph> output, even before you run +<codeph>COMPUTE STATS</codeph> on the table. +</p> +--> + + <p conref="../shared/impala_common.xml#common/udf_blurb_no"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + These examples show how trailing spaces are not considered significant when comparing or processing + <codeph>CHAR</codeph> values. <codeph>CAST()</codeph> truncates any longer string to fit within the defined + length. If a <codeph>CHAR</codeph> value is shorter than the specified length, it is padded on the right with + spaces until it matches the specified length. Therefore, <codeph>LENGTH()</codeph> represents the length + including any trailing spaces, and <codeph>CONCAT()</codeph> also treats the column value as if it has + trailing spaces. + </p> + +<codeblock>select cast('x' as char(4)) = cast('x ' as char(4)) as "unpadded equal to padded"; ++--------------------------+ +| unpadded equal to padded | ++--------------------------+ +| true | ++--------------------------+ + +create table char_length(c char(3)); +insert into char_length values (cast('1' as char(3))), (cast('12' as char(3))), (cast('123' as char(3))), (cast('123456' as char(3))); +select concat("[",c,"]") as c, length(c) from char_length; ++-------+-----------+ +| c | length(c) | ++-------+-----------+ +| [1 ] | 3 | +| [12 ] | 3 | +| [123] | 3 | +| [123] | 3 | ++-------+-----------+ +</codeblock> + + <p> + This example shows a case where data values are known to have a specific length, where <codeph>CHAR</codeph> + is a logical data type to use. +<!-- +Because all the <codeph>CHAR</codeph> values have a constant predictable length, +Impala can efficiently analyze how best to use these values in join queries, +aggregation queries, and other contexts where column length is significant. +--> + </p> + +<codeblock>create table addresses + (id bigint, + street_name string, + state_abbreviation char(2), + country_abbreviation char(2)); +</codeblock> + + <p> + The following example shows how values written by Impala do not physically include the trailing spaces. It + creates a table using text format, with <codeph>CHAR</codeph> values much shorter than the declared length, + and then prints the resulting data file to show that the delimited values are not separated by spaces. The + same behavior applies to binary-format Parquet data files. + </p> + +<codeblock>create table char_in_text (a char(20), b char(30), c char(40)) + row format delimited fields terminated by ','; + +insert into char_in_text values (cast('foo' as char(20)), cast('bar' as char(30)), cast('baz' as char(40))), (cast('hello' as char(20)), cast('goodbye' as char(30)), cast('aloha' as char(40))); + +-- Running this Linux command inside impala-shell using the ! shortcut. +!hdfs dfs -cat 'hdfs://127.0.0.1:8020/user/hive/warehouse/impala_doc_testing.db/char_in_text/*.*'; +foo,bar,baz +hello,goodbye,aloha +</codeblock> + + <p> + The following example further illustrates the treatment of spaces. It replaces the contents of the previous + table with some values including leading spaces, trailing spaces, or both. Any leading spaces are preserved + within the data file, but trailing spaces are discarded. Then when the values are retrieved by a query, the + leading spaces are retrieved verbatim while any necessary trailing spaces are supplied by Impala. + </p> + +<codeblock>insert overwrite char_in_text values (cast('trailing ' as char(20)), cast(' leading and trailing ' as char(30)), cast(' leading' as char(40))); +!hdfs dfs -cat 'hdfs://127.0.0.1:8020/user/hive/warehouse/impala_doc_testing.db/char_in_text/*.*'; +trailing, leading and trailing, leading + +select concat('[',a,']') as a, concat('[',b,']') as b, concat('[',c,']') as c from char_in_text; ++------------------------+----------------------------------+--------------------------------------------+ +| a | b | c | ++------------------------+----------------------------------+--------------------------------------------+ +| [trailing ] | [ leading and trailing ] | [ leading ] | ++------------------------+----------------------------------+--------------------------------------------+ +</codeblock> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + + <p> + Because the blank-padding behavior requires allocating the maximum length for each value in memory, for + scalability reasons avoid declaring <codeph>CHAR</codeph> columns that are much longer than typical values in + that column. + </p> + + <p conref="../shared/impala_common.xml#common/blobs_are_strings"/> + + <p> + When an expression compares a <codeph>CHAR</codeph> with a <codeph>STRING</codeph> or + <codeph>VARCHAR</codeph>, the <codeph>CHAR</codeph> value is implicitly converted to <codeph>STRING</codeph> + first, with trailing spaces preserved. + </p> + +<codeblock>select cast("foo " as char(5)) = 'foo' as "char equal to string"; ++----------------------+ +| char equal to string | ++----------------------+ +| false | ++----------------------+ +</codeblock> + + <p> + This behavior differs from other popular database systems. To get the expected result of + <codeph>TRUE</codeph>, cast the expressions on both sides to <codeph>CHAR</codeph> values of the appropriate + length: + </p> + +<codeblock>select cast("foo " as char(5)) = cast('foo' as char(3)) as "char equal to string"; ++----------------------+ +| char equal to string | ++----------------------+ +| true | ++----------------------+ +</codeblock> + + <p> + This behavior is subject to change in future releases. + </p> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_string.xml#string"/>, <xref href="impala_varchar.xml#varchar"/>, + <xref href="impala_literals.xml#string_literals"/>, + <xref href="impala_string_functions.xml#string_functions"/> + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_comments.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_comments.xml b/docs/topics/impala_comments.xml new file mode 100644 index 0000000..96b9479 --- /dev/null +++ b/docs/topics/impala_comments.xml @@ -0,0 +1,51 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="comments"> + + <title>Comments</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">comments (SQL)</indexterm> + Impala supports the familiar styles of SQL comments: + </p> + + <ul> + <li> + All text from a <codeph>--</codeph> sequence to the end of the line is considered a comment and ignored. + This type of comment can occur on a single line by itself, or after all or part of a statement. + </li> + + <li> + All text from a <codeph>/*</codeph> sequence to the next <codeph>*/</codeph> sequence is considered a + comment and ignored. This type of comment can stretch over multiple lines. This type of comment can occur + on one or more lines by itself, in the middle of a statement, or before or after a statement. + </li> + </ul> + + <p> + For example: + </p> + +<codeblock>-- This line is a comment about a table. +create table ...; + +/* +This is a multi-line comment about a query. +*/ +select ...; + +select * from t /* This is an embedded comment about a query. */ where ...; + +select * from t -- This is a trailing comment within a multi-line command. +where ...; +</codeblock> + </conbody> +</concept>
