http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_live_summary.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_live_summary.xml b/docs/topics/impala_live_summary.xml new file mode 100644 index 0000000..bfe71bf --- /dev/null +++ b/docs/topics/impala_live_summary.xml @@ -0,0 +1,207 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="2.3.0" id="live_summary"> + + <title>LIVE_SUMMARY Query Option</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Performance"/> + <data name="Category" value="Reports"/> + <data name="Category" value="impala-shell"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">LIVE_SUMMARY query option</indexterm> + For queries submitted through the <cmdname>impala-shell</cmdname> command, + displays the same output as the <codeph>SUMMARY</codeph> command, + with the measurements updated in real time as the query progresses. + When the query finishes, the final <codeph>SUMMARY</codeph> output remains + visible in the <cmdname>impala-shell</cmdname> console output. + </p> + + <p> + </p> + + <p conref="../shared/impala_common.xml#common/type_boolean"/> + <p conref="../shared/impala_common.xml#common/default_false_0"/> + + <p conref="../shared/impala_common.xml#common/command_line_blurb"/> + <p> + You can enable this query option within <cmdname>impala-shell</cmdname> + by starting the shell with the <codeph>--live_summary</codeph> + command-line option. + You can still turn this setting off and on again within the shell through the + <codeph>SET</codeph> command. + </p> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + The live summary output can be useful for evaluating long-running queries, + to evaluate which phase of execution takes up the most time, or if some hosts + take much longer than others for certain operations, dragging overall performance down. + By making the information available in real time, this feature lets you decide what + action to take even before you cancel a query that is taking much longer than normal. + </p> + <p> + For example, you might see the HDFS scan phase taking a long time, and therefore revisit + performance-related aspects of your schema design such as constructing a partitioned table, + switching to the Parquet file format, running the <codeph>COMPUTE STATS</codeph> statement + for the table, and so on. + Or you might see a wide variation between the average and maximum times for all hosts to + perform some phase of the query, and therefore investigate if one particular host + needed more memory or was experiencing a network problem. + </p> + <p conref="../shared/impala_common.xml#common/live_reporting_details"/> + <p> + For a simple and concise way of tracking the progress of an interactive query, see + <xref href="impala_live_progress.xml#live_progress"/>. + </p> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + <p conref="../shared/impala_common.xml#common/impala_shell_progress_reports_compute_stats_caveat"/> + <p conref="../shared/impala_common.xml#common/impala_shell_progress_reports_shell_only_caveat"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + The following example shows a series of <codeph>LIVE_SUMMARY</codeph> reports that + are displayed during the course of a query, showing how the numbers increase to + show the progress of different phases of the distributed query. When you do the same + in <cmdname>impala-shell</cmdname>, only a single report is displayed at any one time, + with each update overwriting the previous numbers. + </p> + +<codeblock><![CDATA[[localhost:21000] > set live_summary=true; +LIVE_SUMMARY set to true +[localhost:21000] > select count(*) from customer t1 cross join customer t2; ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | 10.00 MB | | +| 02:NESTED LOOP JOIN | 0 | 0ns | 0ns | 0 | 22.50B | 0 B | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 0 | 0ns | 0ns | 0 | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 0 | 0ns | 0ns | 0 | 150.00K | 0 B | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ + ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | | +| 02:NESTED LOOP JOIN | 1 | 17.62s | 17.62s | 81.14M | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 1 | 247.53ms | 247.53ms | 1.02K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ + ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | | +| 02:NESTED LOOP JOIN | 1 | 61.85s | 61.85s | 283.43M | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 1 | 247.59ms | 247.59ms | 2.05K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +]]> +</codeblock> + +<!-- Keeping this sample output that illustrates a couple of glitches in the LIVE_SUMMARY display, hidden, to help filing JIRAs. --> +<codeblock audience="Cloudera"><![CDATA[[ ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | | +| 02:NESTED LOOP JOIN | 1 | 91.34s | 91.34s | 419.48M | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 1 | 247.63ms | 247.63ms | 3.07K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ + ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | | +| 02:NESTED LOOP JOIN | 1 | 140.49s | 140.49s | 646.82M | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 1 | 247.73ms | 247.73ms | 5.12K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ + ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | | +| 02:NESTED LOOP JOIN | 1 | 228.96s | 228.96s | 1.06B | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 1 | 247.83ms | 247.83ms | 7.17K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ + ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | | +| 02:NESTED LOOP JOIN | 1 | 563.11s | 563.11s | 2.59B | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 1 | 248.11ms | 248.11ms | 17.41K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ + ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | | +| 02:NESTED LOOP JOIN | 1 | 985.71s | 985.71s | 4.54B | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 1 | 248.49ms | 248.49ms | 30.72K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ + ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE | +| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED | +| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | | +| 02:NESTED LOOP JOIN | 1 | None | None | 5.42B | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST | +| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST | +| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 | +| 00:SCAN HDFS | 1 | 248.66ms | 248.66ms | 36.86K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 | ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ + +[localhost:21000] > select count(*) from customer t1 cross join customer t2; +Query: select count(*) from customer t1 cross join customer t2 +[####################################################################################################] 100% ++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ +| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | +[localhost:21000] > +]]> +</codeblock> + + <p conref="../shared/impala_common.xml#common/live_progress_live_summary_asciinema"/> + + </conbody> +</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_load_data.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_load_data.xml b/docs/topics/impala_load_data.xml new file mode 100644 index 0000000..e3517f0 --- /dev/null +++ b/docs/topics/impala_load_data.xml @@ -0,0 +1,237 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="1.1" id="load_data"> + + <title>LOAD DATA Statement</title> + <titlealts><navtitle>LOAD DATA</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="ETL"/> + <data name="Category" value="Ingest"/> + <data name="Category" value="DML"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="HDFS"/> + <data name="Category" value="Tables"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">LOAD DATA statement</indexterm> + The <codeph>LOAD DATA</codeph> statement streamlines the ETL process for an internal Impala table by moving a + data file or all the data files in a directory from an HDFS location into the Impala data directory for that + table. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>LOAD DATA INPATH '<varname>hdfs_file_or_directory_path</varname>' [OVERWRITE] INTO TABLE <varname>tablename</varname> + [PARTITION (<varname>partcol1</varname>=<varname>val1</varname>, <varname>partcol2</varname>=<varname>val2</varname> ...)]</codeblock> + + <p> + When the <codeph>LOAD DATA</codeph> statement operates on a partitioned table, + it always operates on one partition at a time. Specify the <codeph>PARTITION</codeph> clauses + and list all the partition key columns, with a constant value specified for each. + </p> + + <p conref="../shared/impala_common.xml#common/dml_blurb"/> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <ul> + <li> + The loaded data files are moved, not copied, into the Impala data directory. + </li> + + <li> + You can specify the HDFS path of a single file to be moved, or the HDFS path of a directory to move all the + files inside that directory. You cannot specify any sort of wildcard to take only some of the files from a + directory. When loading a directory full of data files, keep all the data files at the top level, with no + nested directories underneath. + </li> + + <li> + Currently, the Impala <codeph>LOAD DATA</codeph> statement only imports files from HDFS, not from the local + filesystem. It does not support the <codeph>LOCAL</codeph> keyword of the Hive <codeph>LOAD DATA</codeph> + statement. You must specify a path, not an <codeph>hdfs://</codeph> URI. + </li> + + <li> + In the interest of speed, only limited error checking is done. If the loaded files have the wrong file + format, different columns than the destination table, or other kind of mismatch, Impala does not raise any + error for the <codeph>LOAD DATA</codeph> statement. Querying the table afterward could produce a runtime + error or unexpected results. Currently, the only checking the <codeph>LOAD DATA</codeph> statement does is + to avoid mixing together uncompressed and LZO-compressed text files in the same table. + </li> + + <li> + When you specify an HDFS directory name as the <codeph>LOAD DATA</codeph> argument, any hidden files in + that directory (files whose names start with a <codeph>.</codeph>) are not moved to the Impala data + directory. + </li> + + <li> + The loaded data files retain their original names in the new location, unless a name conflicts with an + existing data file, in which case the name of the new file is modified slightly to be unique. (The + name-mangling is a slight difference from the Hive <codeph>LOAD DATA</codeph> statement, which replaces + identically named files.) + </li> + + <li> + By providing an easy way to transport files from known locations in HDFS into the Impala data directory + structure, the <codeph>LOAD DATA</codeph> statement lets you avoid memorizing the locations and layout of + HDFS directory tree containing the Impala databases and tables. (For a quick way to check the location of + the data files for an Impala table, issue the statement <codeph>DESCRIBE FORMATTED + <varname>table_name</varname></codeph>.) + </li> + + <li> + The <codeph>PARTITION</codeph> clause is especially convenient for ingesting new data for a partitioned + table. As you receive new data for a time period, geographic region, or other division that corresponds to + one or more partitioning columns, you can load that data straight into the appropriate Impala data + directory, which might be nested several levels down if the table is partitioned by multiple columns. When + the table is partitioned, you must specify constant values for all the partitioning columns. + </li> + </ul> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p rev="2.3.0"> + Because Impala currently cannot create Parquet data files containing complex types + (<codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>), the + <codeph>LOAD DATA</codeph> statement is especially important when working with + tables containing complex type columns. You create the Parquet data files outside + Impala, then use either <codeph>LOAD DATA</codeph>, an external table, or HDFS-level + file operations followed by <codeph>REFRESH</codeph> to associate the data files with + the corresponding table. + See <xref href="impala_complex_types.xml#complex_types"/> for details about using complex types. + </p> + + <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/> + + <note conref="../shared/impala_common.xml#common/compute_stats_next"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + First, we use a trivial Python script to write different numbers of strings (one per line) into files stored + in the <codeph>cloudera</codeph> HDFS user account. (Substitute the path for your own HDFS user account when + doing <cmdname>hdfs dfs</cmdname> operations like these.) + </p> + +<codeblock>$ random_strings.py 1000 | hdfs dfs -put - /user/cloudera/thousand_strings.txt +$ random_strings.py 100 | hdfs dfs -put - /user/cloudera/hundred_strings.txt +$ random_strings.py 10 | hdfs dfs -put - /user/cloudera/ten_strings.txt</codeblock> + + <p> + Next, we create a table and load an initial set of data into it. Remember, unless you specify a + <codeph>STORED AS</codeph> clause, Impala tables default to <codeph>TEXTFILE</codeph> format with Ctrl-A (hex + 01) as the field delimiter. This example uses a single-column table, so the delimiter is not significant. For + large-scale ETL jobs, you would typically use binary format data files such as Parquet or Avro, and load them + into Impala tables that use the corresponding file format. + </p> + +<codeblock>[localhost:21000] > create table t1 (s string); +[localhost:21000] > load data inpath '/user/cloudera/thousand_strings.txt' into table t1; +Query finished, fetching results ... ++----------------------------------------------------------+ +| summary | ++----------------------------------------------------------+ +| Loaded 1 file(s). Total files in destination location: 1 | ++----------------------------------------------------------+ +Returned 1 row(s) in 0.61s +[kilo2-202-961.cs1cloud.internal:21000] > select count(*) from t1; +Query finished, fetching results ... ++------+ +| _c0 | ++------+ +| 1000 | ++------+ +Returned 1 row(s) in 0.67s +[localhost:21000] > load data inpath '/user/cloudera/thousand_strings.txt' into table t1; +ERROR: AnalysisException: INPATH location '/user/cloudera/thousand_strings.txt' does not exist. </codeblock> + + <p> + As indicated by the message at the end of the previous example, the data file was moved from its original + location. The following example illustrates how the data file was moved into the Impala data directory for + the destination table, keeping its original filename: + </p> + +<codeblock>$ hdfs dfs -ls /user/hive/warehouse/load_data_testing.db/t1 +Found 1 items +-rw-r--r-- 1 cloudera cloudera 13926 2013-06-26 15:40 /user/hive/warehouse/load_data_testing.db/t1/thousand_strings.txt</codeblock> + + <p> + The following example demonstrates the difference between the <codeph>INTO TABLE</codeph> and + <codeph>OVERWRITE TABLE</codeph> clauses. The table already contains 1000 rows. After issuing the + <codeph>LOAD DATA</codeph> statement with the <codeph>INTO TABLE</codeph> clause, the table contains 100 more + rows, for a total of 1100. After issuing the <codeph>LOAD DATA</codeph> statement with the <codeph>OVERWRITE + INTO TABLE</codeph> clause, the former contents are gone, and now the table only contains the 10 rows from + the just-loaded data file. + </p> + +<codeblock>[localhost:21000] > load data inpath '/user/cloudera/hundred_strings.txt' into table t1; +Query finished, fetching results ... ++----------------------------------------------------------+ +| summary | ++----------------------------------------------------------+ +| Loaded 1 file(s). Total files in destination location: 2 | ++----------------------------------------------------------+ +Returned 1 row(s) in 0.24s +[localhost:21000] > select count(*) from t1; +Query finished, fetching results ... ++------+ +| _c0 | ++------+ +| 1100 | ++------+ +Returned 1 row(s) in 0.55s +[localhost:21000] > load data inpath '/user/cloudera/ten_strings.txt' overwrite into table t1; +Query finished, fetching results ... ++----------------------------------------------------------+ +| summary | ++----------------------------------------------------------+ +| Loaded 1 file(s). Total files in destination location: 1 | ++----------------------------------------------------------+ +Returned 1 row(s) in 0.26s +[localhost:21000] > select count(*) from t1; +Query finished, fetching results ... ++-----+ +| _c0 | ++-----+ +| 10 | ++-----+ +Returned 1 row(s) in 0.62s</codeblock> + + <p conref="../shared/impala_common.xml#common/s3_blurb"/> + <p conref="../shared/impala_common.xml#common/s3_dml"/> + + <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/> + + <p conref="../shared/impala_common.xml#common/permissions_blurb"/> + <p rev="CDH-19187"> + The user ID that the <cmdname>impalad</cmdname> daemon runs under, + typically the <codeph>impala</codeph> user, must have read and write + permissions for the files in the source directory, and write + permission for the destination directory. + </p> + + <p conref="../shared/impala_common.xml#common/related_info"/> + <p> + The <codeph>LOAD DATA</codeph> statement is an alternative to the + <codeph>INSERT</codeph> statement. Use <codeph>LOAD DATA</codeph> + when you have the data files in HDFS but outside of any Impala table. + </p> + <p> + The <codeph>LOAD DATA</codeph> statement is also an alternative + to the <codeph>CREATE EXTERNAL TABLE</codeph> statement. Use + <codeph>LOAD DATA</codeph> when it is appropriate to move the + data files under Impala control rather than querying them + from their original location. + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_map.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_map.xml b/docs/topics/impala_map.xml new file mode 100644 index 0000000..41e4754 --- /dev/null +++ b/docs/topics/impala_map.xml @@ -0,0 +1,264 @@ +<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> + <concept id="map"> + + <title>MAP Complex Type (CDH 5.5 or higher only)</title> + + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Data Types"/> + </metadata> + </prolog> + + <conbody> + + <p> + A complex data type representing an arbitrary set of key-value pairs. + The key part is a scalar type, while the value part can be a scalar or + another complex type (<codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, + or <codeph>MAP</codeph>). + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock><varname>column_name</varname> MAP < <varname>primitive_type</varname>, <varname>type</varname> > + +type ::= <varname>primitive_type</varname> | <varname>complex_type</varname> +</codeblock> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_combo"/> + + <p> + The <codeph>MAP</codeph> complex data type represents a set of key-value pairs. + Each element of the map is indexed by a primitive type such as <codeph>BIGINT</codeph> or + <codeph>STRING</codeph>, letting you define sequences that are not continuous or categories with arbitrary names. + You might find it convenient for modelling data produced in other languages, such as a + Python dictionary or Java HashMap, where a single scalar value serves as the lookup key. + </p> + + <p> + In a big data context, the keys in a map column might represent a numeric sequence of events during a + manufacturing process, or <codeph>TIMESTAMP</codeph> values corresponding to sensor observations. + The map itself is inherently unordered, so you choose whether to make the key values significant + (such as a recorded <codeph>TIMESTAMP</codeph>) or synthetic (such as a random global universal ID). + </p> + + <note> + Behind the scenes, the <codeph>MAP</codeph> type is implemented in a similar way as the + <codeph>ARRAY</codeph> type. Impala does not enforce any uniqueness constraint on the + <codeph>KEY</codeph> values, and the <codeph>KEY</codeph> values are processed by + looping through the elements of the <codeph>MAP</codeph> rather than by a constant-time lookup. + Therefore, this type is primarily for ease of understanding when importing data and + algorithms from non-SQL contexts, rather than optimizing the performance of key lookups. + </note> + + <p conref="../shared/impala_common.xml#common/complex_types_describe"/> + + <p conref="../shared/impala_common.xml#common/added_in_230"/> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + + <ul conref="../shared/impala_common.xml#common/complex_types_restrictions"> + <li/> + </ul> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <note conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/> + + <p> + The following example shows a table with various kinds of <codeph>MAP</codeph> columns, + both at the top level and nested within other complex types. + Each row represents information about a specific country, with complex type fields + of various levels of nesting to represent different information associated + with the country: factual measurements such as area and population, + notable people in different categories, geographic features such as + cities, points of interest within each city, and mountains with associated facts. + Practice the <codeph>CREATE TABLE</codeph> and query notation for complex type columns + using empty tables, until you can visualize a complex data structure and construct corresponding SQL statements reliably. + </p> + +<codeblock><![CDATA[create TABLE map_demo +( + country_id BIGINT, + +-- Numeric facts about each country, looked up by name. +-- For example, 'Area':1000, 'Population':999999. +-- Using a MAP instead of a STRUCT because there could be +-- a different set of facts for each country. + metrics MAP <STRING, BIGINT>, + +-- MAP whose value part is an ARRAY. +-- For example, the key 'Famous Politicians' could represent an array of 10 elements, +-- while the key 'Famous Actors' could represent an array of 20 elements. + notables MAP <STRING, ARRAY <STRING>>, + +-- MAP that is a field within a STRUCT. +-- (The STRUCT is inside another ARRAY, because it is rare +-- for a STRUCT to be a top-level column.) +-- For example, city #1 might have points of interest with key 'Zoo', +-- representing an array of 3 different zoos. +-- City #2 might have completely different kinds of points of interest. +-- Because the set of field names is potentially large, and most entries could be blank, +-- a MAP makes more sense than a STRUCT to represent such a sparse data structure. + cities ARRAY < STRUCT < + name: STRING, + points_of_interest: MAP <STRING, ARRAY <STRING>> + >>, + +-- MAP that is an element within an ARRAY. The MAP is inside a STRUCT field to associate +-- the mountain name with all the facts about the mountain. +-- The "key" of the map (the first STRING field) represents the name of some fact whose value +-- can be expressed as an integer, such as 'Height', 'Year First Climbed', and so on. + mountains ARRAY < STRUCT < name: STRING, facts: MAP <STRING, INT > > > +) +STORED AS PARQUET; +]]> +</codeblock> + +<codeblock><![CDATA[DESCRIBE map_demo; ++------------+------------------------------------------------+ +| name | type | ++------------+------------------------------------------------+ +| country_id | bigint | +| metrics | map<string,bigint> | +| notables | map<string,array<string>> | +| cities | array<struct< | +| | name:string, | +| | points_of_interest:map<string,array<string>> | +| | >> | +| mountains | array<struct< | +| | name:string, | +| | facts:map<string,int> | +| | >> | ++------------+------------------------------------------------+ + +DESCRIBE map_demo.metrics; ++-------+--------+ +| name | type | ++-------+--------+ +| key | string | +| value | bigint | ++-------+--------+ + +DESCRIBE map_demo.notables; ++-------+---------------+ +| name | type | ++-------+---------------+ +| key | string | +| value | array<string> | ++-------+---------------+ + +DESCRIBE map_demo.notables.value; ++------+--------+ +| name | type | ++------+--------+ +| item | string | +| pos | bigint | ++------+--------+ + +DESCRIBE map_demo.cities; ++------+------------------------------------------------+ +| name | type | ++------+------------------------------------------------+ +| item | struct< | +| | name:string, | +| | points_of_interest:map<string,array<string>> | +| | > | +| pos | bigint | ++------+------------------------------------------------+ + +DESCRIBE map_demo.cities.item.points_of_interest; ++-------+---------------+ +| name | type | ++-------+---------------+ +| key | string | +| value | array<string> | ++-------+---------------+ + +DESCRIBE map_demo.cities.item.points_of_interest.value; ++------+--------+ +| name | type | ++------+--------+ +| item | string | +| pos | bigint | ++------+--------+ + +DESCRIBE map_demo.mountains; ++------+-------------------------+ +| name | type | ++------+-------------------------+ +| item | struct< | +| | name:string, | +| | facts:map<string,int> | +| | > | +| pos | bigint | ++------+-------------------------+ + +DESCRIBE map_demo.mountains.item.facts; ++-------+--------+ +| name | type | ++-------+--------+ +| key | string | +| value | int | ++-------+--------+ +]]> +</codeblock> + + <p> + The following example shows a table that uses a variety of data types for the <codeph>MAP</codeph> + <q>key</q> field. Typically, you use <codeph>BIGINT</codeph> or <codeph>STRING</codeph> to use + numeric or character-based keys without worrying about exceeding any size or length constraints. + </p> + +<codeblock><![CDATA[CREATE TABLE map_demo_obscure +( + id BIGINT, + m1 MAP <INT, INT>, + m2 MAP <SMALLINT, INT>, + m3 MAP <TINYINT, INT>, + m4 MAP <TIMESTAMP, INT>, + m5 MAP <BOOLEAN, INT>, + m6 MAP <CHAR(5), INT>, + m7 MAP <VARCHAR(25), INT>, + m8 MAP <FLOAT, INT>, + m9 MAP <DOUBLE, INT>, + m10 MAP <DECIMAL(12,2), INT> +) +STORED AS PARQUET; +]]> +</codeblock> + +<codeblock>CREATE TABLE celebrities (name STRING, birth_year MAP < STRING, SMALLINT >) STORED AS PARQUET; +-- A typical row might represent values with 2 different birth years, such as: +-- ("Joe Movie Star", { "real": 1972, "claimed": 1977 }) + +CREATE TABLE countries (name STRING, famous_leaders MAP < INT, STRING >) STORED AS PARQUET; +-- A typical row might represent values with different leaders, with key values corresponding to their numeric sequence, such as: +-- ("United States", { 1: "George Washington", 3: "Thomas Jefferson", 16: "Abraham Lincoln" }) + +CREATE TABLE airlines (name STRING, special_meals MAP < STRING, MAP < STRING, STRING > >) STORED AS PARQUET; +-- A typical row might represent values with multiple kinds of meals, each with several components: +-- ("Elegant Airlines", +-- { +-- "vegetarian": { "breakfast": "pancakes", "snack": "cookies", "dinner": "rice pilaf" }, +-- "gluten free": { "breakfast": "oatmeal", "snack": "fruit", "dinner": "chicken" } +-- } ) +</codeblock> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_complex_types.xml#complex_types"/>, + <xref href="impala_array.xml#array"/>, + <xref href="impala_struct.xml#struct"/> + <!-- <xref href="impala_map.xml#map"/> --> + </p> + + </conbody> + + </concept> + + http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_math_functions.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_math_functions.xml b/docs/topics/impala_math_functions.xml new file mode 100644 index 0000000..fd16b37 --- /dev/null +++ b/docs/topics/impala_math_functions.xml @@ -0,0 +1,1336 @@ +<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="math_functions"> + + <title>Impala Mathematical Functions</title> + <titlealts><navtitle>Mathematical Functions</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Functions"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Querying"/> + </metadata> + </prolog> + + <conbody> + + <p> + Mathematical functions, or arithmetic functions, perform numeric calculations that are typically more complex + than basic addition, subtraction, multiplication, and division. For example, these functions include + trigonometric, logarithmic, and base conversion operations. + </p> + + <note> + In Impala, exponentiation uses the <codeph>pow()</codeph> function rather than an exponentiation operator + such as <codeph>**</codeph>. + </note> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + The mathematical functions operate mainly on these data types: <xref href="impala_int.xml#int"/>, + <xref href="impala_bigint.xml#bigint"/>, <xref href="impala_smallint.xml#smallint"/>, + <xref href="impala_tinyint.xml#tinyint"/>, <xref href="impala_double.xml#double"/>, + <xref href="impala_float.xml#float"/>, and <xref href="impala_decimal.xml#decimal"/>. For the operators that + perform the standard operations such as addition, subtraction, multiplication, and division, see + <xref href="impala_operators.xml#arithmetic_operators"/>. + </p> + + <p> + Functions that perform bitwise operations are explained in <xref href="impala_bit_functions.xml#bit_functions"/>. + </p> + + <p> + <b>Function reference:</b> + </p> + + <p> + Impala supports the following mathematical functions: + </p> + + <dl> + <dlentry rev="1.4.0" id="abs"> + + <dt rev="2.0.1"> + <codeph>abs(numeric_type a)</codeph> +<!-- <codeph>abs(double a), abs(decimal(p,s) a)</codeph> --> + </dt> + + <dd> + <indexterm audience="Cloudera">abs() function</indexterm> + <b>Purpose:</b> Returns the absolute value of the argument. + <p rev="2.0.1" conref="../shared/impala_common.xml#common/return_type_same"/> + <p> + <b>Usage notes:</b> Use this function to ensure all return values are positive. This is different than + the <codeph>positive()</codeph> function, which returns its argument unchanged (even if the argument + was negative). + </p> + </dd> + + </dlentry> + + <dlentry id="acos"> + + <dt> + <codeph>acos(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">acos() function</indexterm> + <b>Purpose:</b> Returns the arccosine of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="asin"> + + <dt> + <codeph>asin(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">asin() function</indexterm> + <b>Purpose:</b> Returns the arcsine of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="atan"> + + <dt> + <codeph>atan(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">atan() function</indexterm> + <b>Purpose:</b> Returns the arctangent of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="bin"> + + <dt> + <codeph>bin(bigint a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">bin() function</indexterm> + <b>Purpose:</b> Returns the binary representation of an integer value, that is, a string of 0 and 1 + digits. + <p> + <b>Return type:</b> <codeph>string</codeph> + </p> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="ceil"> + + <dt> + <codeph>ceil(double a)</codeph>, + <codeph>ceil(decimal(p,s) a)</codeph>, + <codeph id="ceiling">ceiling(double a)</codeph>, + <codeph>ceiling(decimal(p,s) a)</codeph>, + <codeph id="dceil" rev="2.3.0">dceil(double a)</codeph>, + <codeph rev="2.3.0">dceil(decimal(p,s) a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">ceil() function</indexterm> + <b>Purpose:</b> Returns the smallest integer that is greater than or equal to the argument. + <p> + <b>Return type:</b> <codeph>int</codeph> or <codeph>decimal(p,s)</codeph> depending on the type of the + input argument + </p> + </dd> + + </dlentry> + + <dlentry id="conv"> + + <dt> + <codeph>conv(bigint num, int from_base, int to_base), conv(string num, int from_base, int + to_base)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">conv() function</indexterm> + <b>Purpose:</b> Returns a string representation of an integer value in a particular base. The input value + can be a string, for example to convert a hexadecimal number such as <codeph>fce2</codeph> to decimal. To + use the return value as a number (for example, when converting to base 10), use <codeph>CAST()</codeph> + to convert to the appropriate type. + <p> + <b>Return type:</b> <codeph>string</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="cos"> + + <dt> + <codeph>cos(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">cos() function</indexterm> + <b>Purpose:</b> Returns the cosine of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="cot" rev="2.3.0"> + + <dt> + <codeph>cot(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">cot() function</indexterm> + <b>Purpose:</b> Returns the cotangent of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + <p conref="../shared/impala_common.xml#common/added_in_230"/> + </dd> + + </dlentry> + + <dlentry id="degrees"> + + <dt> + <codeph>degrees(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">degrees() function</indexterm> + <b>Purpose:</b> Converts argument value from radians to degrees. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="e"> + + <dt> + <codeph>e()</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">e() function</indexterm> + <b>Purpose:</b> Returns the + <xref href="http://en.wikipedia.org/wiki/E_(mathematical_constant)" scope="external" format="html">mathematical + constant e</xref>. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="exp"> + + <dt> + <codeph>exp(double a)</codeph>, + <codeph rev="2.3.0" id="dexp">dexp(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">exp() function</indexterm> + <b>Purpose:</b> Returns the + <xref href="http://en.wikipedia.org/wiki/E_(mathematical_constant)" scope="external" format="html">mathematical + constant e</xref> raised to the power of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry rev="2.3.0" id="factorial"> + + <dt> + <codeph>factorial(integer_type a)</codeph> + </dt> + <dd> + <indexterm audience="Cloudera">factorial() function</indexterm> + <b>Purpose:</b> Computes the <xref href="https://en.wikipedia.org/wiki/Factorial" scope="external" format="html">factorial</xref> of an integer value. + It works with any integer type. + <p conref="../shared/impala_common.xml#common/added_in_230"/> + <p> + <b>Usage notes:</b> You can use either the <codeph>factorial()</codeph> function or the <codeph>!</codeph> operator. + The factorial of 0 is 1. Likewise, the <codeph>factorial()</codeph> function returns 1 for any negative value. + The maximum positive value for the input argument is 20; a value of 21 or greater overflows the + range for a <codeph>BIGINT</codeph> and causes an error. + </p> + <p> + <b>Return type:</b> <codeph>bigint</codeph> + </p> + <p conref="../shared/impala_common.xml#common/added_in_230"/> +<codeblock>select factorial(5); ++--------------+ +| factorial(5) | ++--------------+ +| 120 | ++--------------+ + +select 5!; ++-----+ +| 5! | ++-----+ +| 120 | ++-----+ + +select factorial(0); ++--------------+ +| factorial(0) | ++--------------+ +| 1 | ++--------------+ + +select factorial(-100); ++-----------------+ +| factorial(-100) | ++-----------------+ +| 1 | ++-----------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry id="floor"> + + <dt> + <codeph>floor(double a)</codeph>, + <codeph>floor(decimal(p,s) a)</codeph>, + <codeph rev="2.3.0" id="dfloor">dfloor(double a)</codeph>, + <codeph rev="2.3.0">dfloor(decimal(p,s) a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">floor() function</indexterm> + <b>Purpose:</b> Returns the largest integer that is less than or equal to the argument. + <p> + <b>Return type:</b> <codeph>bigint</codeph> or <codeph>decimal(p,s)</codeph> depending on the type of + the input argument + </p> + </dd> + + </dlentry> + + <dlentry id="fmod"> + + <dt> + <codeph>fmod(double a, double b), fmod(float a, float b)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">fmod() function</indexterm> + <b>Purpose:</b> Returns the modulus of a floating-point number. Equivalent to the <codeph>%</codeph> arithmetic operator. + <p> + <b>Return type:</b> <codeph>float</codeph> or <codeph>double</codeph>, depending on type of arguments + </p> + <p conref="../shared/impala_common.xml#common/added_in_111"/> + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + Because this function operates on <codeph>DOUBLE</codeph> or <codeph>FLOAT</codeph> + values, it is subject to potential rounding errors for values that cannot be + represented precisely. Prefer to use whole numbers, or values that you know + can be represented precisely by the <codeph>DOUBLE</codeph> or <codeph>FLOAT</codeph> + types. + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + The following examples show equivalent operations with the <codeph>fmod()</codeph> + function and the <codeph>%</codeph> arithmetic operator, for values not subject + to any rounding error. + </p> +<codeblock>select fmod(10,3); ++-------------+ +| fmod(10, 3) | ++-------------+ +| 1 | ++-------------+ + +select fmod(5.5,2); ++--------------+ +| fmod(5.5, 2) | ++--------------+ +| 1.5 | ++--------------+ + +select 10 % 3; ++--------+ +| 10 % 3 | ++--------+ +| 1 | ++--------+ + +select 5.5 % 2; ++---------+ +| 5.5 % 2 | ++---------+ +| 1.5 | ++---------+ +</codeblock> + <p> + The following examples show operations with the <codeph>fmod()</codeph> + function for values that cannot be represented precisely by the + <codeph>DOUBLE</codeph> or <codeph>FLOAT</codeph> types, and thus are + subject to rounding error. <codeph>fmod(9.9,3.0)</codeph> returns a value + slightly different than the expected 0.9 because of rounding. + <codeph>fmod(9.9,3.3)</codeph> returns a value quite different from + the expected value of 0 because of rounding error during intermediate + calculations. + </p> +<codeblock>select fmod(9.9,3.0); ++--------------------+ +| fmod(9.9, 3.0) | ++--------------------+ +| 0.8999996185302734 | ++--------------------+ + +select fmod(9.9,3.3); ++-------------------+ +| fmod(9.9, 3.3) | ++-------------------+ +| 3.299999713897705 | ++-------------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry rev="1.2.2" id="fnv_hash"> + + <dt> + <codeph>fnv_hash(type v)</codeph>, + </dt> + + <dd> + <indexterm audience="Cloudera">fnv_hash() function</indexterm> + <b>Purpose:</b> Returns a consistent 64-bit value derived from the input argument, for convenience of + implementing hashing logic in an application. + <p> + <b>Return type:</b> <codeph>BIGINT</codeph> + </p> + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + You might use the return value in an application where you perform load balancing, bucketing, or some + other technique to divide processing or storage. + </p> + <p> + Because the result can be any 64-bit value, to restrict the value to a particular range, you can use an + expression that includes the <codeph>ABS()</codeph> function and the <codeph>%</codeph> (modulo) + operator. For example, to produce a hash value in the range 0-9, you could use the expression + <codeph>ABS(FNV_HASH(x)) % 10</codeph>. + </p> + <p> + This function implements the same algorithm that Impala uses internally for hashing, on systems where + the CRC32 instructions are not available. + </p> + <p> + This function implements the + <xref href="http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function" scope="external" format="html">FowlerâNollâVo + hash function</xref>, in particular the FNV-1a variation. This is not a perfect hash function: some + combinations of values could produce the same result value. It is not suitable for cryptographic use. + </p> + <p> + Similar input values of different types could produce different hash values, for example the same + numeric value represented as <codeph>SMALLINT</codeph> or <codeph>BIGINT</codeph>, + <codeph>FLOAT</codeph> or <codeph>DOUBLE</codeph>, or <codeph>DECIMAL(5,2)</codeph> or + <codeph>DECIMAL(20,5)</codeph>. + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> +<codeblock>[localhost:21000] > create table h (x int, s string); +[localhost:21000] > insert into h values (0, 'hello'), (1,'world'), (1234567890,'antidisestablishmentarianism'); +[localhost:21000] > select x, fnv_hash(x) from h; ++------------+----------------------+ +| x | fnv_hash(x) | ++------------+----------------------+ +| 0 | -2611523532599129963 | +| 1 | 4307505193096137732 | +| 1234567890 | 3614724209955230832 | ++------------+----------------------+ +[localhost:21000] > select s, fnv_hash(s) from h; ++------------------------------+---------------------+ +| s | fnv_hash(s) | ++------------------------------+---------------------+ +| hello | 6414202926103426347 | +| world | 6535280128821139475 | +| antidisestablishmentarianism | -209330013948433970 | ++------------------------------+---------------------+ +[localhost:21000] > select s, abs(fnv_hash(s)) % 10 from h; ++------------------------------+-------------------------+ +| s | abs(fnv_hash(s)) % 10.0 | ++------------------------------+-------------------------+ +| hello | 8 | +| world | 6 | +| antidisestablishmentarianism | 4 | ++------------------------------+-------------------------+</codeblock> + <p> + For short argument values, the high-order bits of the result have relatively low entropy: + </p> +<codeblock>[localhost:21000] > create table b (x boolean); +[localhost:21000] > insert into b values (true), (true), (false), (false); +[localhost:21000] > select x, fnv_hash(x) from b; ++-------+---------------------+ +| x | fnv_hash(x) | ++-------+---------------------+ +| true | 2062020650953872396 | +| true | 2062020650953872396 | +| false | 2062021750465500607 | +| false | 2062021750465500607 | ++-------+---------------------+</codeblock> + <p> + <b>Added in:</b> Impala 1.2.2 + </p> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="greatest"> + + <dt> + <codeph>greatest(bigint a[, bigint b ...])</codeph>, <codeph>greatest(double a[, double b ...])</codeph>, + <codeph>greatest(decimal(p,s) a[, decimal(p,s) b ...])</codeph>, <codeph>greatest(string a[, string b + ...])</codeph>, <codeph>greatest(timestamp a[, timestamp b ...])</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">greatest() function</indexterm> + <b>Purpose:</b> Returns the largest value from a list of expressions. + <p conref="../shared/impala_common.xml#common/return_same_type"/> + </dd> + + </dlentry> + + <dlentry id="hex"> + + <dt> + <codeph>hex(bigint a), hex(string a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">hex() function</indexterm> + <b>Purpose:</b> Returns the hexadecimal representation of an integer value, or of the characters in a + string. + <p> + <b>Return type:</b> <codeph>string</codeph> + </p> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="is_inf"> + + <dt> + <codeph>is_inf(double a)</codeph>, + </dt> + + <dd> + <indexterm audience="Cloudera">is_inf() function</indexterm> + <b>Purpose:</b> Tests whether a value is equal to the special value <q>inf</q>, signifying infinity. + <p> + <b>Return type:</b> <codeph>boolean</codeph> + </p> + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + Infinity and NaN can be specified in text data files as <codeph>inf</codeph> and <codeph>nan</codeph> + respectively, and Impala interprets them as these special values. They can also be produced by certain + arithmetic expressions; for example, <codeph>pow(-1, 0.5)</codeph> returns infinity and + <codeph>1/0</codeph> returns NaN. Or you can cast the literal values, such as <codeph>CAST('nan' AS + DOUBLE)</codeph> or <codeph>CAST('inf' AS DOUBLE)</codeph>. + </p> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="is_nan"> + + <dt> + <codeph>is_nan(double a)</codeph>, + </dt> + + <dd> + <indexterm audience="Cloudera">is_nan() function</indexterm> + <b>Purpose:</b> Tests whether a value is equal to the special value <q>NaN</q>, signifying <q>not a + number</q>. + <p> + <b>Return type:</b> <codeph>boolean</codeph> + </p> + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + Infinity and NaN can be specified in text data files as <codeph>inf</codeph> and <codeph>nan</codeph> + respectively, and Impala interprets them as these special values. They can also be produced by certain + arithmetic expressions; for example, <codeph>pow(-1, 0.5)</codeph> returns infinity and + <codeph>1/0</codeph> returns NaN. Or you can cast the literal values, such as <codeph>CAST('nan' AS + DOUBLE)</codeph> or <codeph>CAST('inf' AS DOUBLE)</codeph>. + </p> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="least"> + + <dt> + <codeph>least(bigint a[, bigint b ...])</codeph>, <codeph>least(double a[, double b ...])</codeph>, + <codeph>least(decimal(p,s) a[, decimal(p,s) b ...])</codeph>, <codeph>least(string a[, string b + ...])</codeph>, <codeph>least(timestamp a[, timestamp b ...])</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">least() function</indexterm> + <b>Purpose:</b> Returns the smallest value from a list of expressions. + <p conref="../shared/impala_common.xml#common/return_same_type"/> + </dd> + + </dlentry> + + <dlentry id="ln"> + + <dt> + <codeph>ln(double a)</codeph>, + <codeph rev="2.3.0" id="dlog1">dlog1(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">ln() function</indexterm> + <indexterm audience="Cloudera">dlog1() function</indexterm> + <b>Purpose:</b> Returns the + <xref href="https://en.wikipedia.org/wiki/Natural_logarithm" scope="external" format="html">natural + logarithm</xref> of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="log"> + + <dt> + <codeph>log(double base, double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">log() function</indexterm> + <b>Purpose:</b> Returns the logarithm of the second argument to the specified base. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="log10"> + + <dt> + <codeph>log10(double a)</codeph>, + <codeph rev="2.3.0" id="dlog10">dlog10(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">log10() function</indexterm> + <indexterm audience="Cloudera">dlog10() function</indexterm> + <b>Purpose:</b> Returns the logarithm of the argument to the base 10. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="log2"> + + <dt> + <codeph>log2(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">log2() function</indexterm> + <b>Purpose:</b> Returns the logarithm of the argument to the base 2. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="max_int"> + + <dt> + <codeph>max_int(), <ph id="max_tinyint">max_tinyint()</ph>, <ph id="max_smallint">max_smallint()</ph>, + <ph id="max_bigint">max_bigint()</ph></codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">max_int() function</indexterm> + <indexterm audience="Cloudera">max_tinyint() function</indexterm> + <indexterm audience="Cloudera">max_smallint() function</indexterm> + <indexterm audience="Cloudera">max_bigint() function</indexterm> + <b>Purpose:</b> Returns the largest value of the associated integral type. + <p> + <b>Return type:</b> The same as the integral type being checked. + </p> + <p> +<!-- Repeated usage text between max_ and min_ functions, could turn into a conref. --> + <b>Usage notes:</b> Use the corresponding <codeph>min_</codeph> and <codeph>max_</codeph> functions to + check if all values in a column are within the allowed range, before copying data or altering column + definitions. If not, switch to the next higher integral type or to a <codeph>DECIMAL</codeph> with + sufficient precision. + </p> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="min_int"> + + <dt> + <codeph>min_int(), <ph id="min_tinyint">min_tinyint()</ph>, <ph id="min_smallint">min_smallint()</ph>, + <ph id="min_bigint">min_bigint()</ph></codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">min_int() function</indexterm> + <indexterm audience="Cloudera">min_tinyint() function</indexterm> + <indexterm audience="Cloudera">min_smallint() function</indexterm> + <indexterm audience="Cloudera">min_bigint() function</indexterm> + <b>Purpose:</b> Returns the smallest value of the associated integral type (a negative number). + <p> + <b>Return type:</b> The same as the integral type being checked. + </p> + <p> + <b>Usage notes:</b> Use the corresponding <codeph>min_</codeph> and <codeph>max_</codeph> functions to + check if all values in a column are within the allowed range, before copying data or altering column + definitions. If not, switch to the next higher integral type or to a <codeph>DECIMAL</codeph> with + sufficient precision. + </p> + </dd> + + </dlentry> + + <dlentry id="mod" rev="2.2.0"> + + <dt> + <codeph>mod(<varname>numeric_type</varname> a, <varname>same_type</varname> b)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">mod() function</indexterm> + <b>Purpose:</b> Returns the modulus of a number. Equivalent to the <codeph>%</codeph> arithmetic operator. + Works with any size integer type, any size floating-point type, and <codeph>DECIMAL</codeph> + with any precision and scale. + <p conref="../shared/impala_common.xml#common/return_type_same"/> + <p conref="../shared/impala_common.xml#common/added_in_220"/> + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + Because this function works with <codeph>DECIMAL</codeph> values, prefer it over <codeph>fmod()</codeph> + when working with fractional values. It is not subject to the rounding errors that make + <codeph>fmod()</codeph> problematic with floating-point numbers. + The <codeph>%</codeph> arithmetic operator now uses the <codeph>mod()</codeph> function + in cases where its arguments can be interpreted as <codeph>DECIMAL</codeph> values, + increasing the accuracy of that operator. + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + The following examples show how the <codeph>mod()</codeph> function works for + whole numbers and fractional values, and how the <codeph>%</codeph> operator + works the same way. In the case of <codeph>mod(9.9,3)</codeph>, + the type conversion for the second argument results in the first argument + being interpreted as <codeph>DOUBLE</codeph>, so to produce an accurate + <codeph>DECIMAL</codeph> result requires casting the second argument + or writing it as a <codeph>DECIMAL</codeph> literal, 3.0. + </p> +<codeblock>select mod(10,3); ++-------------+ +| fmod(10, 3) | ++-------------+ +| 1 | ++-------------+ + +select mod(5.5,2); ++--------------+ +| fmod(5.5, 2) | ++--------------+ +| 1.5 | ++--------------+ + +select 10 % 3; ++--------+ +| 10 % 3 | ++--------+ +| 1 | ++--------+ + +select 5.5 % 2; ++---------+ +| 5.5 % 2 | ++---------+ +| 1.5 | ++---------+ + +select mod(9.9,3.3); ++---------------+ +| mod(9.9, 3.3) | ++---------------+ +| 0.0 | ++---------------+ + +select mod(9.9,3); ++--------------------+ +| mod(9.9, 3) | ++--------------------+ +| 0.8999996185302734 | ++--------------------+ + +select mod(9.9, cast(3 as decimal(2,1))); ++-----------------------------------+ +| mod(9.9, cast(3 as decimal(2,1))) | ++-----------------------------------+ +| 0.9 | ++-----------------------------------+ + +select mod(9.9,3.0); ++---------------+ +| mod(9.9, 3.0) | ++---------------+ +| 0.9 | ++---------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="negative"> + + <dt rev="2.0.1"> + <codeph>negative(numeric_type a)</codeph> +<!-- <codeph>negative(int a), negative(double a), negative(decimal(p,s) a)</codeph> --> + </dt> + + <dd> + <indexterm audience="Cloudera">negative() function</indexterm> + <b>Purpose:</b> Returns the argument with the sign reversed; returns a positive value if the argument was + already negative. + <p rev="2.0.1" conref="../shared/impala_common.xml#common/return_type_same"/> +<!-- + <p> + <b>Return type:</b> <codeph>int</codeph>, <codeph>double</codeph>, + or <codeph>decimal(p,s)</codeph> depending on type of argument + </p> + --> + <p> + <b>Usage notes:</b> Use <codeph>-abs(a)</codeph> instead if you need to ensure all return values are + negative. + </p> + </dd> + + </dlentry> + + <dlentry id="pi"> + + <dt> + <codeph>pi()</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">pi() function</indexterm> + <b>Purpose:</b> Returns the constant pi. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="pmod"> + + <dt> + <codeph>pmod(bigint a, bigint b), pmod(double a, double b)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">pmod() function</indexterm> + <b>Purpose:</b> Returns the positive modulus of a number. + Primarily for <xref href="https://issues.apache.org/jira/browse/HIVE-656" scope="external" format="html">HiveQL compatibility</xref>. + <p> + <b>Return type:</b> <codeph>int</codeph> or <codeph>double</codeph>, depending on type of arguments + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + The following examples show how the <codeph>fmod()</codeph> function sometimes returns a negative value + depending on the sign of its arguments, and the <codeph>pmod()</codeph> function returns the same value + as <codeph>fmod()</codeph>, but sometimes with the sign flipped. + </p> +<codeblock>select fmod(-5,2); ++-------------+ +| fmod(-5, 2) | ++-------------+ +| -1 | ++-------------+ + +select pmod(-5,2); ++-------------+ +| pmod(-5, 2) | ++-------------+ +| 1 | ++-------------+ + +select fmod(-5,-2); ++--------------+ +| fmod(-5, -2) | ++--------------+ +| -1 | ++--------------+ + +select pmod(-5,-2); ++--------------+ +| pmod(-5, -2) | ++--------------+ +| -1 | ++--------------+ + +select fmod(5,-2); ++-------------+ +| fmod(5, -2) | ++-------------+ +| 1 | ++-------------+ + +select pmod(5,-2); ++-------------+ +| pmod(5, -2) | ++-------------+ +| -1 | ++-------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="positive"> + + <dt rev="2.0.1"> + <codeph>positive(numeric_type a)</codeph> +<!-- <codeph>positive(int a), positive(double a), positive(decimal(p,s) a</codeph> --> + </dt> + + <dd> + <indexterm audience="Cloudera">positive() function</indexterm> + <b>Purpose:</b> Returns the original argument unchanged (even if the argument is negative). + <p rev="2.0.1" conref="../shared/impala_common.xml#common/return_type_same"/> +<!-- + <p> + <b>Return type:</b> <codeph>int</codeph>, <codeph>double</codeph>, + or <codeph>decimal(p,s)</codeph> depending on type of argument + </p> + --> + <p> + <b>Usage notes:</b> Use <codeph>abs()</codeph> instead if you need to ensure all return values are + positive. + </p> + </dd> + + </dlentry> + + <dlentry id="pow"> + + <dt> + <codeph>pow(double a, double p)</codeph>, + <codeph id="power">power(double a, double p)</codeph>, + <codeph rev="2.3.0" id="dpow">dpow(double a, double p)</codeph>, + <codeph rev="2.3.0" id="fpow">fpow(double a, double p)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">pow() function</indexterm> + <indexterm audience="Cloudera">power() function</indexterm> + <indexterm audience="Cloudera">dpow() function</indexterm> + <indexterm audience="Cloudera">fpow() function</indexterm> + <b>Purpose:</b> Returns the first argument raised to the power of the second argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="precision"> + + <dt> + <codeph>precision(<varname>numeric_expression</varname>)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">precision() function</indexterm> + <b>Purpose:</b> Computes the precision (number of decimal digits) needed to represent the type of the + argument expression as a <codeph>DECIMAL</codeph> value. + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + Typically used in combination with the <codeph>scale()</codeph> function, to determine the appropriate + <codeph>DECIMAL(<varname>precision</varname>,<varname>scale</varname>)</codeph> type to declare in a + <codeph>CREATE TABLE</codeph> statement or <codeph>CAST()</codeph> function. + </p> + <p> + <b>Return type:</b> <codeph>int</codeph> + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p conref="../shared/impala_common.xml#common/precision_scale_example"/> + </dd> + + </dlentry> + + <dlentry id="quotient"> + + <dt> + <codeph>quotient(int numerator, int denominator)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">quotient() function</indexterm> + <b>Purpose:</b> Returns the first argument divided by the second argument, discarding any fractional + part. Avoids promoting arguments to <codeph>DOUBLE</codeph> as happens with the <codeph>/</codeph> SQL + operator. + <p> + <b>Return type:</b> <codeph>int</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="radians"> + + <dt> + <codeph>radians(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">radians() function</indexterm> + <b>Purpose:</b> Converts argument value from degrees to radians. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="rand"> + + <dt> + <codeph>rand()</codeph>, <codeph>rand(int seed)</codeph>, + <codeph rev="2.3.0" id="random">random()</codeph>, + <codeph rev="2.3.0">random(int seed)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">rand() function</indexterm> + <b>Purpose:</b> Returns a random value between 0 and 1. After <codeph>rand()</codeph> is called with a + seed argument, it produces a consistent random sequence based on the seed value. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + <p> + <b>Usage notes:</b> Currently, the random sequence is reset after each query, and multiple calls to + <codeph>rand()</codeph> within the same query return the same value each time. For different number + sequences that are different for each query, pass a unique seed value to each call to + <codeph>rand()</codeph>. For example, <codeph>select rand(unix_timestamp()) from ...</codeph> + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + The following examples show how <codeph>rand()</codeph> can produce sequences of varying predictability, + so that you can reproduce query results involving random values or generate unique sequences of random + values for each query. + When <codeph>rand()</codeph> is called with no argument, it generates the same sequence of values each time, + regardless of the ordering of the result set. + When <codeph>rand()</codeph> is called with a constant integer, it generates a different sequence of values, + but still always the same sequence for the same seed value. + If you pass in a seed value that changes, such as the return value of the expression <codeph>unix_timestamp(now())</codeph>, + each query will use a different sequence of random values, potentially more useful in probability calculations although + more difficult to reproduce at a later time. Therefore, the final two examples with an unpredictable seed value + also include the seed in the result set, to make it possible to reproduce the same random sequence later. + </p> +<codeblock>select x, rand() from three_rows; ++---+-----------------------+ +| x | rand() | ++---+-----------------------+ +| 1 | 0.0004714746030380365 | +| 2 | 0.5895895192351144 | +| 3 | 0.4431900859080209 | ++---+-----------------------+ + +select x, rand() from three_rows order by x desc; ++---+-----------------------+ +| x | rand() | ++---+-----------------------+ +| 3 | 0.0004714746030380365 | +| 2 | 0.5895895192351144 | +| 1 | 0.4431900859080209 | ++---+-----------------------+ + +select x, rand(1234) from three_rows order by x; ++---+----------------------+ +| x | rand(1234) | ++---+----------------------+ +| 1 | 0.7377511392057646 | +| 2 | 0.009428468537250751 | +| 3 | 0.208117277924026 | ++---+----------------------+ + +select x, rand(1234) from three_rows order by x desc; ++---+----------------------+ +| x | rand(1234) | ++---+----------------------+ +| 3 | 0.7377511392057646 | +| 2 | 0.009428468537250751 | +| 1 | 0.208117277924026 | ++---+----------------------+ + +select x, unix_timestamp(now()), rand(unix_timestamp(now())) + from three_rows order by x; ++---+-----------------------+-----------------------------+ +| x | unix_timestamp(now()) | rand(unix_timestamp(now())) | ++---+-----------------------+-----------------------------+ +| 1 | 1440777752 | 0.002051228658320023 | +| 2 | 1440777752 | 0.5098743483004506 | +| 3 | 1440777752 | 0.9517714925817081 | ++---+-----------------------+-----------------------------+ + +select x, unix_timestamp(now()), rand(unix_timestamp(now())) + from three_rows order by x desc; ++---+-----------------------+-----------------------------+ +| x | unix_timestamp(now()) | rand(unix_timestamp(now())) | ++---+-----------------------+-----------------------------+ +| 3 | 1440777761 | 0.9985985015512437 | +| 2 | 1440777761 | 0.3251255333074953 | +| 1 | 1440777761 | 0.02422675025846192 | ++---+-----------------------+-----------------------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry id="round"> + + <dt> + <codeph>round(double a)</codeph>, + <codeph>round(double a, int d)</codeph>, + <codeph rev="1.4.0">round(decimal a, int_type d)</codeph>, + <codeph rev="2.3.0" id="dround">dround(double a)</codeph>, + <codeph rev="2.3.0">dround(double a, int d)</codeph>, + <codeph rev="2.3.0">dround(decimal(p,s) a, int_type d)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">round() function</indexterm> + <indexterm audience="Cloudera">dround() function</indexterm> + <b>Purpose:</b> Rounds a floating-point value. By default (with a single argument), rounds to the nearest + integer. Values ending in .5 are rounded up for positive numbers, down for negative numbers (that is, + away from zero). The optional second argument specifies how many digits to leave after the decimal point; + values greater than zero produce a floating-point return value rounded to the requested number of digits + to the right of the decimal point. + <p rev="1.4.0"> + <b>Return type:</b> <codeph>bigint</codeph> for single <codeph>double</codeph> argument. + <codeph>double</codeph> for two-argument signature when second argument greater than zero. + For <codeph>DECIMAL</codeph> values, the smallest + <codeph>DECIMAL(<varname>p</varname>,<varname>s</varname>)</codeph> type with appropriate precision and + scale. + </p> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="scale"> + + <dt> + <codeph>scale(<varname>numeric_expression</varname>)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">scale() function</indexterm> + <b>Purpose:</b> Computes the scale (number of decimal digits to the right of the decimal point) needed to + represent the type of the argument expression as a <codeph>DECIMAL</codeph> value. + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + Typically used in combination with the <codeph>precision()</codeph> function, to determine the + appropriate <codeph>DECIMAL(<varname>precision</varname>,<varname>scale</varname>)</codeph> type to + declare in a <codeph>CREATE TABLE</codeph> statement or <codeph>CAST()</codeph> function. + </p> + <p> + <b>Return type:</b> <codeph>int</codeph> + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p conref="../shared/impala_common.xml#common/precision_scale_example"/> + </dd> + + </dlentry> + + <dlentry id="sign"> + + <dt> + <codeph>sign(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">sign() function</indexterm> + <b>Purpose:</b> Returns -1, 0, or 1 to indicate the signedness of the argument value. + <p> + <b>Return type:</b> <codeph>int</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="sin"> + + <dt> + <codeph>sin(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">sin() function</indexterm> + <b>Purpose:</b> Returns the sine of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="sqrt"> + + <dt> + <codeph>sqrt(double a)</codeph>, + <codeph rev="2.3.0" id="dsqrt">dsqrt(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">sqrt() function</indexterm> + <indexterm audience="Cloudera">dsqrt() function</indexterm> + <b>Purpose:</b> Returns the square root of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="tan"> + + <dt> + <codeph>tan(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">tan() function</indexterm> + <b>Purpose:</b> Returns the tangent of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry rev="2.3.0" id="truncate"> + + <dt> + <codeph>truncate(double_or_decimal a[, digits_to_leave])</codeph>, + <ph id="dtrunc"><codeph>dtrunc(double_or_decimal a[, digits_to_leave])</codeph></ph> + </dt> + + <dd> + <indexterm audience="Cloudera">truncate() function</indexterm> + <indexterm audience="Cloudera">dtrunc() function</indexterm> + <b>Purpose:</b> Removes some or all fractional digits from a numeric value. + With no argument, removes all fractional digits, leaving an integer value. + The optional argument specifies the number of fractional digits to include + in the return value, and only applies with the argument type is <codeph>DECIMAL</codeph>. + <codeph>truncate()</codeph> and <codeph>dtrunc()</codeph> are aliases for the same function. + <p> + <b>Return type:</b> <codeph>decimal</codeph> for <codeph>DECIMAL</codeph> arguments; + <codeph>bigint</codeph> for <codeph>DOUBLE</codeph> arguments + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> +<codeblock>select truncate(3.45) ++----------------+ +| truncate(3.45) | ++----------------+ +| 3 | ++----------------+ + +select truncate(-3.45) ++-----------------+ +| truncate(-3.45) | ++-----------------+ +| -3 | ++-----------------+ + +select truncate(3.456,1) ++--------------------+ +| truncate(3.456, 1) | ++--------------------+ +| 3.4 | ++--------------------+ + +select dtrunc(3.456,1) ++------------------+ +| dtrunc(3.456, 1) | ++------------------+ +| 3.4 | ++------------------+ + +select truncate(3.456,2) ++--------------------+ +| truncate(3.456, 2) | ++--------------------+ +| 3.45 | ++--------------------+ + +select truncate(3.456,7) ++--------------------+ +| truncate(3.456, 7) | ++--------------------+ +| 3.4560000 | ++--------------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry id="unhex"> + + <dt> + <codeph>unhex(string a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">unhex() function</indexterm> + <b>Purpose:</b> Returns a string of characters with ASCII values corresponding to pairs of hexadecimal + digits in the argument. + <p> + <b>Return type:</b> <codeph>string</codeph> + </p> + </dd> + + </dlentry> + </dl> + </conbody> +</concept>
