http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_logging.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_logging.xml b/docs/topics/impala_logging.xml new file mode 100644 index 0000000..960f1e6 --- /dev/null +++ b/docs/topics/impala_logging.xml @@ -0,0 +1,471 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="logging"> + + <title>Using Impala Logging</title> + <titlealts audience="PDF"><navtitle>Logging</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Logs"/> + <data name="Category" value="Troubleshooting"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + The Impala logs record information about: + </p> + + <ul> + <li> + Any errors Impala encountered. If Impala experienced a serious error during startup, you must diagnose and + troubleshoot that problem before you can do anything further with Impala. + </li> + + <li> + How Impala is configured. + </li> + + <li> + Jobs Impala has completed. + </li> + </ul> + + <note> + <p> + Formerly, the logs contained the query profile for each query, showing low-level details of how the work is + distributed among nodes and how intermediate and final results are transmitted across the network. To save + space, those query profiles are now stored in zlib-compressed files in + <filepath>/var/log/impala/profiles</filepath>. You can access them through the Impala web user interface. + For example, at <codeph>http://<varname>impalad-node-hostname</varname>:25000/queries</codeph>, each query + is followed by a <codeph>Profile</codeph> link leading to a page showing extensive analytical data for the + query execution. + </p> + + <p rev="1.1.1"> + The auditing feature introduced in Impala 1.1.1 produces a separate set of audit log files when + enabled. See <xref href="impala_auditing.xml#auditing"/> for details. + </p> + + <p rev="2.2.0"> + The lineage feature introduced in Impala 2.2.0 produces a separate lineage log file when + enabled. See <xref href="impala_lineage.xml#lineage"/> for details. + </p> + </note> + + <p outputclass="toc inpage"/> + + </conbody> + + <concept id="logs_details"> + + <title>Locations and Names of Impala Log Files</title> + + <conbody> + + <ul> + <li> + By default, the log files are under the directory <filepath>/var/log/impala</filepath>. +<!-- TK: split this task out and state CM and non-CM ways. --> + To change log file locations, modify the defaults file described in + <xref href="impala_processes.xml#processes"/>. + </li> + + <li> + The significant files for the <codeph>impalad</codeph> process are <filepath>impalad.INFO</filepath>, + <filepath>impalad.WARNING</filepath>, and <filepath>impalad.ERROR</filepath>. You might also see a file + <filepath>impalad.FATAL</filepath>, although this is only present in rare conditions. + </li> + + <li> + The significant files for the <codeph>statestored</codeph> process are + <filepath>statestored.INFO</filepath>, <filepath>statestored.WARNING</filepath>, and + <filepath>statestored.ERROR</filepath>. You might also see a file <filepath>statestored.FATAL</filepath>, + although this is only present in rare conditions. + </li> + + <li rev="1.2"> + The significant files for the <codeph>catalogd</codeph> process are <filepath>catalogd.INFO</filepath>, + <filepath>catalogd.WARNING</filepath>, and <filepath>catalogd.ERROR</filepath>. You might also see a file + <filepath>catalogd.FATAL</filepath>, although this is only present in rare conditions. + </li> + + <li> + Examine the <codeph>.INFO</codeph> files to see configuration settings for the processes. + </li> + + <li> + Examine the <codeph>.WARNING</codeph> files to see all kinds of problem information, including such + things as suboptimal settings and also serious runtime errors. + </li> + + <li> + Examine the <codeph>.ERROR</codeph> and/or <codeph>.FATAL</codeph> files to see only the most serious + errors, if the processes crash, or queries fail to complete. These messages are also in the + <codeph>.WARNING</codeph> file. + </li> + + <li> + A new set of log files is produced each time the associated daemon is restarted. These log files have + long names including a timestamp. The <codeph>.INFO</codeph>, <codeph>.WARNING</codeph>, and + <codeph>.ERROR</codeph> files are physically represented as symbolic links to the latest applicable log + files. + </li> + + <li> + The init script for the <codeph>impala-server</codeph> service also produces a consolidated log file + <codeph>/var/logs/impalad/impala-server.log</codeph>, with all the same information as the + corresponding<codeph>.INFO</codeph>, <codeph>.WARNING</codeph>, and <codeph>.ERROR</codeph> files. + </li> + + <li> + The init script for the <codeph>impala-state-store</codeph> service also produces a consolidated log file + <codeph>/var/logs/impalad/impala-state-store.log</codeph>, with all the same information as the + corresponding<codeph>.INFO</codeph>, <codeph>.WARNING</codeph>, and <codeph>.ERROR</codeph> files. + </li> + </ul> + + <p> + Impala stores information using the <codeph>glog_v</codeph> logging system. You will see some messages + referring to C++ file names. Logging is affected by: + </p> + + <ul> + <li> + The <codeph>GLOG_v</codeph> environment variable specifies which types of messages are logged. See + <xref href="#log_levels"/> for details. + </li> + + <li> + The <codeph>-logbuflevel</codeph> startup flag for the <cmdname>impalad</cmdname> daemon specifies how + often the log information is written to disk. The default is 0, meaning that the log is immediately + flushed to disk when Impala outputs an important messages such as a warning or an error, but less + important messages such as informational ones are buffered in memory rather than being flushed to disk + immediately. + </li> + + <li> + Cloudera Manager has an Impala configuration setting that sets the <codeph>-logbuflevel</codeph> startup + option. + </li> + </ul> + + </conbody> + + </concept> + + <concept id="logs_cm_noncm"> + + <title>Managing Impala Logs through Cloudera Manager or Manually</title> + <prolog> + <metadata> + <data name="Category" value="Administrators"/> + <data name="Category" value="Cloudera Manager"/> + </metadata> + </prolog> + + <conbody> + + <p> + <ph rev="upstream">Cloudera</ph> recommends installing Impala through the Cloudera Manager administration interface. To assist with + troubleshooting, Cloudera Manager collects front-end and back-end logs together into a single view, and let + you do a search across log data for all the managed nodes rather than examining the logs on each node + separately. If you installed Impala using Cloudera Manager, refer to the topics on Monitoring Services + (<xref href="http://www.cloudera.com/documentation/enterprise/latest/topics/cm_dg_service_monitoring.html" scope="external" format="html">CDH 5</xref>) + or Logs (<xref href="http://www.cloudera.com/documentation/enterprise/latest/topics/cm_dg_logs.html" scope="external" format="html">CDH 5</xref>). + </p> + + <p> + If you are using Impala in an environment not managed by Cloudera Manager, review Impala log files on each + host, when you have traced an issue back to a specific system. + </p> + + </conbody> + + </concept> + + <concept id="logs_rotate"> + + <title>Rotating Impala Logs</title> + <prolog> + <metadata> + <data name="Category" value="Disk Storage"/> + </metadata> + </prolog> + + <conbody> + + <p> + Impala periodically switches the physical files representing the current log files, after which it is safe + to remove the old files if they are no longer needed. + </p> + + <p> + Impala can automatically remove older unneeded log files, a feature known as <term>log rotation</term>. +<!-- Another instance of the text also used in impala_new_features.xml + and impala_fixed_issues.xml. (Just took out the word "new" + and added the reference to the starting release.) + At this point, a conref is definitely in the cards. --> + </p> + + <p> + In Impala 2.2 and higher, the <codeph>-max_log_files</codeph> configuration option specifies how many log + files to keep at each severity level. You can specify an appropriate setting for each Impala-related daemon + (<cmdname>impalad</cmdname>, <cmdname>statestored</cmdname>, and <cmdname>catalogd</cmdname>). The default + value is 10, meaning that Impala preserves the latest 10 log files for each severity level + (<codeph>INFO</codeph>, <codeph>WARNING</codeph>, <codeph>ERROR</codeph>, and <codeph>FATAL</codeph>). + Impala checks to see if any old logs need to be removed based on the interval specified in the + <codeph>logbufsecs</codeph> setting, every 5 seconds by default. + </p> + +<!-- This extra detail only appears here. Consider if it's worth including it + in the conref so people don't need to follow a link just for a couple of + minor factoids. --> + + <p> + A value of 0 preserves all log files, in which case you would set up set up manual log rotation using your + Linux tool or technique of choice. A value of 1 preserves only the very latest log file. + </p> + + <p> + To set up log rotation on a system managed by Cloudera Manager 5.4.0 and higher, search for the + <codeph>max_log_files</codeph> option name and set the appropriate value for the <userinput>Maximum Log + Files</userinput> field for each Impala configuration category (Impala, Catalog Server, and StateStore). + Then restart the Impala service. In earlier Cloudera Manager releases, specify the + <codeph>-max_log_files=<varname>maximum</varname></codeph> option in the <uicontrol>Command Line Argument + Advanced Configuration Snippet (Safety Valve)</uicontrol> field for each Impala configuration category. + </p> + + </conbody> + + </concept> + + <concept id="logs_debug"> + + <title>Reviewing Impala Logs</title> + + <conbody> + + <p> + By default, the Impala log is stored at <codeph>/var/logs/impalad/</codeph>. The most comprehensive log, + showing informational, warning, and error messages, is in the file name <filepath>impalad.INFO</filepath>. + View log file contents by using the web interface or by examining the contents of the log file. (When you + examine the logs through the file system, you can troubleshoot problems by reading the + <filepath>impalad.WARNING</filepath> and/or <filepath>impalad.ERROR</filepath> files, which contain the + subsets of messages indicating potential problems.) + </p> + + <p> + On a machine named <codeph>impala.example.com</codeph> with default settings, you could view the Impala + logs on that machine by using a browser to access <codeph>http://impala.example.com:25000/logs</codeph>. + </p> + + <note> + <p> + The web interface limits the amount of logging information displayed. To view every log entry, access the + log files directly through the file system. + </p> + </note> + + <p> + You can view the contents of the <codeph>impalad.INFO</codeph> log file in the file system. With the + default configuration settings, the start of the log file appears as follows: + </p> + +<codeblock>[user@example impalad]$ pwd +/var/log/impalad +[user@example impalad]$ more impalad.INFO +Log file created at: 2013/01/07 08:42:12 +Running on machine: impala.example.com +Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg +I0107 08:42:12.292155 14876 daemon.cc:34] impalad version 0.4 RELEASE (build 9d7fadca0461ab40b9e9df8cdb47107ec6b27cff) +Built on Fri, 21 Dec 2012 12:55:19 PST +I0107 08:42:12.292484 14876 daemon.cc:35] Using hostname: impala.example.com +I0107 08:42:12.292706 14876 logging.cc:76] Flags (see also /varz are on debug webserver): +--dump_ir=false +--module_output= +--be_port=22000 +--classpath= +--hostname=impala.example.com</codeblock> + + <note> + The preceding example shows only a small part of the log file. Impala log files are often several megabytes + in size. + </note> + + </conbody> + + </concept> + + <concept id="log_format"> + + <title>Understanding Impala Log Contents</title> + + <conbody> + + <p> + The logs store information about Impala startup options. This information appears once for each time Impala + is started and may include: + </p> + + <ul> + <li> + Machine name. + </li> + + <li> + Impala version number. + </li> + + <li> + Flags used to start Impala. + </li> + + <li> + CPU information. + </li> + + <li> + The number of available disks. + </li> + </ul> + + <p> + There is information about each job Impala has run. Because each Impala job creates an additional set of + data about queries, the amount of job specific data may be very large. Logs may contained detailed + information on jobs. These detailed log entries may include: + </p> + + <ul> + <li> + The composition of the query. + </li> + + <li> + The degree of data locality. + </li> + + <li> + Statistics on data throughput and response times. + </li> + </ul> + + </conbody> + + </concept> + + <concept id="log_levels"> + + <title>Setting Logging Levels</title> + + <conbody> + + <p> + Impala uses the GLOG system, which supports three logging levels. You can adjust the logging levels using + the Cloudera Manager Admin Console. You can adjust logging levels without going through the Cloudera + Manager Admin Console by exporting variable settings. To change logging settings manually, use a command + similar to the following on each node before starting <codeph>impalad</codeph>: + </p> + +<codeblock>export GLOG_v=1</codeblock> + + <note> + For performance reasons, Cloudera highly recommends not enabling the most verbose logging level of 3. + </note> + + <p> + For more information on how to configure GLOG, including how to set variable logging levels for different + system components, see + <xref href="http://google-glog.googlecode.com/svn/trunk/doc/glog.html" scope="external" format="html">How + To Use Google Logging Library (glog)</xref>. + </p> + + <section id="loglevels_details"> + + <title>Understanding What is Logged at Different Logging Levels</title> + + <p> + As logging levels increase, the categories of information logged are cumulative. For example, GLOG_v=2 + records everything GLOG_v=1 records, as well as additional information. + </p> + + <p> + Increasing logging levels imposes performance overhead and increases log size. <ph rev="upstream">Cloudera</ph> recommends using + GLOG_v=1 for most cases: this level has minimal performance impact but still captures useful + troubleshooting information. + </p> + + <p> + Additional information logged at each level is as follows: + </p> + + <ul> + <li> + GLOG_v=1 - The default level. Logs information about each connection and query that is initiated to an + <codeph>impalad</codeph> instance, including runtime profiles. + </li> + + <li> + GLOG_v=2 - Everything from the previous level plus information for each RPC initiated. This level also + records query execution progress information, including details on each file that is read. + </li> + + <li> + GLOG_v=3 - Everything from the previous level plus logging of every row that is read. This level is + only applicable for the most serious troubleshooting and tuning scenarios, because it can produce + exceptionally large and detailed log files, potentially leading to its own set of performance and + capacity problems. + </li> + </ul> + + </section> + + </conbody> + + </concept> + + <concept id="redaction" rev="2.2.0"> + + <title>Redacting Sensitive Information from Impala Log Files</title> + <prolog> + <metadata> + <data name="Category" value="Redaction"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">redaction</indexterm> + <term>Log redaction</term> is a security feature that prevents sensitive information from being displayed in + locations used by administrators for monitoring and troubleshooting, such as log files, the Cloudera Manager + user interface, and the Impala debug web user interface. You configure regular expressions that match + sensitive types of information processed by your system, such as credit card numbers or tax IDs, and literals + matching these patterns are obfuscated wherever they would normally be recorded in log files or displayed in + administration or debugging user interfaces. + </p> + + <p> + In a security context, the log redaction feature is complementary to the Sentry authorization framework. + Sentry prevents unauthorized users from being able to directly access table data. Redaction prevents + administrators or support personnel from seeing the smaller amounts of sensitive or personally identifying + information (PII) that might appear in queries issued by those authorized users. + </p> + + <p> + See + <xref audience="integrated" href="sg_redaction.xml#log_redact"/><xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/sg_redaction.html" scope="external" format="html"/> + for details about how to enable this feature and set + up the regular expressions to detect and redact sensitive information within SQL statement text. + </p> + + </conbody> + + </concept> + +</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_map.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_map.xml b/docs/topics/impala_map.xml new file mode 100644 index 0000000..a20c356 --- /dev/null +++ b/docs/topics/impala_map.xml @@ -0,0 +1,267 @@ +<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> + <concept id="map"> + + <title>MAP Complex Type (<keyword keyref="impala23"/> or higher only)</title> + + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Data Types"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + A complex data type representing an arbitrary set of key-value pairs. + The key part is a scalar type, while the value part can be a scalar or + another complex type (<codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, + or <codeph>MAP</codeph>). + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock><varname>column_name</varname> MAP < <varname>primitive_type</varname>, <varname>type</varname> > + +type ::= <varname>primitive_type</varname> | <varname>complex_type</varname> +</codeblock> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_combo"/> + + <p> + The <codeph>MAP</codeph> complex data type represents a set of key-value pairs. + Each element of the map is indexed by a primitive type such as <codeph>BIGINT</codeph> or + <codeph>STRING</codeph>, letting you define sequences that are not continuous or categories with arbitrary names. + You might find it convenient for modelling data produced in other languages, such as a + Python dictionary or Java HashMap, where a single scalar value serves as the lookup key. + </p> + + <p> + In a big data context, the keys in a map column might represent a numeric sequence of events during a + manufacturing process, or <codeph>TIMESTAMP</codeph> values corresponding to sensor observations. + The map itself is inherently unordered, so you choose whether to make the key values significant + (such as a recorded <codeph>TIMESTAMP</codeph>) or synthetic (such as a random global universal ID). + </p> + + <note> + Behind the scenes, the <codeph>MAP</codeph> type is implemented in a similar way as the + <codeph>ARRAY</codeph> type. Impala does not enforce any uniqueness constraint on the + <codeph>KEY</codeph> values, and the <codeph>KEY</codeph> values are processed by + looping through the elements of the <codeph>MAP</codeph> rather than by a constant-time lookup. + Therefore, this type is primarily for ease of understanding when importing data and + algorithms from non-SQL contexts, rather than optimizing the performance of key lookups. + </note> + + <p conref="../shared/impala_common.xml#common/complex_types_describe"/> + + <p conref="../shared/impala_common.xml#common/added_in_230"/> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + + <ul conref="../shared/impala_common.xml#common/complex_types_restrictions"> + <li/> + </ul> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <note conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/> + + <p> + The following example shows a table with various kinds of <codeph>MAP</codeph> columns, + both at the top level and nested within other complex types. + Each row represents information about a specific country, with complex type fields + of various levels of nesting to represent different information associated + with the country: factual measurements such as area and population, + notable people in different categories, geographic features such as + cities, points of interest within each city, and mountains with associated facts. + Practice the <codeph>CREATE TABLE</codeph> and query notation for complex type columns + using empty tables, until you can visualize a complex data structure and construct corresponding SQL statements reliably. + </p> + +<codeblock><![CDATA[create TABLE map_demo +( + country_id BIGINT, + +-- Numeric facts about each country, looked up by name. +-- For example, 'Area':1000, 'Population':999999. +-- Using a MAP instead of a STRUCT because there could be +-- a different set of facts for each country. + metrics MAP <STRING, BIGINT>, + +-- MAP whose value part is an ARRAY. +-- For example, the key 'Famous Politicians' could represent an array of 10 elements, +-- while the key 'Famous Actors' could represent an array of 20 elements. + notables MAP <STRING, ARRAY <STRING>>, + +-- MAP that is a field within a STRUCT. +-- (The STRUCT is inside another ARRAY, because it is rare +-- for a STRUCT to be a top-level column.) +-- For example, city #1 might have points of interest with key 'Zoo', +-- representing an array of 3 different zoos. +-- City #2 might have completely different kinds of points of interest. +-- Because the set of field names is potentially large, and most entries could be blank, +-- a MAP makes more sense than a STRUCT to represent such a sparse data structure. + cities ARRAY < STRUCT < + name: STRING, + points_of_interest: MAP <STRING, ARRAY <STRING>> + >>, + +-- MAP that is an element within an ARRAY. The MAP is inside a STRUCT field to associate +-- the mountain name with all the facts about the mountain. +-- The "key" of the map (the first STRING field) represents the name of some fact whose value +-- can be expressed as an integer, such as 'Height', 'Year First Climbed', and so on. + mountains ARRAY < STRUCT < name: STRING, facts: MAP <STRING, INT > > > +) +STORED AS PARQUET; +]]> +</codeblock> + +<codeblock><![CDATA[DESCRIBE map_demo; ++------------+------------------------------------------------+ +| name | type | ++------------+------------------------------------------------+ +| country_id | bigint | +| metrics | map<string,bigint> | +| notables | map<string,array<string>> | +| cities | array<struct< | +| | name:string, | +| | points_of_interest:map<string,array<string>> | +| | >> | +| mountains | array<struct< | +| | name:string, | +| | facts:map<string,int> | +| | >> | ++------------+------------------------------------------------+ + +DESCRIBE map_demo.metrics; ++-------+--------+ +| name | type | ++-------+--------+ +| key | string | +| value | bigint | ++-------+--------+ + +DESCRIBE map_demo.notables; ++-------+---------------+ +| name | type | ++-------+---------------+ +| key | string | +| value | array<string> | ++-------+---------------+ + +DESCRIBE map_demo.notables.value; ++------+--------+ +| name | type | ++------+--------+ +| item | string | +| pos | bigint | ++------+--------+ + +DESCRIBE map_demo.cities; ++------+------------------------------------------------+ +| name | type | ++------+------------------------------------------------+ +| item | struct< | +| | name:string, | +| | points_of_interest:map<string,array<string>> | +| | > | +| pos | bigint | ++------+------------------------------------------------+ + +DESCRIBE map_demo.cities.item.points_of_interest; ++-------+---------------+ +| name | type | ++-------+---------------+ +| key | string | +| value | array<string> | ++-------+---------------+ + +DESCRIBE map_demo.cities.item.points_of_interest.value; ++------+--------+ +| name | type | ++------+--------+ +| item | string | +| pos | bigint | ++------+--------+ + +DESCRIBE map_demo.mountains; ++------+-------------------------+ +| name | type | ++------+-------------------------+ +| item | struct< | +| | name:string, | +| | facts:map<string,int> | +| | > | +| pos | bigint | ++------+-------------------------+ + +DESCRIBE map_demo.mountains.item.facts; ++-------+--------+ +| name | type | ++-------+--------+ +| key | string | +| value | int | ++-------+--------+ +]]> +</codeblock> + + <p> + The following example shows a table that uses a variety of data types for the <codeph>MAP</codeph> + <q>key</q> field. Typically, you use <codeph>BIGINT</codeph> or <codeph>STRING</codeph> to use + numeric or character-based keys without worrying about exceeding any size or length constraints. + </p> + +<codeblock><![CDATA[CREATE TABLE map_demo_obscure +( + id BIGINT, + m1 MAP <INT, INT>, + m2 MAP <SMALLINT, INT>, + m3 MAP <TINYINT, INT>, + m4 MAP <TIMESTAMP, INT>, + m5 MAP <BOOLEAN, INT>, + m6 MAP <CHAR(5), INT>, + m7 MAP <VARCHAR(25), INT>, + m8 MAP <FLOAT, INT>, + m9 MAP <DOUBLE, INT>, + m10 MAP <DECIMAL(12,2), INT> +) +STORED AS PARQUET; +]]> +</codeblock> + +<codeblock>CREATE TABLE celebrities (name STRING, birth_year MAP < STRING, SMALLINT >) STORED AS PARQUET; +-- A typical row might represent values with 2 different birth years, such as: +-- ("Joe Movie Star", { "real": 1972, "claimed": 1977 }) + +CREATE TABLE countries (name STRING, famous_leaders MAP < INT, STRING >) STORED AS PARQUET; +-- A typical row might represent values with different leaders, with key values corresponding to their numeric sequence, such as: +-- ("United States", { 1: "George Washington", 3: "Thomas Jefferson", 16: "Abraham Lincoln" }) + +CREATE TABLE airlines (name STRING, special_meals MAP < STRING, MAP < STRING, STRING > >) STORED AS PARQUET; +-- A typical row might represent values with multiple kinds of meals, each with several components: +-- ("Elegant Airlines", +-- { +-- "vegetarian": { "breakfast": "pancakes", "snack": "cookies", "dinner": "rice pilaf" }, +-- "gluten free": { "breakfast": "oatmeal", "snack": "fruit", "dinner": "chicken" } +-- } ) +</codeblock> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_complex_types.xml#complex_types"/>, + <xref href="impala_array.xml#array"/>, + <xref href="impala_struct.xml#struct"/> + <!-- <xref href="impala_map.xml#map"/> --> + </p> + + </conbody> + + </concept> + + http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_math_functions.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_math_functions.xml b/docs/topics/impala_math_functions.xml new file mode 100644 index 0000000..c82a29b --- /dev/null +++ b/docs/topics/impala_math_functions.xml @@ -0,0 +1,1389 @@ +<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="math_functions"> + + <title>Impala Mathematical Functions</title> + <titlealts audience="PDF"><navtitle>Mathematical Functions</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Functions"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Querying"/> + </metadata> + </prolog> + + <conbody> + + <p> + Mathematical functions, or arithmetic functions, perform numeric calculations that are typically more complex + than basic addition, subtraction, multiplication, and division. For example, these functions include + trigonometric, logarithmic, and base conversion operations. + </p> + + <note> + In Impala, exponentiation uses the <codeph>pow()</codeph> function rather than an exponentiation operator + such as <codeph>**</codeph>. + </note> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + The mathematical functions operate mainly on these data types: <xref href="impala_int.xml#int"/>, + <xref href="impala_bigint.xml#bigint"/>, <xref href="impala_smallint.xml#smallint"/>, + <xref href="impala_tinyint.xml#tinyint"/>, <xref href="impala_double.xml#double"/>, + <xref href="impala_float.xml#float"/>, and <xref href="impala_decimal.xml#decimal"/>. For the operators that + perform the standard operations such as addition, subtraction, multiplication, and division, see + <xref href="impala_operators.xml#arithmetic_operators"/>. + </p> + + <p> + Functions that perform bitwise operations are explained in <xref href="impala_bit_functions.xml#bit_functions"/>. + </p> + + <p> + <b>Function reference:</b> + </p> + + <p> + Impala supports the following mathematical functions: + </p> + + <dl> + <dlentry rev="1.4.0" id="abs"> + + <dt rev="1.4.0 2.0.1"> + <codeph>abs(numeric_type a)</codeph> +<!-- <codeph>abs(double a), abs(decimal(p,s) a)</codeph> --> + </dt> + + <dd rev="1.4.0"> + <indexterm audience="Cloudera">abs() function</indexterm> + <b>Purpose:</b> Returns the absolute value of the argument. + <p rev="2.0.1" conref="../shared/impala_common.xml#common/return_type_same"/> + <p> + <b>Usage notes:</b> Use this function to ensure all return values are positive. This is different than + the <codeph>positive()</codeph> function, which returns its argument unchanged (even if the argument + was negative). + </p> + </dd> + + </dlentry> + + <dlentry id="acos"> + + <dt> + <codeph>acos(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">acos() function</indexterm> + <b>Purpose:</b> Returns the arccosine of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="asin"> + + <dt> + <codeph>asin(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">asin() function</indexterm> + <b>Purpose:</b> Returns the arcsine of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="atan"> + + <dt> + <codeph>atan(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">atan() function</indexterm> + <b>Purpose:</b> Returns the arctangent of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="atan2" rev="2.3.0 IMPALA-1771"> + + <dt rev="2.3.0 IMPALA-1771"> + <codeph>atan2(double a, double b)</codeph> + </dt> + + <dd rev="2.3.0 IMPALA-1771"> + <indexterm audience="Cloudera">atan2() function</indexterm> + <b>Purpose:</b> Returns the arctangent of the two arguments, with the signs of the arguments used to determine the + quadrant of the result. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="bin"> + + <dt> + <codeph>bin(bigint a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">bin() function</indexterm> + <b>Purpose:</b> Returns the binary representation of an integer value, that is, a string of 0 and 1 + digits. + <p> + <b>Return type:</b> <codeph>string</codeph> + </p> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="ceil"> + + <dt rev="1.4.0"> + <codeph>ceil(double a)</codeph>, + <codeph>ceil(decimal(p,s) a)</codeph>, + <codeph id="ceiling">ceiling(double a)</codeph>, + <codeph>ceiling(decimal(p,s) a)</codeph>, + <codeph id="dceil" rev="2.3.0">dceil(double a)</codeph>, + <codeph rev="2.3.0">dceil(decimal(p,s) a)</codeph> + </dt> + + <dd rev="1.4.0"> + <indexterm audience="Cloudera">ceil() function</indexterm> + <b>Purpose:</b> Returns the smallest integer that is greater than or equal to the argument. + <p> + <b>Return type:</b> <codeph>int</codeph> or <codeph>decimal(p,s)</codeph> depending on the type of the + input argument + </p> + </dd> + + </dlentry> + + <dlentry id="conv"> + + <dt> + <codeph>conv(bigint num, int from_base, int to_base), conv(string num, int from_base, int + to_base)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">conv() function</indexterm> + <b>Purpose:</b> Returns a string representation of an integer value in a particular base. The input value + can be a string, for example to convert a hexadecimal number such as <codeph>fce2</codeph> to decimal. To + use the return value as a number (for example, when converting to base 10), use <codeph>CAST()</codeph> + to convert to the appropriate type. + <p> + <b>Return type:</b> <codeph>string</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="cos"> + + <dt> + <codeph>cos(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">cos() function</indexterm> + <b>Purpose:</b> Returns the cosine of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="cosh" rev="2.3.0 IMPALA-1771"> + + <dt rev="2.3.0 IMPALA-1771"> + <codeph>cosh(double a)</codeph> + </dt> + + <dd rev="2.3.0 IMPALA-1771"> + <indexterm audience="Cloudera">cosh() function</indexterm> + <b>Purpose:</b> Returns the hyperbolic cosine of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="cot" rev="2.3.0 IMPALA-1771"> + + <dt rev="2.3.0 IMPALA-1771"> + <codeph>cot(double a)</codeph> + </dt> + + <dd rev="2.3.0 IMPALA-1771"> + <indexterm audience="Cloudera">cot() function</indexterm> + <b>Purpose:</b> Returns the cotangent of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + <p conref="../shared/impala_common.xml#common/added_in_230"/> + </dd> + + </dlentry> + + <dlentry id="degrees"> + + <dt> + <codeph>degrees(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">degrees() function</indexterm> + <b>Purpose:</b> Converts argument value from radians to degrees. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="e"> + + <dt> + <codeph>e()</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">e() function</indexterm> + <b>Purpose:</b> Returns the + <xref href="https://en.wikipedia.org/wiki/E_(mathematical_constant" scope="external" format="html">mathematical + constant e</xref>. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="exp"> + + <dt> + <codeph>exp(double a)</codeph>, + <codeph rev="2.3.0" id="dexp">dexp(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">exp() function</indexterm> + <b>Purpose:</b> Returns the + <xref href="https://en.wikipedia.org/wiki/E_(mathematical_constant" scope="external" format="html">mathematical + constant e</xref> raised to the power of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry rev="2.3.0" id="factorial"> + + <dt rev="2.3.0"> + <codeph>factorial(integer_type a)</codeph> + </dt> + <dd rev="2.3.0"> + <indexterm audience="Cloudera">factorial() function</indexterm> + <b>Purpose:</b> Computes the <xref href="https://en.wikipedia.org/wiki/Factorial" scope="external" format="html">factorial</xref> of an integer value. + It works with any integer type. + <p conref="../shared/impala_common.xml#common/added_in_230"/> + <p> + <b>Usage notes:</b> You can use either the <codeph>factorial()</codeph> function or the <codeph>!</codeph> operator. + The factorial of 0 is 1. Likewise, the <codeph>factorial()</codeph> function returns 1 for any negative value. + The maximum positive value for the input argument is 20; a value of 21 or greater overflows the + range for a <codeph>BIGINT</codeph> and causes an error. + </p> + <p> + <b>Return type:</b> <codeph>bigint</codeph> + </p> + <p conref="../shared/impala_common.xml#common/added_in_230"/> +<codeblock>select factorial(5); ++--------------+ +| factorial(5) | ++--------------+ +| 120 | ++--------------+ + +select 5!; ++-----+ +| 5! | ++-----+ +| 120 | ++-----+ + +select factorial(0); ++--------------+ +| factorial(0) | ++--------------+ +| 1 | ++--------------+ + +select factorial(-100); ++-----------------+ +| factorial(-100) | ++-----------------+ +| 1 | ++-----------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry id="floor"> + + <dt> + <codeph>floor(double a)</codeph>, + <codeph>floor(decimal(p,s) a)</codeph>, + <codeph rev="2.3.0" id="dfloor">dfloor(double a)</codeph>, + <codeph rev="2.3.0">dfloor(decimal(p,s) a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">floor() function</indexterm> + <b>Purpose:</b> Returns the largest integer that is less than or equal to the argument. + <p> + <b>Return type:</b> <codeph>bigint</codeph> or <codeph>decimal(p,s)</codeph> depending on the type of + the input argument + </p> + </dd> + + </dlentry> + + <dlentry id="fmod"> + + <dt> + <codeph>fmod(double a, double b), fmod(float a, float b)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">fmod() function</indexterm> + <b>Purpose:</b> Returns the modulus of a floating-point number. Equivalent to the <codeph>%</codeph> arithmetic operator. + <p> + <b>Return type:</b> <codeph>float</codeph> or <codeph>double</codeph>, depending on type of arguments + </p> + <p conref="../shared/impala_common.xml#common/added_in_111"/> + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + Because this function operates on <codeph>DOUBLE</codeph> or <codeph>FLOAT</codeph> + values, it is subject to potential rounding errors for values that cannot be + represented precisely. Prefer to use whole numbers, or values that you know + can be represented precisely by the <codeph>DOUBLE</codeph> or <codeph>FLOAT</codeph> + types. + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + The following examples show equivalent operations with the <codeph>fmod()</codeph> + function and the <codeph>%</codeph> arithmetic operator, for values not subject + to any rounding error. + </p> +<codeblock>select fmod(10,3); ++-------------+ +| fmod(10, 3) | ++-------------+ +| 1 | ++-------------+ + +select fmod(5.5,2); ++--------------+ +| fmod(5.5, 2) | ++--------------+ +| 1.5 | ++--------------+ + +select 10 % 3; ++--------+ +| 10 % 3 | ++--------+ +| 1 | ++--------+ + +select 5.5 % 2; ++---------+ +| 5.5 % 2 | ++---------+ +| 1.5 | ++---------+ +</codeblock> + <p> + The following examples show operations with the <codeph>fmod()</codeph> + function for values that cannot be represented precisely by the + <codeph>DOUBLE</codeph> or <codeph>FLOAT</codeph> types, and thus are + subject to rounding error. <codeph>fmod(9.9,3.0)</codeph> returns a value + slightly different than the expected 0.9 because of rounding. + <codeph>fmod(9.9,3.3)</codeph> returns a value quite different from + the expected value of 0 because of rounding error during intermediate + calculations. + </p> +<codeblock>select fmod(9.9,3.0); ++--------------------+ +| fmod(9.9, 3.0) | ++--------------------+ +| 0.8999996185302734 | ++--------------------+ + +select fmod(9.9,3.3); ++-------------------+ +| fmod(9.9, 3.3) | ++-------------------+ +| 3.299999713897705 | ++-------------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry rev="1.2.2" id="fnv_hash"> + + <dt rev="1.2.2"> + <codeph>fnv_hash(type v)</codeph>, + </dt> + + <dd rev="1.2.2"> + <indexterm audience="Cloudera">fnv_hash() function</indexterm> + <b>Purpose:</b> Returns a consistent 64-bit value derived from the input argument, for convenience of + implementing hashing logic in an application. + <p> + <b>Return type:</b> <codeph>BIGINT</codeph> + </p> + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + You might use the return value in an application where you perform load balancing, bucketing, or some + other technique to divide processing or storage. + </p> + <p> + Because the result can be any 64-bit value, to restrict the value to a particular range, you can use an + expression that includes the <codeph>ABS()</codeph> function and the <codeph>%</codeph> (modulo) + operator. For example, to produce a hash value in the range 0-9, you could use the expression + <codeph>ABS(FNV_HASH(x)) % 10</codeph>. + </p> + <p> + This function implements the same algorithm that Impala uses internally for hashing, on systems where + the CRC32 instructions are not available. + </p> + <p> + This function implements the + <xref href="http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function" scope="external" format="html">FowlerâNollâVo + hash function</xref>, in particular the FNV-1a variation. This is not a perfect hash function: some + combinations of values could produce the same result value. It is not suitable for cryptographic use. + </p> + <p> + Similar input values of different types could produce different hash values, for example the same + numeric value represented as <codeph>SMALLINT</codeph> or <codeph>BIGINT</codeph>, + <codeph>FLOAT</codeph> or <codeph>DOUBLE</codeph>, or <codeph>DECIMAL(5,2)</codeph> or + <codeph>DECIMAL(20,5)</codeph>. + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> +<codeblock>[localhost:21000] > create table h (x int, s string); +[localhost:21000] > insert into h values (0, 'hello'), (1,'world'), (1234567890,'antidisestablishmentarianism'); +[localhost:21000] > select x, fnv_hash(x) from h; ++------------+----------------------+ +| x | fnv_hash(x) | ++------------+----------------------+ +| 0 | -2611523532599129963 | +| 1 | 4307505193096137732 | +| 1234567890 | 3614724209955230832 | ++------------+----------------------+ +[localhost:21000] > select s, fnv_hash(s) from h; ++------------------------------+---------------------+ +| s | fnv_hash(s) | ++------------------------------+---------------------+ +| hello | 6414202926103426347 | +| world | 6535280128821139475 | +| antidisestablishmentarianism | -209330013948433970 | ++------------------------------+---------------------+ +[localhost:21000] > select s, abs(fnv_hash(s)) % 10 from h; ++------------------------------+-------------------------+ +| s | abs(fnv_hash(s)) % 10.0 | ++------------------------------+-------------------------+ +| hello | 8 | +| world | 6 | +| antidisestablishmentarianism | 4 | ++------------------------------+-------------------------+</codeblock> + <p> + For short argument values, the high-order bits of the result have relatively low entropy: + </p> +<codeblock>[localhost:21000] > create table b (x boolean); +[localhost:21000] > insert into b values (true), (true), (false), (false); +[localhost:21000] > select x, fnv_hash(x) from b; ++-------+---------------------+ +| x | fnv_hash(x) | ++-------+---------------------+ +| true | 2062020650953872396 | +| true | 2062020650953872396 | +| false | 2062021750465500607 | +| false | 2062021750465500607 | ++-------+---------------------+</codeblock> + <p> + <b>Added in:</b> Impala 1.2.2 + </p> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="greatest"> + + <dt rev="1.4.0"> + <codeph>greatest(bigint a[, bigint b ...])</codeph>, <codeph>greatest(double a[, double b ...])</codeph>, + <codeph>greatest(decimal(p,s) a[, decimal(p,s) b ...])</codeph>, <codeph>greatest(string a[, string b + ...])</codeph>, <codeph>greatest(timestamp a[, timestamp b ...])</codeph> + </dt> + + <dd rev="1.4.0"> + <indexterm audience="Cloudera">greatest() function</indexterm> + <b>Purpose:</b> Returns the largest value from a list of expressions. + <p conref="../shared/impala_common.xml#common/return_same_type"/> + </dd> + + </dlentry> + + <dlentry id="hex"> + + <dt> + <codeph>hex(bigint a), hex(string a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">hex() function</indexterm> + <b>Purpose:</b> Returns the hexadecimal representation of an integer value, or of the characters in a + string. + <p> + <b>Return type:</b> <codeph>string</codeph> + </p> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="is_inf"> + + <dt rev="1.4.0"> + <codeph>is_inf(double a)</codeph>, + </dt> + + <dd rev="1.4.0"> + <indexterm audience="Cloudera">is_inf() function</indexterm> + <b>Purpose:</b> Tests whether a value is equal to the special value <q>inf</q>, signifying infinity. + <p> + <b>Return type:</b> <codeph>boolean</codeph> + </p> + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p conref="../shared/impala_common.xml#common/infinity_and_nan"/> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="is_nan"> + + <dt rev="1.4.0"> + <codeph>is_nan(double a)</codeph>, + </dt> + + <dd rev="1.4.0"> + <indexterm audience="Cloudera">is_nan() function</indexterm> + <b>Purpose:</b> Tests whether a value is equal to the special value <q>NaN</q>, signifying <q>not a + number</q>. + <p> + <b>Return type:</b> <codeph>boolean</codeph> + </p> + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p conref="../shared/impala_common.xml#common/infinity_and_nan"/> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="least"> + + <dt rev="1.4.0"> + <codeph>least(bigint a[, bigint b ...])</codeph>, <codeph>least(double a[, double b ...])</codeph>, + <codeph>least(decimal(p,s) a[, decimal(p,s) b ...])</codeph>, <codeph>least(string a[, string b + ...])</codeph>, <codeph>least(timestamp a[, timestamp b ...])</codeph> + </dt> + + <dd rev="1.4.0"> + <indexterm audience="Cloudera">least() function</indexterm> + <b>Purpose:</b> Returns the smallest value from a list of expressions. + <p conref="../shared/impala_common.xml#common/return_same_type"/> + </dd> + + </dlentry> + + <dlentry id="ln"> + + <dt> + <codeph>ln(double a)</codeph>, + <codeph rev="2.3.0" id="dlog1">dlog1(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">ln() function</indexterm> + <indexterm audience="Cloudera">dlog1() function</indexterm> + <b>Purpose:</b> Returns the + <xref href="https://en.wikipedia.org/wiki/Natural_logarithm" scope="external" format="html">natural + logarithm</xref> of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="log"> + + <dt> + <codeph>log(double base, double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">log() function</indexterm> + <b>Purpose:</b> Returns the logarithm of the second argument to the specified base. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="log10"> + + <dt> + <codeph>log10(double a)</codeph>, + <codeph rev="2.3.0" id="dlog10">dlog10(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">log10() function</indexterm> + <indexterm audience="Cloudera">dlog10() function</indexterm> + <b>Purpose:</b> Returns the logarithm of the argument to the base 10. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="log2"> + + <dt> + <codeph>log2(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">log2() function</indexterm> + <b>Purpose:</b> Returns the logarithm of the argument to the base 2. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="max_int"> + + <dt rev="1.4.0"> + <codeph>max_int(), <ph id="max_tinyint">max_tinyint()</ph>, <ph id="max_smallint">max_smallint()</ph>, + <ph id="max_bigint">max_bigint()</ph></codeph> + </dt> + + <dd rev="1.4.0"> + <indexterm audience="Cloudera">max_int() function</indexterm> + <indexterm audience="Cloudera">max_tinyint() function</indexterm> + <indexterm audience="Cloudera">max_smallint() function</indexterm> + <indexterm audience="Cloudera">max_bigint() function</indexterm> + <b>Purpose:</b> Returns the largest value of the associated integral type. + <p> + <b>Return type:</b> The same as the integral type being checked. + </p> + <p> +<!-- Repeated usage text between max_ and min_ functions, could turn into a conref. --> + <b>Usage notes:</b> Use the corresponding <codeph>min_</codeph> and <codeph>max_</codeph> functions to + check if all values in a column are within the allowed range, before copying data or altering column + definitions. If not, switch to the next higher integral type or to a <codeph>DECIMAL</codeph> with + sufficient precision. + </p> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="min_int"> + + <dt rev="1.4.0"> + <codeph>min_int(), <ph id="min_tinyint">min_tinyint()</ph>, <ph id="min_smallint">min_smallint()</ph>, + <ph id="min_bigint">min_bigint()</ph></codeph> + </dt> + + <dd rev="1.4.0"> + <indexterm audience="Cloudera">min_int() function</indexterm> + <indexterm audience="Cloudera">min_tinyint() function</indexterm> + <indexterm audience="Cloudera">min_smallint() function</indexterm> + <indexterm audience="Cloudera">min_bigint() function</indexterm> + <b>Purpose:</b> Returns the smallest value of the associated integral type (a negative number). + <p> + <b>Return type:</b> The same as the integral type being checked. + </p> + <p> + <b>Usage notes:</b> Use the corresponding <codeph>min_</codeph> and <codeph>max_</codeph> functions to + check if all values in a column are within the allowed range, before copying data or altering column + definitions. If not, switch to the next higher integral type or to a <codeph>DECIMAL</codeph> with + sufficient precision. + </p> + </dd> + + </dlentry> + + <dlentry id="mod" rev="2.2.0"> + + <dt rev="2.2.0"> + <codeph>mod(<varname>numeric_type</varname> a, <varname>same_type</varname> b)</codeph> + </dt> + + <dd rev="2.2.0"> + <indexterm audience="Cloudera">mod() function</indexterm> + <b>Purpose:</b> Returns the modulus of a number. Equivalent to the <codeph>%</codeph> arithmetic operator. + Works with any size integer type, any size floating-point type, and <codeph>DECIMAL</codeph> + with any precision and scale. + <p conref="../shared/impala_common.xml#common/return_type_same"/> + <p conref="../shared/impala_common.xml#common/added_in_220"/> + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + Because this function works with <codeph>DECIMAL</codeph> values, prefer it over <codeph>fmod()</codeph> + when working with fractional values. It is not subject to the rounding errors that make + <codeph>fmod()</codeph> problematic with floating-point numbers. + The <codeph>%</codeph> arithmetic operator now uses the <codeph>mod()</codeph> function + in cases where its arguments can be interpreted as <codeph>DECIMAL</codeph> values, + increasing the accuracy of that operator. + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + The following examples show how the <codeph>mod()</codeph> function works for + whole numbers and fractional values, and how the <codeph>%</codeph> operator + works the same way. In the case of <codeph>mod(9.9,3)</codeph>, + the type conversion for the second argument results in the first argument + being interpreted as <codeph>DOUBLE</codeph>, so to produce an accurate + <codeph>DECIMAL</codeph> result requires casting the second argument + or writing it as a <codeph>DECIMAL</codeph> literal, 3.0. + </p> +<codeblock>select mod(10,3); ++-------------+ +| fmod(10, 3) | ++-------------+ +| 1 | ++-------------+ + +select mod(5.5,2); ++--------------+ +| fmod(5.5, 2) | ++--------------+ +| 1.5 | ++--------------+ + +select 10 % 3; ++--------+ +| 10 % 3 | ++--------+ +| 1 | ++--------+ + +select 5.5 % 2; ++---------+ +| 5.5 % 2 | ++---------+ +| 1.5 | ++---------+ + +select mod(9.9,3.3); ++---------------+ +| mod(9.9, 3.3) | ++---------------+ +| 0.0 | ++---------------+ + +select mod(9.9,3); ++--------------------+ +| mod(9.9, 3) | ++--------------------+ +| 0.8999996185302734 | ++--------------------+ + +select mod(9.9, cast(3 as decimal(2,1))); ++-----------------------------------+ +| mod(9.9, cast(3 as decimal(2,1))) | ++-----------------------------------+ +| 0.9 | ++-----------------------------------+ + +select mod(9.9,3.0); ++---------------+ +| mod(9.9, 3.0) | ++---------------+ +| 0.9 | ++---------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="negative"> + + <dt rev="2.0.1"> + <codeph>negative(numeric_type a)</codeph> +<!-- <codeph>negative(int a), negative(double a), negative(decimal(p,s) a)</codeph> --> + </dt> + + <dd> + <indexterm audience="Cloudera">negative() function</indexterm> + <b>Purpose:</b> Returns the argument with the sign reversed; returns a positive value if the argument was + already negative. + <p rev="2.0.1" conref="../shared/impala_common.xml#common/return_type_same"/> +<!-- + <p> + <b>Return type:</b> <codeph>int</codeph>, <codeph>double</codeph>, + or <codeph>decimal(p,s)</codeph> depending on type of argument + </p> + --> + <p> + <b>Usage notes:</b> Use <codeph>-abs(a)</codeph> instead if you need to ensure all return values are + negative. + </p> + </dd> + + </dlentry> + + <dlentry id="pi"> + + <dt rev="1.4.0"> + <codeph>pi()</codeph> + </dt> + + <dd rev="1.4.0"> + <indexterm audience="Cloudera">pi() function</indexterm> + <b>Purpose:</b> Returns the constant pi. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="pmod"> + + <dt> + <codeph>pmod(bigint a, bigint b), pmod(double a, double b)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">pmod() function</indexterm> + <b>Purpose:</b> Returns the positive modulus of a number. + Primarily for <xref href="https://issues.apache.org/jira/browse/HIVE-656" scope="external" format="html">HiveQL compatibility</xref>. + <p> + <b>Return type:</b> <codeph>int</codeph> or <codeph>double</codeph>, depending on type of arguments + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + The following examples show how the <codeph>fmod()</codeph> function sometimes returns a negative value + depending on the sign of its arguments, and the <codeph>pmod()</codeph> function returns the same value + as <codeph>fmod()</codeph>, but sometimes with the sign flipped. + </p> +<codeblock>select fmod(-5,2); ++-------------+ +| fmod(-5, 2) | ++-------------+ +| -1 | ++-------------+ + +select pmod(-5,2); ++-------------+ +| pmod(-5, 2) | ++-------------+ +| 1 | ++-------------+ + +select fmod(-5,-2); ++--------------+ +| fmod(-5, -2) | ++--------------+ +| -1 | ++--------------+ + +select pmod(-5,-2); ++--------------+ +| pmod(-5, -2) | ++--------------+ +| -1 | ++--------------+ + +select fmod(5,-2); ++-------------+ +| fmod(5, -2) | ++-------------+ +| 1 | ++-------------+ + +select pmod(5,-2); ++-------------+ +| pmod(5, -2) | ++-------------+ +| -1 | ++-------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="positive"> + + <dt rev="2.0.1"> + <codeph>positive(numeric_type a)</codeph> +<!-- <codeph>positive(int a), positive(double a), positive(decimal(p,s) a</codeph> --> + </dt> + + <dd> + <indexterm audience="Cloudera">positive() function</indexterm> + <b>Purpose:</b> Returns the original argument unchanged (even if the argument is negative). + <p rev="2.0.1" conref="../shared/impala_common.xml#common/return_type_same"/> +<!-- + <p> + <b>Return type:</b> <codeph>int</codeph>, <codeph>double</codeph>, + or <codeph>decimal(p,s)</codeph> depending on type of argument + </p> + --> + <p> + <b>Usage notes:</b> Use <codeph>abs()</codeph> instead if you need to ensure all return values are + positive. + </p> + </dd> + + </dlentry> + + <dlentry id="pow"> + + <dt rev="1.4.0"> + <codeph>pow(double a, double p)</codeph>, + <codeph id="power">power(double a, double p)</codeph>, + <codeph rev="2.3.0" id="dpow">dpow(double a, double p)</codeph>, + <codeph rev="2.3.0" id="fpow">fpow(double a, double p)</codeph> + </dt> + + <dd rev="1.4.0"> + <indexterm audience="Cloudera">pow() function</indexterm> + <indexterm audience="Cloudera">power() function</indexterm> + <indexterm audience="Cloudera">dpow() function</indexterm> + <indexterm audience="Cloudera">fpow() function</indexterm> + <b>Purpose:</b> Returns the first argument raised to the power of the second argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="precision"> + + <dt rev="1.4.0"> + <codeph>precision(<varname>numeric_expression</varname>)</codeph> + </dt> + + <dd rev="1.4.0"> + <indexterm audience="Cloudera">precision() function</indexterm> + <b>Purpose:</b> Computes the precision (number of decimal digits) needed to represent the type of the + argument expression as a <codeph>DECIMAL</codeph> value. + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + Typically used in combination with the <codeph>scale()</codeph> function, to determine the appropriate + <codeph>DECIMAL(<varname>precision</varname>,<varname>scale</varname>)</codeph> type to declare in a + <codeph>CREATE TABLE</codeph> statement or <codeph>CAST()</codeph> function. + </p> + <p> + <b>Return type:</b> <codeph>int</codeph> + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p conref="../shared/impala_common.xml#common/precision_scale_example"/> + </dd> + + </dlentry> + + <dlentry id="quotient"> + + <dt> + <codeph>quotient(int numerator, int denominator)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">quotient() function</indexterm> + <b>Purpose:</b> Returns the first argument divided by the second argument, discarding any fractional + part. Avoids promoting arguments to <codeph>DOUBLE</codeph> as happens with the <codeph>/</codeph> SQL + operator. + <p> + <b>Return type:</b> <codeph>int</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="radians"> + + <dt> + <codeph>radians(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">radians() function</indexterm> + <b>Purpose:</b> Converts argument value from degrees to radians. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="rand"> + + <dt> + <codeph>rand()</codeph>, <codeph>rand(int seed)</codeph>, + <codeph rev="2.3.0" id="random">random()</codeph>, + <codeph rev="2.3.0">random(int seed)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">rand() function</indexterm> + <b>Purpose:</b> Returns a random value between 0 and 1. After <codeph>rand()</codeph> is called with a + seed argument, it produces a consistent random sequence based on the seed value. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + <p> + <b>Usage notes:</b> Currently, the random sequence is reset after each query, and multiple calls to + <codeph>rand()</codeph> within the same query return the same value each time. For different number + sequences that are different for each query, pass a unique seed value to each call to + <codeph>rand()</codeph>. For example, <codeph>select rand(unix_timestamp()) from ...</codeph> + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + The following examples show how <codeph>rand()</codeph> can produce sequences of varying predictability, + so that you can reproduce query results involving random values or generate unique sequences of random + values for each query. + When <codeph>rand()</codeph> is called with no argument, it generates the same sequence of values each time, + regardless of the ordering of the result set. + When <codeph>rand()</codeph> is called with a constant integer, it generates a different sequence of values, + but still always the same sequence for the same seed value. + If you pass in a seed value that changes, such as the return value of the expression <codeph>unix_timestamp(now())</codeph>, + each query will use a different sequence of random values, potentially more useful in probability calculations although + more difficult to reproduce at a later time. Therefore, the final two examples with an unpredictable seed value + also include the seed in the result set, to make it possible to reproduce the same random sequence later. + </p> +<codeblock>select x, rand() from three_rows; ++---+-----------------------+ +| x | rand() | ++---+-----------------------+ +| 1 | 0.0004714746030380365 | +| 2 | 0.5895895192351144 | +| 3 | 0.4431900859080209 | ++---+-----------------------+ + +select x, rand() from three_rows order by x desc; ++---+-----------------------+ +| x | rand() | ++---+-----------------------+ +| 3 | 0.0004714746030380365 | +| 2 | 0.5895895192351144 | +| 1 | 0.4431900859080209 | ++---+-----------------------+ + +select x, rand(1234) from three_rows order by x; ++---+----------------------+ +| x | rand(1234) | ++---+----------------------+ +| 1 | 0.7377511392057646 | +| 2 | 0.009428468537250751 | +| 3 | 0.208117277924026 | ++---+----------------------+ + +select x, rand(1234) from three_rows order by x desc; ++---+----------------------+ +| x | rand(1234) | ++---+----------------------+ +| 3 | 0.7377511392057646 | +| 2 | 0.009428468537250751 | +| 1 | 0.208117277924026 | ++---+----------------------+ + +select x, unix_timestamp(now()), rand(unix_timestamp(now())) + from three_rows order by x; ++---+-----------------------+-----------------------------+ +| x | unix_timestamp(now()) | rand(unix_timestamp(now())) | ++---+-----------------------+-----------------------------+ +| 1 | 1440777752 | 0.002051228658320023 | +| 2 | 1440777752 | 0.5098743483004506 | +| 3 | 1440777752 | 0.9517714925817081 | ++---+-----------------------+-----------------------------+ + +select x, unix_timestamp(now()), rand(unix_timestamp(now())) + from three_rows order by x desc; ++---+-----------------------+-----------------------------+ +| x | unix_timestamp(now()) | rand(unix_timestamp(now())) | ++---+-----------------------+-----------------------------+ +| 3 | 1440777761 | 0.9985985015512437 | +| 2 | 1440777761 | 0.3251255333074953 | +| 1 | 1440777761 | 0.02422675025846192 | ++---+-----------------------+-----------------------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry id="round"> + + <dt> + <codeph>round(double a)</codeph>, + <codeph>round(double a, int d)</codeph>, + <codeph rev="1.4.0">round(decimal a, int_type d)</codeph>, + <codeph rev="2.3.0" id="dround">dround(double a)</codeph>, + <codeph rev="2.3.0">dround(double a, int d)</codeph>, + <codeph rev="2.3.0">dround(decimal(p,s) a, int_type d)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">round() function</indexterm> + <indexterm audience="Cloudera">dround() function</indexterm> + <b>Purpose:</b> Rounds a floating-point value. By default (with a single argument), rounds to the nearest + integer. Values ending in .5 are rounded up for positive numbers, down for negative numbers (that is, + away from zero). The optional second argument specifies how many digits to leave after the decimal point; + values greater than zero produce a floating-point return value rounded to the requested number of digits + to the right of the decimal point. + <p rev="1.4.0"> + <b>Return type:</b> <codeph>bigint</codeph> for single <codeph>double</codeph> argument. + <codeph>double</codeph> for two-argument signature when second argument greater than zero. + For <codeph>DECIMAL</codeph> values, the smallest + <codeph>DECIMAL(<varname>p</varname>,<varname>s</varname>)</codeph> type with appropriate precision and + scale. + </p> + </dd> + + </dlentry> + + <dlentry rev="1.4.0" id="scale"> + + <dt rev="1.4.0"> + <codeph>scale(<varname>numeric_expression</varname>)</codeph> + </dt> + + <dd rev="1.4.0"> + <indexterm audience="Cloudera">scale() function</indexterm> + <b>Purpose:</b> Computes the scale (number of decimal digits to the right of the decimal point) needed to + represent the type of the argument expression as a <codeph>DECIMAL</codeph> value. + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + Typically used in combination with the <codeph>precision()</codeph> function, to determine the + appropriate <codeph>DECIMAL(<varname>precision</varname>,<varname>scale</varname>)</codeph> type to + declare in a <codeph>CREATE TABLE</codeph> statement or <codeph>CAST()</codeph> function. + </p> + <p> + <b>Return type:</b> <codeph>int</codeph> + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p conref="../shared/impala_common.xml#common/precision_scale_example"/> + </dd> + + </dlentry> + + <dlentry id="sign"> + + <dt> + <codeph>sign(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">sign() function</indexterm> + <b>Purpose:</b> Returns -1, 0, or 1 to indicate the signedness of the argument value. + <p> + <b>Return type:</b> <codeph>int</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="sin"> + + <dt> + <codeph>sin(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">sin() function</indexterm> + <b>Purpose:</b> Returns the sine of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="sinh" rev="2.3.0 IMPALA-1771"> + + <dt rev="2.3.0 IMPALA-1771"> + <codeph>sinh(double a)</codeph> + </dt> + + <dd rev="2.3.0 IMPALA-1771"> + <indexterm audience="Cloudera">sinh() function</indexterm> + <b>Purpose:</b> Returns the hyperbolic sine of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="sqrt"> + + <dt> + <codeph>sqrt(double a)</codeph>, + <codeph rev="2.3.0" id="dsqrt">dsqrt(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">sqrt() function</indexterm> + <indexterm audience="Cloudera">dsqrt() function</indexterm> + <b>Purpose:</b> Returns the square root of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="tan"> + + <dt> + <codeph>tan(double a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">tan() function</indexterm> + <b>Purpose:</b> Returns the tangent of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="tanh" rev="2.3.0 IMPALA-1771"> + + <dt rev="2.3.0 IMPALA-1771"> + <codeph>tanh(double a)</codeph> + </dt> + + <dd rev="2.3.0 IMPALA-1771"> + <indexterm audience="Cloudera">tanh() function</indexterm> + <b>Purpose:</b> Returns the hyperbolic tangent of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry rev="2.3.0" id="truncate"> + + <dt rev="2.3.0"> + <codeph>truncate(double_or_decimal a[, digits_to_leave])</codeph>, + <ph id="dtrunc"><codeph>dtrunc(double_or_decimal a[, digits_to_leave])</codeph></ph> + </dt> + + <dd rev="2.3.0"> + <indexterm audience="Cloudera">truncate() function</indexterm> + <indexterm audience="Cloudera">dtrunc() function</indexterm> + <b>Purpose:</b> Removes some or all fractional digits from a numeric value. + With no argument, removes all fractional digits, leaving an integer value. + The optional argument specifies the number of fractional digits to include + in the return value, and only applies with the argument type is <codeph>DECIMAL</codeph>. + <codeph>truncate()</codeph> and <codeph>dtrunc()</codeph> are aliases for the same function. + <p> + <b>Return type:</b> <codeph>decimal</codeph> for <codeph>DECIMAL</codeph> arguments; + <codeph>bigint</codeph> for <codeph>DOUBLE</codeph> arguments + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> +<codeblock>select truncate(3.45) ++----------------+ +| truncate(3.45) | ++----------------+ +| 3 | ++----------------+ + +select truncate(-3.45) ++-----------------+ +| truncate(-3.45) | ++-----------------+ +| -3 | ++-----------------+ + +select truncate(3.456,1) ++--------------------+ +| truncate(3.456, 1) | ++--------------------+ +| 3.4 | ++--------------------+ + +select dtrunc(3.456,1) ++------------------+ +| dtrunc(3.456, 1) | ++------------------+ +| 3.4 | ++------------------+ + +select truncate(3.456,2) ++--------------------+ +| truncate(3.456, 2) | ++--------------------+ +| 3.45 | ++--------------------+ + +select truncate(3.456,7) ++--------------------+ +| truncate(3.456, 7) | ++--------------------+ +| 3.4560000 | ++--------------------+ +</codeblock> + </dd> + + </dlentry> + + <dlentry id="unhex"> + + <dt> + <codeph>unhex(string a)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">unhex() function</indexterm> + <b>Purpose:</b> Returns a string of characters with ASCII values corresponding to pairs of hexadecimal + digits in the argument. + <p> + <b>Return type:</b> <codeph>string</codeph> + </p> + </dd> + + </dlentry> + </dl> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_max.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_max.xml b/docs/topics/impala_max.xml new file mode 100644 index 0000000..3f7b827 --- /dev/null +++ b/docs/topics/impala_max.xml @@ -0,0 +1,198 @@ +<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="max"> + + <title>MAX Function</title> + <titlealts audience="PDF"><navtitle>MAX</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Impala Functions"/> + <data name="Category" value="Analytic Functions"/> + <data name="Category" value="Aggregate Functions"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">max() function</indexterm> + An aggregate function that returns the maximum value from a set of numbers. Opposite of the + <codeph>MIN</codeph> function. Its single argument can be numeric column, or the numeric result of a function + or expression applied to the column value. Rows with a <codeph>NULL</codeph> value for the specified column + are ignored. If the table is empty, or all the values supplied to <codeph>MAX</codeph> are + <codeph>NULL</codeph>, <codeph>MAX</codeph> returns <codeph>NULL</codeph>. + </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>MAX([DISTINCT | ALL] <varname>expression</varname>) [OVER (<varname>analytic_clause</varname>)]</codeblock> + + <p> + When the query contains a <codeph>GROUP BY</codeph> clause, returns one value for each combination of + grouping values. + </p> + + <p conref="../shared/impala_common.xml#common/restrictions_sliding_window"/> + + <p conref="../shared/impala_common.xml#common/return_type_same_except_string"/> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p conref="../shared/impala_common.xml#common/partition_key_optimization"/> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/complex_types_aggregation_explanation"/> + + <p conref="../shared/impala_common.xml#common/complex_types_aggregation_example"/> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + +<codeblock>-- Find the largest value for this column in the table. +select max(c1) from t1; +-- Find the largest value for this column from a subset of the table. +select max(c1) from t1 where month = 'January' and year = '2013'; +-- Find the largest value from a set of numeric function results. +select max(length(s)) from t1; +-- Can also be used in combination with DISTINCT and/or GROUP BY. +-- Return more than one result. +select month, year, max(purchase_price) from store_stats group by month, year; +-- Filter the input to eliminate duplicates before performing the calculation. +select max(distinct x) from t1; +</codeblock> + + <p rev="2.0.0"> + The following examples show how to use <codeph>MAX()</codeph> in an analytic context. They use a table + containing integers from 1 to 10. Notice how the <codeph>MAX()</codeph> is reported for each input value, as + opposed to the <codeph>GROUP BY</codeph> clause which condenses the result set. +<codeblock>select x, property, max(x) over (partition by property) as max from int_t where property in ('odd','even'); ++----+----------+-----+ +| x | property | max | ++----+----------+-----+ +| 2 | even | 10 | +| 4 | even | 10 | +| 6 | even | 10 | +| 8 | even | 10 | +| 10 | even | 10 | +| 1 | odd | 9 | +| 3 | odd | 9 | +| 5 | odd | 9 | +| 7 | odd | 9 | +| 9 | odd | 9 | ++----+----------+-----+ +</codeblock> + +Adding an <codeph>ORDER BY</codeph> clause lets you experiment with results that are cumulative or apply to a moving +set of rows (the <q>window</q>). The following examples use <codeph>MAX()</codeph> in an analytic context +(that is, with an <codeph>OVER()</codeph> clause) to display the smallest value of <codeph>X</codeph> +encountered up to each row in the result set. The examples use two columns in the <codeph>ORDER BY</codeph> +clause to produce a sequence of values that rises and falls, to illustrate how the <codeph>MAX()</codeph> +result only increases or stays the same throughout each partition within the result set. +The basic <codeph>ORDER BY x</codeph> clause implicitly +activates a window clause of <codeph>RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</codeph>, +which is effectively the same as <codeph>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</codeph>, +therefore all of these examples produce the same results: + +<codeblock>select x, property, + max(x) <b>over (order by property, x desc)</b> as 'maximum to this point' +from int_t where property in ('prime','square'); ++---+----------+-----------------------+ +| x | property | maximum to this point | ++---+----------+-----------------------+ +| 7 | prime | 7 | +| 5 | prime | 7 | +| 3 | prime | 7 | +| 2 | prime | 7 | +| 9 | square | 9 | +| 4 | square | 9 | +| 1 | square | 9 | ++---+----------+-----------------------+ + +select x, property, + max(x) over + ( + <b>order by property, x desc</b> + <b>rows between unbounded preceding and current row</b> + ) as 'maximum to this point' +from int_t where property in ('prime','square'); ++---+----------+-----------------------+ +| x | property | maximum to this point | ++---+----------+-----------------------+ +| 7 | prime | 7 | +| 5 | prime | 7 | +| 3 | prime | 7 | +| 2 | prime | 7 | +| 9 | square | 9 | +| 4 | square | 9 | +| 1 | square | 9 | ++---+----------+-----------------------+ + +select x, property, + max(x) over + ( + <b>order by property, x desc</b> + <b>range between unbounded preceding and current row</b> + ) as 'maximum to this point' +from int_t where property in ('prime','square'); ++---+----------+-----------------------+ +| x | property | maximum to this point | ++---+----------+-----------------------+ +| 7 | prime | 7 | +| 5 | prime | 7 | +| 3 | prime | 7 | +| 2 | prime | 7 | +| 9 | square | 9 | +| 4 | square | 9 | +| 1 | square | 9 | ++---+----------+-----------------------+ +</codeblock> + +The following examples show how to construct a moving window, with a running maximum taking into account all rows before +and 1 row after the current row. +Because of a restriction in the Impala <codeph>RANGE</codeph> syntax, this type of +moving window is possible with the <codeph>ROWS BETWEEN</codeph> clause but not the <codeph>RANGE BETWEEN</codeph> clause. +Because of an extra Impala restriction on the <codeph>MAX()</codeph> and <codeph>MIN()</codeph> functions in an +analytic context, the lower bound must be <codeph>UNBOUNDED PRECEDING</codeph>. +<codeblock>select x, property, + max(x) over + ( + <b>order by property, x</b> + <b>rows between unbounded preceding and 1 following</b> + ) as 'local maximum' +from int_t where property in ('prime','square'); ++---+----------+---------------+ +| x | property | local maximum | ++---+----------+---------------+ +| 2 | prime | 3 | +| 3 | prime | 5 | +| 5 | prime | 7 | +| 7 | prime | 7 | +| 1 | square | 7 | +| 4 | square | 9 | +| 9 | square | 9 | ++---+----------+---------------+ + +-- Doesn't work because of syntax restriction on RANGE clause. +select x, property, + max(x) over + ( + <b>order by property, x</b> + <b>range between unbounded preceding and 1 following</b> + ) as 'local maximum' +from int_t where property in ('prime','square'); +ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW. +</codeblock> + </p> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_analytic_functions.xml#analytic_functions"/>, <xref href="impala_min.xml#min"/>, + <xref href="impala_avg.xml#avg"/> + </p> + </conbody> +</concept>
