http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_jdbc.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_jdbc.xml b/docs/topics/impala_jdbc.xml index 8a7a955..ef5e9db 100644 --- a/docs/topics/impala_jdbc.xml +++ b/docs/topics/impala_jdbc.xml @@ -2,7 +2,18 @@ <concept id="impala_jdbc"> <title id="jdbc">Configuring Impala to Work with JDBC</title> - + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="JDBC"/> + <data name="Category" value="Java"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Configuring"/> + <data name="Category" value="Starting and Stopping"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> <conbody> @@ -14,8 +25,366 @@ with various database products. </p> - + <p> + Setting up a JDBC connection to Impala involves the following steps: + </p> + + <ul> + <li> + Verifying the communication port where the Impala daemons in your cluster are listening for incoming JDBC + requests. + </li> + + <li> + Installing the JDBC driver on every system that runs the JDBC-enabled application. + </li> + + <li> + Specifying a connection string for the JDBC application to access one of the servers running the + <cmdname>impalad</cmdname> daemon, with the appropriate security settings. + </li> + </ul> + + <p outputclass="toc inpage"/> + </conbody> + + <concept id="jdbc_port"> + + <title>Configuring the JDBC Port</title> + + <conbody> + + <p> + The default port used by JDBC 2.0 and later (as well as ODBC 2.x) is 21050. Impala server accepts JDBC + connections through this same port 21050 by default. Make sure this port is available for communication + with other hosts on your network, for example, that it is not blocked by firewall software. If your JDBC + client software connects to a different port, specify that alternative port number with the + <codeph>--hs2_port</codeph> option when starting <codeph>impalad</codeph>. See + <xref href="impala_processes.xml#processes"/> for details about Impala startup options. See + <xref href="impala_ports.xml#ports"/> for information about all ports used for communication between Impala + and clients or between Impala components. + </p> + </conbody> + </concept> + + <concept id="jdbc_driver_choice"> + + <title>Choosing the JDBC Driver</title> + <prolog> + <metadata> + <data name="Category" value="Planning"/> + </metadata> + </prolog> + + <conbody> + + <p> + In Impala 2.0 and later, you have the choice between the Cloudera JDBC Connector and the Hive 0.13 JDBC driver. + Cloudera recommends using the Cloudera JDBC Connector where practical. + </p> + + <p> + If you are already using JDBC applications with an earlier Impala release, you must update your JDBC driver + to one of these choices, because the Hive 0.12 driver that was formerly the only choice is not compatible + with Impala 2.0 and later. + </p> + + <p> + Both the Cloudera JDBC 2.5 Connector and the Hive JDBC driver provide a substantial speed increase for JDBC + applications with Impala 2.0 and higher, for queries that return large result sets. 
+ </p> + + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + + <p conref="../shared/impala_common.xml#common/jdbc_odbc_complex_types"/> + <p conref="../shared/impala_common.xml#common/jdbc_odbc_complex_types_views"/> + </conbody> </concept> + <concept id="jdbc_setup"> + + <title>Enabling Impala JDBC Support on Client Systems</title> + <prolog> + <metadata> + <data name="Category" value="Installing"/> + </metadata> + </prolog> + + <conbody> + + <section id="install_jdbc_connector"> + <title>Using the Cloudera JDBC Connector (recommended)</title> + + <p> + You download and install the Cloudera JDBC 2.5 connector on any Linux, Windows, or Mac system where you + intend to run JDBC-enabled applications. From the + <xref href="http://go.cloudera.com/odbc-driver-hive-impala.html" scope="external" format="html">Cloudera + Connectors download page</xref>, you choose the appropriate protocol (JDBC or ODBC) and target product + (Impala or Hive). The ease of downloading and installing on non-CDH systems makes this connector a + convenient choice for organizations with heterogeneous environments. + </p> + + </section> + + <section id="install_hive_driver"> + <title>Using the Hive JDBC Driver</title> + <p> + You install the Hive JDBC driver (<codeph>hive-jdbc</codeph> package) through the Linux package manager, on + hosts within the CDH cluster. The driver consists of several Java JAR files. The same driver can be used by Impala and Hive. + </p> + + <p> + To get the JAR files, install the Hive JDBC driver on each CDH-enabled host in the cluster that will run + JDBC applications. Follow the instructions for + <xref href="http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_hive_jdbc_install.html" scope="external" format="html">CDH + 5</xref> or + <xref href="http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_Installing_hive_JDBC.html" scope="external" format="html">CDH + 4</xref>. + </p> + + <note> + The latest JDBC driver, corresponding to Hive 0.13, provides substantial performance improvements for + Impala queries that return large result sets. Impala 2.0 and later are compatible with the Hive 0.13 + driver. If you already have an older JDBC driver installed, and are running Impala 2.0 or higher, consider + upgrading to the latest Hive JDBC driver for best performance with JDBC applications. + </note> + + <p> + If you are using JDBC-enabled applications on hosts outside the CDH cluster, you cannot use the CDH install + procedure on the non-CDH hosts. Install the JDBC driver on at least one CDH host using the preceding + procedure. Then download the JAR files to each client machine that will use JDBC with Impala: + </p> + + <codeblock>commons-logging-X.X.X.jar + hadoop-common.jar + hive-common-X.XX.X-cdhX.X.X.jar + hive-jdbc-X.XX.X-cdhX.X.X.jar + hive-metastore-X.XX.X-cdhX.X.X.jar + hive-service-X.XX.X-cdhX.X.X.jar + httpclient-X.X.X.jar + httpcore-X.X.X.jar + libfb303-X.X.X.jar + libthrift-X.X.X.jar + log4j-X.X.XX.jar + slf4j-api-X.X.X.jar + slf4j-logXjXX-X.X.X.jar + </codeblock> + + <p> + <b>To enable JDBC support for Impala on the system where you run the JDBC application:</b> + </p> + + <ol> + <li> + Download the JAR files listed above to each client machine. + <!-- + Download the + <xref href="https://downloads.cloudera.com/impala-jdbc/impala-jdbc-0.5-2.zip" scope="external" format="zip">Impala + JDBC zip file</xref> to the client machine that you will use to connect to Impala servers. 
+ --> + <note> + For Maven users, see + <xref href="https://github.com/onefoursix/Cloudera-Impala-JDBC-Example" scope="external" format="html">this + sample github page</xref> for an example of the dependencies you could add to a <codeph>pom</codeph> + file instead of downloading the individual JARs. + </note> + </li> + + <li> + Store the JAR files in a location of your choosing, ideally a directory already referenced in your + <codeph>CLASSPATH</codeph> setting. For example: + <ul> + <li> + On Linux, you might use a location such as + <codeph>/</codeph><codeph>opt</codeph><codeph>/jars/</codeph>. + </li> + + <li> + On Windows, you might use a subdirectory underneath <filepath>C:\Program Files</filepath>. + </li> + </ul> + </li> + + <li> + To successfully load the Impala JDBC driver, client programs must be able to locate the associated JAR + files. This often means setting the <codeph>CLASSPATH</codeph> for the client process to include the + JARs. Consult the documentation for your JDBC client for more details on how to install new JDBC drivers, + but some examples of how to set <codeph>CLASSPATH</codeph> variables include: + <ul> + <li> + On Linux, if you extracted the JARs to <codeph>/opt/jars/</codeph>, you might issue the following + command to prepend the JAR files path to an existing classpath: + <codeblock>export CLASSPATH=/opt/jars/*.jar:$CLASSPATH</codeblock> + </li> + + <li> + On Windows, use the <b>System Properties</b> control panel item to modify the <b>Environment + Variables</b> for your system. Modify the environment variables to include the path to which you + extracted the files. + <note> + If the existing <codeph>CLASSPATH</codeph> on your client machine refers to some older version of + the Hive JARs, ensure that the new JARs are the first ones listed. Either put the new JAR files + earlier in the listings, or delete the other references to Hive JAR files. + </note> + </li> + </ul> + </li> + </ol> + </section> + + </conbody> + </concept> + + <concept id="jdbc_connect"> + + <title>Establishing JDBC Connections</title> + + <conbody> + + <p> + The JDBC driver class depends on which driver you select. + </p> + + <note conref="../shared/impala_common.xml#common/proxy_jdbc_caveat"/> + + <section id="class_jdbc_connector"> + + <title>Using the Cloudera JDBC Connector (recommended)</title> + + <p> + Depending on the level of the JDBC API your application is targeting, you can use + the following fully-qualified class names (FQCNs): + </p> + + <ul> + <li><codeph>com.cloudera.impala.jdbc41.Driver</codeph></li> + <li><codeph>com.cloudera.impala.jdbc41.DataSource</codeph></li> + </ul> + + <ul> + <li><codeph>com.cloudera.impala.jdbc4.Driver</codeph></li> + <li><codeph>com.cloudera.impala.jdbc4.DataSource</codeph></li> + </ul> + + <ul> + <li><codeph>com.cloudera.impala.jdbc3.Driver</codeph></li> + <li><codeph>com.cloudera.impala.jdbc3.DataSource</codeph></li> + </ul> + + <p> + The connection string has the following format: + </p> + +<codeblock>jdbc:impala://<varname>Host</varname>:<varname>Port</varname>[/<varname>Schema</varname>];<varname>Property1</varname>=<varname>Value</varname>;<varname>Property2</varname>=<varname>Value</varname>;...</codeblock> + + <p> + The <codeph>port</codeph> value is typically 21050 for Impala. 
+ </p> + + <p> + For full details about the classes and the connection string (especially the property values available + for the connection string), download the appropriate driver documentation for your platform from + <xref href="http://www.cloudera.com/content/cloudera/en/downloads/connectors/impala/jdbc/impala-jdbc-v2-5-5.html" scope="external" format="html">the Impala JDBC Connector download page</xref>. + </p> + + </section> + + <section id="class_hive_driver"> + <title>Using the Hive JDBC Driver</title> + + <p> + For example, with the Hive JDBC driver, the class name is <codeph>org.apache.hive.jdbc.HiveDriver</codeph>. + Once you have configured Impala to work with JDBC, you can establish connections between Impala and your JDBC application. + To do so for a cluster that does not use + Kerberos authentication, use a connection string of the form + <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;auth=noSasl</codeph>. +<!-- + Include the <codeph>auth=noSasl</codeph> argument + only when connecting to a non-Kerberos cluster; if Kerberos is enabled, omit the <codeph>auth</codeph> argument. +--> + For example, you might use: + </p> + +<codeblock>jdbc:hive2://myhost.example.com:21050/;auth=noSasl</codeblock> + + <p> + To connect to an instance of Impala that requires Kerberos authentication, use a connection string of the + form + <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;principal=<varname>principal_name</varname></codeph>. + The principal must be the same user principal you used when starting Impala. For example, you might use: + </p> + +<codeblock>jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@EXAMPLE.COM</codeblock> + + <p> + To connect to an instance of Impala that requires LDAP authentication, use a connection string of the form + <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/<varname>db_name</varname>;user=<varname>ldap_userid</varname>;password=<varname>ldap_password</varname></codeph>. + For example, you might use: + </p> + +<codeblock>jdbc:hive2://myhost.example.com:21050/test_db;user=fred;password=xyz123</codeblock> + + <note> + <p conref="../shared/impala_common.xml#common/hive_jdbc_ssl_kerberos_caveat"/> + </note> + + </section> + + </conbody> + </concept> + + <concept rev="2.3.0" id="jdbc_odbc_notes"> + <title>Notes about JDBC and ODBC Interaction with Impala SQL Features</title> + <conbody> + <p> + Most Impala SQL features work equivalently through the <cmdname>impala-shell</cmdname> interpreter + or the JDBC or ODBC APIs. The following are some exceptions to keep in mind when switching between + the interactive shell and applications using the APIs: + </p> + <ul> + <li> + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> + <ul> + <li> + <p> + Queries involving the complex types (<codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>) + require notation that might not be available in all levels of JDBC and ODBC drivers. + If you have trouble querying such a table due to the driver level or + inability to edit the queries used by the application, you can create a view that exposes + a <q>flattened</q> version of the complex columns and point the application at the view. + See <xref href="impala_complex_types.xml#complex_types"/> for details. + </p> + </li> + <li> + <p> + The complex types available in CDH 5.5 / Impala 2.3 and higher are supported by the + JDBC <codeph>getColumns()</codeph> API.
+ Both <codeph>MAP</codeph> and <codeph>ARRAY</codeph> are reported as the JDBC SQL Type <codeph>ARRAY</codeph>, + because this is the closest matching Java SQL type. This behavior is consistent with Hive. + <codeph>STRUCT</codeph> types are reported as the JDBC SQL Type <codeph>STRUCT</codeph>. + </p> + <p> + To be consistent with Hive's behavior, the TYPE_NAME field is populated + with the primitive type name for scalar types, and with the full <codeph>toSql()</codeph> output + for complex types. The resulting type names are somewhat inconsistent, + because nested types are printed differently than top-level types. For example, + the following list shows how the <codeph>toSql()</codeph> output for Impala types is + translated to <codeph>TYPE_NAME</codeph> values: +<codeblock><![CDATA[DECIMAL(10,10) becomes DECIMAL +CHAR(10) becomes CHAR +VARCHAR(10) becomes VARCHAR +ARRAY<DECIMAL(10,10)> becomes ARRAY<DECIMAL(10,10)> +ARRAY<CHAR(10)> becomes ARRAY<CHAR(10)> +ARRAY<VARCHAR(10)> becomes ARRAY<VARCHAR(10)> +]]> +</codeblock> + </p> + </li> + </ul> + </li> + </ul> + </conbody> + </concept> +</concept>
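To make the connection workflow in the impala_jdbc.xml change above concrete, here is a minimal Java sketch using the Hive JDBC driver class and the non-Kerberos connection string documented in that file. It assumes the driver JARs are already on the CLASSPATH; the hostname is the placeholder used throughout the section, and the query is an arbitrary example.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaJdbcExample {
    public static void main(String[] args) throws Exception {
        // Load the Hive JDBC driver class named in the section above.
        // This fails if the driver JARs are not on the CLASSPATH.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Non-Kerberos connection string format from the section above;
        // for Kerberos clusters, omit auth=noSasl and add principal=... instead.
        String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT version()")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}

The same pattern works with the Cloudera JDBC Connector by loading com.cloudera.impala.jdbc4.Driver (or the jdbc3/jdbc41 variant) and using a jdbc:impala:// URL instead.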
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_joins.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_joins.xml b/docs/topics/impala_joins.xml index 011a488..0e807e8 100644 --- a/docs/topics/impala_joins.xml +++ b/docs/topics/impala_joins.xml @@ -3,7 +3,7 @@ <concept id="joins"> <title>Joins in Impala SELECT Statements</title> - <titlealts><navtitle>Joins</navtitle></titlealts> + <titlealts audience="PDF"><navtitle>Joins</navtitle></titlealts> <prolog> <metadata> <data name="Category" value="Impala"/> @@ -473,6 +473,20 @@ Returned 1 row(s) in 1.00s</codeblock> <xref href="impala_hints.xml#hints"/>. </p> + <p rev="2.5.0"> + <b>Handling NULLs in Join Columns:</b> + </p> + + <p rev="2.5.0"> + By default, join key columns do not match if either one contains a <codeph>NULL</codeph> value. + To treat such columns as equal if both contain <codeph>NULL</codeph>, you can use an expression + such as <codeph>A = B OR (A IS NULL AND B IS NULL)</codeph>. + In CDH 5.7 / Impala 2.5 and higher, the <codeph><=></codeph> operator (shorthand for + <codeph>IS NOT DISTINCT FROM</codeph>) performs the same comparison in a concise and efficient form. + The <codeph><=></codeph> operator is more efficient for comparing join keys in a <codeph>NULL</codeph>-safe + manner, because the operator can use a hash join while the <codeph>OR</codeph> expression cannot. + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> <p> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_kudu.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_kudu.xml b/docs/topics/impala_kudu.xml index 5b8e87c..c530cc1 100644 --- a/docs/topics/impala_kudu.xml +++ b/docs/topics/impala_kudu.xml @@ -4,7 +4,16 @@ <title>Using Impala to Query Kudu Tables</title> - + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Kudu"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> + <conbody> <p> @@ -16,9 +25,143 @@ workloads (with key-based lookups for single rows or small ranges of values). </p> - + <p> + Certain Impala SQL statements, such as <codeph>UPDATE</codeph> and <codeph>DELETE</codeph>, only work with + Kudu tables. These operations were impractical from a performance perspective to perform at large scale on + HDFS data or on HBase tables. + </p> + + </conbody> + + <concept id="kudu_benefits"> + + <title>Benefits of Using Kudu Tables with Impala</title> + + <conbody> + + <p> + The combination of Kudu and Impala works best for tables where scan performance is important, but data + arrives continuously, in small batches, or needs to be updated without being completely replaced. In these + scenarios (such as for streaming data), it might be impractical to use Parquet tables because Parquet works + best with multi-megabyte data files, requiring substantial overhead to replace or reorganize data files to + accommodate frequent additions or changes to data. Impala can query Kudu tables with scan performance close + to that of Parquet, and Impala can also perform update or delete operations without replacing the entire + table contents. You can also use the Kudu API to do ingestion or transformation operations outside of + Impala, and Impala can query the current data at any time.
+ </p> + </conbody> </concept> + <concept id="kudu_primary_key"> + + <title>Primary Key Columns for Kudu Tables</title> + + <conbody> + + <p> + Kudu tables introduce the notion of primary keys to Impala for the first time. The primary key is made up + of one or more columns, whose values are combined and used as a lookup key during queries. These columns + cannot contain any <codeph>NULL</codeph> values or any duplicate values, and can never be updated. For a + partitioned Kudu table, all the partition key columns must come from the set of primary key columns. + </p> + + <p> + Impala itself still does not have the notion of unique or non-<codeph>NULL</codeph> constraints. These + restrictions on the primary key columns are enforced on the Kudu side. + </p> + + <p> + The primary key columns must be the first ones specified in the <codeph>CREATE TABLE</codeph> statement. + You specify which column or columns make up the primary key in the table properties, rather than through + attributes in the column list. + </p> + + <p> + Kudu can do extra optimizations for queries that refer to the primary key columns in the + <codeph>WHERE</codeph> clause. It is not crucial though to include the primary key columns in the + <codeph>WHERE</codeph> clause of every query. The benefit is mainly for partitioned tables, + which divide the data among various tablet servers based on the distribution of + data values in some or all of the primary key columns. + </p> + + </conbody> + + </concept> + + <concept id="kudu_dml"> + + <title>Impala DML Support for Kudu Tables</title> + + <conbody> + + <p> + Impala supports certain DML statements for Kudu tables only. The <codeph>UPDATE</codeph> and + <codeph>DELETE</codeph> statements let you modify data within Kudu tables without rewriting substantial + amounts of table data. + </p> + + <p> + The <codeph>INSERT</codeph> statement for Kudu tables honors the unique and non-<codeph>NULL</codeph> + requirements for the primary key columns. + </p> + + <p> + Because Impala and Kudu do not support transactions, the effects of any <codeph>INSERT</codeph>, + <codeph>UPDATE</codeph>, or <codeph>DELETE</codeph> statement are immediately visible. For example, you + cannot do a sequence of <codeph>UPDATE</codeph> statements and only make the change visible after all the + statements are finished. Also, if a DML statement fails partway through, any rows that were already + inserted, deleted, or changed remain in the table; there is no rollback mechanism to undo the changes. + </p> + + </conbody> + + </concept> + + <concept id="kudu_partitioning"> + + <title>Partitioning for Kudu Tables</title> + + <conbody> + + <p> + Kudu tables use special mechanisms to evenly distribute data among the underlying tablet servers. Although + we refer to such tables as partitioned tables, they are distinguished from traditional Impala partitioned + tables by use of different clauses on the <codeph>CREATE TABLE</codeph> statement. Partitioned Kudu tables + use <codeph>DISTRIBUTE BY</codeph>, <codeph>HASH</codeph>, <codeph>RANGE</codeph>, and <codeph>SPLIT + ROWS</codeph> clauses rather than the traditional <codeph>PARTITIONED BY</codeph> clause. All of the + columns involved in these clauses must be primary key columns. These clauses let you specify different ways + to divide the data for each column, or even for different value ranges within a column. 
This flexibility + lets you avoid problems with uneven distribution of data, where the partitioning scheme for HDFS tables + might result in some partitions being much larger than others. By setting up an effective partitioning + scheme for a Kudu table, you can ensure that the work for a query can be parallelized evenly across the + hosts in a cluster. + </p> + + </conbody> + + </concept> + + <concept id="kudu_performance"> + + <title>Impala Query Performance for Kudu Tables</title> + + <conbody> + + <p> + For queries involving Kudu tables, Impala can delegate much of the work of filtering the result set to + Kudu, avoiding some of the I/O involved in full table scans of tables containing HDFS data files. This type + of optimization is especially effective for partitioned Kudu tables, where the Impala query + <codeph>WHERE</codeph> clause refers to one or more primary key columns that are also used as partition key + columns. For example, if a partitioned Kudu table uses a <codeph>HASH</codeph> clause for + <codeph>col1</codeph> and a <codeph>RANGE</codeph> clause for <codeph>col2</codeph>, a query using a clause + such as <codeph>WHERE col1 IN (1,2,3) AND col2 > 100</codeph> can determine exactly which tablet servers + contain relevant data, and therefore parallelize the query very efficiently. + </p> + + </conbody> + + </concept> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_langref.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_langref.xml b/docs/topics/impala_langref.xml index aaa76aa..f81b76f 100644 --- a/docs/topics/impala_langref.xml +++ b/docs/topics/impala_langref.xml @@ -2,8 +2,8 @@ <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> <concept id="langref"> - <title><ph audience="PDF">Impala SQL Language Reference</ph><ph audience="HTML">Overview of Impala SQL</ph></title> - + <title>Impala SQL Language Reference</title> + <titlealts audience="PDF"><navtitle>SQL Reference</navtitle></titlealts> <prolog> <metadata> <data name="Category" value="Impala"/> @@ -17,163 +17,58 @@ <conbody> <p> - Impala uses SQL as its query language. Impala interprets SQL statements and performs the - full end-to-end processing for each statement. (As opposed to acting as a translation - layer for some other Hadoop subsystem.) + Impala uses SQL as its query language. To protect user investment in skills development and query + design, Impala provides a high degree of compatibility with the Hive Query Language (HiveQL): + </p> + + <ul> + <li> + Because Impala uses the same metadata store as Hive to record information about table structure and + properties, Impala can access tables defined through the native Impala <codeph>CREATE TABLE</codeph> + command, or tables created using the Hive data definition language (DDL). + </li> + + <li> + Impala supports data manipulation (DML) statements similar to the DML component of HiveQL. + </li> + + <li> + Impala provides many <xref href="impala_functions.xml#builtins">built-in functions</xref> with the same + names and parameter types as their HiveQL equivalents. 
+ </li> + </ul> + + <p> + Impala supports most of the same <xref href="impala_langref_sql.xml#langref_sql">statements and + clauses</xref> as HiveQL, including, but not limited to, <codeph>JOIN</codeph>, <codeph>AGGREGATE</codeph>, + <codeph>DISTINCT</codeph>, <codeph>UNION ALL</codeph>, <codeph>ORDER BY</codeph>, <codeph>LIMIT</codeph>, and + (uncorrelated) subquery in the <codeph>FROM</codeph> clause. Impala also supports <codeph>INSERT + INTO</codeph> and <codeph>INSERT OVERWRITE</codeph>. + </p> + + <p> + Impala supports data types with the same names and semantics as the equivalent Hive data types: + <codeph>STRING</codeph>, <codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>, <codeph>INT</codeph>, + <codeph>BIGINT</codeph>, <codeph>FLOAT</codeph>, <codeph>DOUBLE</codeph>, <codeph>BOOLEAN</codeph>, + <codeph>TIMESTAMP</codeph>. </p> <p> - Impala implements many familiar statements, such as <codeph>CREATE TABLE</codeph>, - <codeph>INSERT</codeph>, and <codeph>SELECT</codeph>. Currently, the DML statements - <codeph>UPDATE</codeph> and <codeph>DELETE</codeph> are not available in the production - level of Impala, because big data analytics with Hadoop and HDFS typically involves - unchanging data. <codeph>UPDATE</codeph> and <codeph>DELETE</codeph> <i>are</i> available - in beta form in the version of Impala used with the Kudu storage layer. For full details - about Impala SQL syntax and semantics, see + For full details about Impala SQL syntax and semantics, see <xref href="impala_langref_sql.xml#langref_sql"/>. </p> <p> - Queries include clauses such as <codeph>WHERE</codeph>, <codeph>GROUP BY</codeph>, - <codeph>ORDER BY</codeph>, and <codeph>JOIN</codeph>. For information about query syntax, - see <xref href="impala_select.xml#select"/>. + Most HiveQL <codeph>SELECT</codeph> and <codeph>INSERT</codeph> statements run unmodified with Impala. For + information about Hive syntax not available in Impala, see + <xref href="impala_langref_unsupported.xml#langref_hiveql_delta"/>. + </p> + <p> - Queries can also include function calls, to scalar functions such as - <codeph>sin()</codeph> and <codeph>substr()</codeph>, aggregate functions such as - <codeph>count()</codeph> and <codeph>avg()</codeph>, and analytic functions such as - <codeph>lag()</codeph> and <codeph>rank()</codeph>. For a list of the built-in functions - available in Impala queries, see <xref href="impala_functions.xml#builtins"/>. + For a list of the built-in functions available in Impala queries, see + <xref href="impala_functions.xml#builtins"/>. </p> <p outputclass="toc"/> - </conbody> - - <concept id="langref_performance"> - - <title>Performance Features</title> - - <conbody> - - <p> - The main performance-related SQL features for Impala are: - </p> - - <ul> - <li> - <p> - The <codeph>COMPUTE STATS</codeph> statement, and the underlying table statistics - and column statistics used in query planning. The statistics are used to estimate - the number of rows and size of the result set for queries, subqueries, and the - different <q>sides</q> of a join query. - </p> - </li> - - <li> - <p> - The output of the <codeph>EXPLAIN</codeph> statement. It outlines the ways in which - the query is parallelized, and how much I/O, memory, and so on the query expects to - use. You can control the level of detail in the output through a query option. - </p> - </li> - - <li> - <p> - Partitioning for tables.
By organizing the data for efficient access along one or - more dimensions, this technique lets queries read only the relevant data. - </p> - </li> - - <li> - <p> - Query hints, especially for join queries. Impala selects from different join - algorithms based on the relative sizes of the result sets for each side of the join. - In cases where you know the most effective technique for a particular query, you can - override the estimates that Impala uses to make that choice, and select the join - technique directly. - </p> - </li> - - <li> - <p> - Query options. These options control settings that can influence the performance of - individual queries when you know the special considerations based on your workload, - hardware configuration, or data distribution. - </p> - </li> - </ul> - - <p> - Because analytic queries against high volumes of data tend to require full scans against - large portions of data from each table, Impala does not include index-related SQL - statements such as <codeph>CREATE INDEX</codeph>. The <codeph>COMPUTE STATS</codeph> - serves the purpose of analyzing the distribution of data within each column and the - overall table. Partitioning optimizes the physical layout of the data for queries that - filter on one or more crucial columns. - </p> - - </conbody> - - </concept> - - <concept id="hive_interoperability"> - - <title>Sharing Tables, Data, and Queries Between Impala and Hive</title> - - <conbody> - - <p> - To protect user investment in skills development and query design, Impala provides a - high degree of compatibility with the Hive Query Language (HiveQL): - </p> - - <ul> - <li> - Because Impala uses the same metadata store as Hive to record information about table - structure and properties, Impala can access tables defined through the native Impala - <codeph>CREATE TABLE</codeph> command, or tables created using the Hive data - definition language (DDL). - </li> - - <li> - Impala supports data manipulation (DML) statements similar to the DML component of - HiveQL. - </li> - - <li> - Impala provides many <xref href="impala_functions.xml#builtins">built-in - functions</xref> with the same names and parameter types as their HiveQL equivalents. - </li> - </ul> - - <p> - Impala supports most of the same - <xref href="impala_langref_sql.xml#langref_sql">statements and clauses</xref> as HiveQL, - including, but not limited to <codeph>JOIN</codeph>, <codeph>AGGREGATE</codeph>, - <codeph>DISTINCT</codeph>, <codeph>UNION ALL</codeph>, <codeph>ORDER BY</codeph>, - <codeph>LIMIT</codeph> and (uncorrelated) subquery in the <codeph>FROM</codeph> clause. - Impala also supports <codeph>INSERT INTO</codeph> and <codeph>INSERT OVERWRITE</codeph>. - </p> - - <p> - Impala supports data types with the same names and semantics as the equivalent Hive data - types: <codeph>STRING</codeph>, <codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>, - <codeph>INT</codeph>, <codeph>BIGINT</codeph>, <codeph>FLOAT</codeph>, - <codeph>DOUBLE</codeph>, <codeph>BOOLEAN</codeph>, <codeph>STRING</codeph>, - <codeph>TIMESTAMP</codeph>. CDH 5.5 / Impala 2.3 and higher also include the complex - types <codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>. - </p> - - <p> - Most HiveQL <codeph>SELECT</codeph> and <codeph>INSERT</codeph> statements run - unmodified with Impala. For information about Hive syntax not available in Impala, see - <xref href="impala_langref_unsupported.xml#langref_hiveql_delta"/>. 
- </p> - - </conbody> - - </concept> - </concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_langref_sql.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_langref_sql.xml b/docs/topics/impala_langref_sql.xml index d759e76..18b6726 100644 --- a/docs/topics/impala_langref_sql.xml +++ b/docs/topics/impala_langref_sql.xml @@ -3,7 +3,7 @@ <concept id="langref_sql"> <title>Impala SQL Statements</title> - <titlealts><navtitle>SQL Statements</navtitle></titlealts> + <titlealts audience="PDF"><navtitle>SQL Statements</navtitle></titlealts> <prolog> <metadata> <data name="Category" value="Impala"/> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_langref_unsupported.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_langref_unsupported.xml b/docs/topics/impala_langref_unsupported.xml index f2b0560..39043f3 100644 --- a/docs/topics/impala_langref_unsupported.xml +++ b/docs/topics/impala_langref_unsupported.xml @@ -43,12 +43,12 @@ from HiveQL: </p> - <draft-comment translate="no"> -Yeesh, too many separate lists of unsupported Hive syntax. -Here, the FAQ, and in some of the intro topics. -Some discussion in IMP-1061 about how best to reorg. -Lots of opportunities for conrefs. -</draft-comment> + <!-- To do: + Yeesh, too many separate lists of unsupported Hive syntax. + Here, the FAQ, and in some of the intro topics. + Some discussion in IMP-1061 about how best to reorg. + Lots of opportunities for conrefs. + --> <ul> <!-- Now supported in CDH 5.5 / Impala 2.3 and higher. Find places on this page (like already done under lateral views) to note the new data type support. @@ -61,6 +61,10 @@ Lots of opportunities for conrefs. Extensibility mechanisms such as <codeph>TRANSFORM</codeph>, custom file formats, or custom SerDes. </li> + <li rev="CDH-41376"> + The <codeph>DATE</codeph> data type. + </li> + <li> XML and JSON functions. </li> @@ -96,16 +100,26 @@ Lots of opportunities for conrefs. for full details on Impala UDFs. <ul> <li> - Impala supports high-performance UDFs written in C++, as well as reusing some Java-based Hive UDFs. + <p> + Impala supports high-performance UDFs written in C++, as well as reusing some Java-based Hive UDFs. + </p> + </li> + + <li> + <p> + Impala supports scalar UDFs and user-defined aggregate functions (UDAFs). Impala does not currently + support user-defined table generating functions (UDTFs). + </p> </li> <li> - Impala supports scalar UDFs and user-defined aggregate functions (UDAFs). Impala does not currently - support user-defined table generating functions (UDTFs). + <p> + Only Impala-supported column types are supported in Java-based UDFs. + </p> </li> <li> - Only Impala-supported column types are supported in Java-based UDFs. + <p conref="../shared/impala_common.xml#common/current_user_caveat"/> </li> </ul> </p> @@ -146,6 +160,12 @@ Lots of opportunities for conrefs. <li> <codeph>SHOW COLUMNS</codeph> </li> + + <li rev="DOCS-656"> + <codeph>INSERT OVERWRITE DIRECTORY</codeph>; use <codeph>INSERT OVERWRITE <varname>table_name</varname></codeph> + or <codeph>CREATE TABLE AS SELECT</codeph> to materialize query results into the HDFS directory associated + with an Impala table. + </li> </ul> </conbody> </concept> @@ -167,7 +187,7 @@ Lots of opportunities for conrefs. 
<p> Impala utilizes the <xref href="http://sentry.incubator.apache.org/" scope="external" format="html">Apache - Sentry (incubating)</xref> authorization framework, which provides fine-grained role-based access control + Sentry </xref> authorization framework, which provides fine-grained role-based access control to protect data against unauthorized access or tampering. </p> @@ -265,13 +285,9 @@ Lots of opportunities for conrefs. </li> <li> - Impala does not return column overflows as <codeph>NULL</codeph>, so that customers can distinguish - between <codeph>NULL</codeph> data and overflow conditions similar to how they do so with traditional - database systems. Impala returns the largest or smallest value in the range for the type. For example, - valid values for a <codeph>tinyint</codeph> range from -128 to 127. In Impala, a <codeph>tinyint</codeph> - with a value of -200 returns -128 rather than <codeph>NULL</codeph>. A <codeph>tinyint</codeph> with a - value of 200 returns 127. + <p conref="../shared/impala_common.xml#common/int_overflow_behavior"/> </li> + </ul> <p> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_limit.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_limit.xml b/docs/topics/impala_limit.xml index c186cd4..ec12271 100644 --- a/docs/topics/impala_limit.xml +++ b/docs/topics/impala_limit.xml @@ -9,6 +9,8 @@ <data name="Category" value="SQL"/> <data name="Category" value="Querying"/> <data name="Category" value="Reports"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> </metadata> </prolog> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_literals.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_literals.xml b/docs/topics/impala_literals.xml index 3c53796..d84d84c 100644 --- a/docs/topics/impala_literals.xml +++ b/docs/topics/impala_literals.xml @@ -357,7 +357,7 @@ insert into t1 partition(x=NULL, y) select c1, c3 from some_other_table;</codeb <li rev="1.2.1"> <p conref="../shared/impala_common.xml#common/null_sorting_change"/> <note> - <draft-comment translate="no"> Probably a bunch of similar view-related restrictions like this that should be collected, reused, or cross-referenced under the Views topic. </draft-comment> + <!-- To do: Probably a bunch of similar view-related restrictions like this that should be collected, reused, or cross-referenced under the Views topic. --> Because the <codeph>NULLS FIRST</codeph> and <codeph>NULLS LAST</codeph> keywords are not currently available in Hive queries, any views you create using those keywords will not be available through Hive. 
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_live_progress.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_live_progress.xml b/docs/topics/impala_live_progress.xml index f58cdcb..ef8e8c4 100644 --- a/docs/topics/impala_live_progress.xml +++ b/docs/topics/impala_live_progress.xml @@ -2,7 +2,8 @@ <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> <concept rev="2.3.0" id="live_progress"> - <title>LIVE_PROGRESS Query Option</title> + <title>LIVE_PROGRESS Query Option (CDH 5.5 or higher only)</title> + <titlealts audience="PDF"><navtitle>LIVE_PROGRESS</navtitle></titlealts> <prolog> <metadata> <data name="Category" value="Impala"/> @@ -11,12 +12,14 @@ <data name="Category" value="Performance"/> <data name="Category" value="Reports"/> <data name="Category" value="impala-shell"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> </metadata> </prolog> <conbody> - <p> + <p rev="2.3.0"> <indexterm audience="Cloudera">LIVE_PROGRESS query option</indexterm> For queries submitted through the <cmdname>impala-shell</cmdname> command, displays an interactive progress bar showing roughly what percentage of @@ -59,6 +62,8 @@ <p conref="../shared/impala_common.xml#common/impala_shell_progress_reports_compute_stats_caveat"/> <p conref="../shared/impala_common.xml#common/impala_shell_progress_reports_shell_only_caveat"/> + <p conref="../shared/impala_common.xml#common/added_in_230"/> + <p conref="../shared/impala_common.xml#common/example_blurb"/> <codeblock><![CDATA[[localhost:21000] > set live_progress=true; LIVE_PROGRESS set to true @@ -69,8 +74,8 @@ LIVE_PROGRESS set to true | 150000 | +----------+ [localhost:21000] > select count(*) from customer t1 cross join customer t2; -[################################################## ] 50% -[####################################################################################################] 100% +[################################### ] 50% +[######################################################################] 100% ]]> </codeblock> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_live_summary.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_live_summary.xml b/docs/topics/impala_live_summary.xml index bfe71bf..42fe484 100644 --- a/docs/topics/impala_live_summary.xml +++ b/docs/topics/impala_live_summary.xml @@ -2,7 +2,8 @@ <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> <concept rev="2.3.0" id="live_summary"> - <title>LIVE_SUMMARY Query Option</title> + <title>LIVE_SUMMARY Query Option (CDH 5.5 or higher only)</title> + <titlealts audience="PDF"><navtitle>LIVE_SUMMARY</navtitle></titlealts> <prolog> <metadata> <data name="Category" value="Impala"/> @@ -11,12 +12,14 @@ <data name="Category" value="Performance"/> <data name="Category" value="Reports"/> <data name="Category" value="impala-shell"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> </metadata> </prolog> <conbody> - <p> + <p rev="2.3.0"> <indexterm audience="Cloudera">LIVE_SUMMARY query option</indexterm> For queries submitted through the <cmdname>impala-shell</cmdname> command, displays the same output as the <codeph>SUMMARY</codeph> command, @@ -67,6 +70,8 @@ <p conref="../shared/impala_common.xml#common/impala_shell_progress_reports_compute_stats_caveat"/> <p 
conref="../shared/impala_common.xml#common/impala_shell_progress_reports_shell_only_caveat"/> + <p conref="../shared/impala_common.xml#common/added_in_230"/> + <p conref="../shared/impala_common.xml#common/example_blurb"/> <p> @@ -197,7 +202,6 @@ Query: select count(*) from customer t1 cross join customer t2 [####################################################################################################] 100% +---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+ | Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | -[localhost:21000] > ]]> </codeblock> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_load_data.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_load_data.xml b/docs/topics/impala_load_data.xml index e3517f0..e9d94b5 100644 --- a/docs/topics/impala_load_data.xml +++ b/docs/topics/impala_load_data.xml @@ -3,7 +3,7 @@ <concept rev="1.1" id="load_data"> <title>LOAD DATA Statement</title> - <titlealts><navtitle>LOAD DATA</navtitle></titlealts> + <titlealts audience="PDF"><navtitle>LOAD DATA</navtitle></titlealts> <prolog> <metadata> <data name="Category" value="Impala"/> @@ -15,6 +15,7 @@ <data name="Category" value="Developers"/> <data name="Category" value="HDFS"/> <data name="Category" value="Tables"/> + <data name="Category" value="S3"/> </metadata> </prolog> @@ -74,6 +75,14 @@ directory. </li> + <li rev="2.5.0 IMPALA-2867"> + The operation fails if the source directory contains any non-hidden directories. + Prior to CDH 5.7 / Impala 2.5, if the source directory contained any subdirectory, even a hidden one such as + <filepath>_impala_insert_staging</filepath>, the <codeph>LOAD DATA</codeph> statement would fail. + In CDH 5.7 / Impala 2.5 and higher, <codeph>LOAD DATA</codeph> ignores hidden subdirectories in the + source directory, and only fails if any of the subdirectories are non-hidden. + </li> + <li> The loaded data files retain their original names in the new location, unless a name conflicts with an existing data file, in which case the name of the new file is modified slightly to be unique. (The @@ -209,6 +218,8 @@ Returned 1 row(s) in 0.62s</codeblock> <p conref="../shared/impala_common.xml#common/s3_blurb"/> <p conref="../shared/impala_common.xml#common/s3_dml"/> + <p conref="../shared/impala_common.xml#common/s3_dml_performance"/> + <p>See <xref href="../topics/impala_s3.xml#s3"/> for details about reading and writing S3 data with Impala.</p> <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/> @@ -223,7 +234,8 @@ Returned 1 row(s) in 0.62s</codeblock> <p conref="../shared/impala_common.xml#common/related_info"/> <p> The <codeph>LOAD DATA</codeph> statement is an alternative to the - <codeph>INSERT</codeph> statement. Use <codeph>LOAD DATA</codeph> + <codeph><xref href="impala_insert.xml#insert">INSERT</xref></codeph> statement. + Use <codeph>LOAD DATA</codeph> when you have the data files in HDFS but outside of any Impala table. </p> <p> @@ -231,7 +243,8 @@ Returned 1 row(s) in 0.62s</codeblock> to the <codeph>CREATE EXTERNAL TABLE</codeph> statement. Use <codeph>LOAD DATA</codeph> when it is appropriate to move the data files under Impala control rather than querying them - from their original location. + from their original location. 
See <xref href="impala_tables.xml#external_tables"/> + for information about working with external tables. </p> </conbody> </concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_logging.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_logging.xml b/docs/topics/impala_logging.xml index 9430178..0767818 100644 --- a/docs/topics/impala_logging.xml +++ b/docs/topics/impala_logging.xml @@ -4,7 +4,16 @@ <title>Using Impala Logging</title> <titlealts audience="PDF"><navtitle>Logging</navtitle></titlealts> - + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Logs"/> + <data name="Category" value="Troubleshooting"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> <conbody> @@ -12,10 +21,457 @@ The Impala logs record information about: </p> - + <ul> + <li> + Any errors Impala encountered. If Impala experienced a serious error during startup, you must diagnose and + troubleshoot that problem before you can do anything further with Impala. + </li> + + <li> + How Impala is configured. + </li> + + <li> + Jobs Impala has completed. + </li> + </ul> + + <note> + <p> + Formerly, the logs contained the query profile for each query, showing low-level details of how the work is + distributed among nodes and how intermediate and final results are transmitted across the network. To save + space, those query profiles are now stored in zlib-compressed files in + <filepath>/var/log/impala/profiles</filepath>. You can access them through the Impala web user interface. + For example, at <codeph>http://<varname>impalad-node-hostname</varname>:25000/queries</codeph>, each query + is followed by a <codeph>Profile</codeph> link leading to a page showing extensive analytical data for the + query execution. + </p> + + <p rev="1.1.1"> + The auditing feature introduced in Impala 1.1.1 produces a separate set of audit log files when + enabled. See <xref href="impala_auditing.xml#auditing"/> for details. + </p> + + <p rev="2.2.0"> + The lineage feature introduced in Impala 2.2.0 produces a separate lineage log file when + enabled. See <xref href="impala_lineage.xml#lineage"/> for details. + </p> + </note> + + <p outputclass="toc inpage"/> + + </conbody> + + <concept id="logs_details"> + + <title>Locations and Names of Impala Log Files</title> + + <conbody> + + <ul> + <li> + By default, the log files are under the directory <filepath>/var/log/impala</filepath>. +<!-- TK: split this task out and state CM and non-CM ways. --> + To change log file locations, modify the defaults file described in + <xref href="impala_processes.xml#processes"/>. + </li> + + <li> + The significant files for the <codeph>impalad</codeph> process are <filepath>impalad.INFO</filepath>, + <filepath>impalad.WARNING</filepath>, and <filepath>impalad.ERROR</filepath>. You might also see a file + <filepath>impalad.FATAL</filepath>, although this is only present in rare conditions. + </li> + + <li> + The significant files for the <codeph>statestored</codeph> process are + <filepath>statestored.INFO</filepath>, <filepath>statestored.WARNING</filepath>, and + <filepath>statestored.ERROR</filepath>. You might also see a file <filepath>statestored.FATAL</filepath>, + although this is only present in rare conditions. 
+ </li> + + <li rev="1.2"> + The significant files for the <codeph>catalogd</codeph> process are <filepath>catalogd.INFO</filepath>, + <filepath>catalogd.WARNING</filepath>, and <filepath>catalogd.ERROR</filepath>. You might also see a file + <filepath>catalogd.FATAL</filepath>, although this is only present in rare conditions. + </li> + + <li> + Examine the <codeph>.INFO</codeph> files to see configuration settings for the processes. + </li> + + <li> + Examine the <codeph>.WARNING</codeph> files to see all kinds of problem information, including such + things as suboptimal settings and also serious runtime errors. + </li> + + <li> + Examine the <codeph>.ERROR</codeph> and/or <codeph>.FATAL</codeph> files to see only the most serious + errors, if the processes crash, or queries fail to complete. These messages are also in the + <codeph>.WARNING</codeph> file. + </li> + + <li> + A new set of log files is produced each time the associated daemon is restarted. These log files have + long names including a timestamp. The <codeph>.INFO</codeph>, <codeph>.WARNING</codeph>, and + <codeph>.ERROR</codeph> files are physically represented as symbolic links to the latest applicable log + files. + </li> + + <li> + The init script for the <codeph>impala-server</codeph> service also produces a consolidated log file + <codeph>/var/log/impala/impala-server.log</codeph>, with all the same information as the + corresponding <codeph>.INFO</codeph>, <codeph>.WARNING</codeph>, and <codeph>.ERROR</codeph> files. + </li> + + <li> + The init script for the <codeph>impala-state-store</codeph> service also produces a consolidated log file + <codeph>/var/log/impala/impala-state-store.log</codeph>, with all the same information as the + corresponding <codeph>.INFO</codeph>, <codeph>.WARNING</codeph>, and <codeph>.ERROR</codeph> files. + </li> + </ul> + + <p> + Impala stores information using the <codeph>glog_v</codeph> logging system. You will see some messages + referring to C++ file names. Logging is affected by: + </p> + + <ul> + <li> + The <codeph>GLOG_v</codeph> environment variable specifies which types of messages are logged. See + <xref href="#log_levels"/> for details. + </li> + + <li> + The <codeph>-logbuflevel</codeph> startup flag for the <cmdname>impalad</cmdname> daemon specifies how + often the log information is written to disk. The default is 0, meaning that the log is immediately + flushed to disk when Impala outputs an important message such as a warning or an error, but less + important messages such as informational ones are buffered in memory rather than being flushed to disk + immediately. + </li> + + <li> + Cloudera Manager has an Impala configuration setting that sets the <codeph>-logbuflevel</codeph> startup + option. + </li> + </ul> </conbody> </concept> + <concept id="logs_cm_noncm"> + + <title>Managing Impala Logs through Cloudera Manager or Manually</title> + <prolog> + <metadata> + <data name="Category" value="Administrators"/> + <data name="Category" value="Cloudera Manager"/> + </metadata> + </prolog> + + <conbody> + + <p> + Cloudera recommends installing Impala through the Cloudera Manager administration interface. To assist with + troubleshooting, Cloudera Manager collects front-end and back-end logs together into a single view, and lets + you do a search across log data for all the managed nodes rather than examining the logs on each node + separately.
If you installed Impala using Cloudera Manager, refer to the topics on Monitoring Services + (<xref href="http://www.cloudera.com/documentation/enterprise/latest/topics/cm_dg_service_monitoring.html" scope="external" format="html">CDH + 5</xref>, + <xref href="http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-latest/Cloudera-Manager-Diagnostics-Guide/Cloudera-Manager-Diagnostics-Guide.html" scope="external" format="html">CDH + 4</xref>) or Logs + (<xref href="http://www.cloudera.com/documentation/enterprise/latest/topics/cm_dg_logs.html" scope="external" format="html">CDH + 5</xref>, + <xref href="http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-latest/Cloudera-Manager-Diagnostics-Guide/cmdg_logs.html" scope="external" format="html">CDH + 4</xref>). + </p> + + <p> + If you are using Impala in an environment not managed by Cloudera Manager, review Impala log files on each + host when you have traced an issue back to a specific system. + </p> + + </conbody> + + </concept> + + <concept id="logs_rotate"> + + <title>Rotating Impala Logs</title> + <prolog> + <metadata> + <data name="Category" value="Disk Storage"/> + </metadata> + </prolog> + + <conbody> + + <p> + Impala periodically switches the physical files representing the current log files, after which it is safe + to remove the old files if they are no longer needed. + </p> + + <p> + Impala can automatically remove older unneeded log files, a feature known as <term>log rotation</term>. +<!-- Another instance of the text also used in impala_new_features.xml + and impala_fixed_issues.xml. (Just took out the word "new" + and added the reference to the starting release.) + At this point, a conref is definitely in the cards. --> + </p> + + <p> + In Impala 2.2 and higher, the <codeph>-max_log_files</codeph> configuration option specifies how many log + files to keep at each severity level. You can specify an appropriate setting for each Impala-related daemon + (<cmdname>impalad</cmdname>, <cmdname>statestored</cmdname>, and <cmdname>catalogd</cmdname>). The default + value is 10, meaning that Impala preserves the latest 10 log files for each severity level + (<codeph>INFO</codeph>, <codeph>WARNING</codeph>, <codeph>ERROR</codeph>, and <codeph>FATAL</codeph>). + Impala checks to see if any old logs need to be removed based on the interval specified in the + <codeph>logbufsecs</codeph> setting, every 5 seconds by default. + </p> + +<!-- This extra detail only appears here. Consider if it's worth including it + in the conref so people don't need to follow a link just for a couple of + minor factoids. --> + + <p> + A value of 0 preserves all log files, in which case you would set up manual log rotation using your + Linux tool or technique of choice. A value of 1 preserves only the very latest log file. + </p> + + <p> + To set up log rotation on a system managed by Cloudera Manager 5.4.0 and higher, search for the + <codeph>max_log_files</codeph> option name and set the appropriate value for the <userinput>Maximum Log + Files</userinput> field for each Impala configuration category (Impala, Catalog Server, and StateStore). + Then restart the Impala service. In earlier Cloudera Manager releases, specify the + <codeph>-max_log_files=<varname>maximum</varname></codeph> option in the <uicontrol>Command Line Argument + Advanced Configuration Snippet (Safety Valve)</uicontrol> field for each Impala configuration category.
+ </p> + + </conbody> + + </concept> + + <concept id="logs_debug"> + + <title>Reviewing Impala Logs</title> + + <conbody> + + <p> + By default, the Impala log is stored at <codeph>/var/log/impala/</codeph>. The most comprehensive log, + showing informational, warning, and error messages, is in the file named <filepath>impalad.INFO</filepath>. + View log file contents by using the web interface or by examining the contents of the log file. (When you + examine the logs through the file system, you can troubleshoot problems by reading the + <filepath>impalad.WARNING</filepath> and/or <filepath>impalad.ERROR</filepath> files, which contain the + subsets of messages indicating potential problems.) + </p> + + <p> + On a machine named <codeph>impala.example.com</codeph> with default settings, you could view the Impala + logs on that machine by using a browser to access <codeph>http://impala.example.com:25000/logs</codeph>. + </p> + + <note> + <p> + The web interface limits the amount of logging information displayed. To view every log entry, access the + log files directly through the file system. + </p> + </note> + + <p> + You can view the contents of the <codeph>impalad.INFO</codeph> log file in the file system. With the + default configuration settings, the start of the log file appears as follows: + </p> + +<codeblock>[user@example impalad]$ pwd +/var/log/impalad +[user@example impalad]$ more impalad.INFO +Log file created at: 2013/01/07 08:42:12 +Running on machine: impala.example.com +Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg +I0107 08:42:12.292155 14876 daemon.cc:34] impalad version 0.4 RELEASE (build 9d7fadca0461ab40b9e9df8cdb47107ec6b27cff) +Built on Fri, 21 Dec 2012 12:55:19 PST +I0107 08:42:12.292484 14876 daemon.cc:35] Using hostname: impala.example.com +I0107 08:42:12.292706 14876 logging.cc:76] Flags (see also /varz are on debug webserver): +--dump_ir=false +--module_output= +--be_port=22000 +--classpath= +--hostname=impala.example.com</codeblock> + + <note> + The preceding example shows only a small part of the log file. Impala log files are often several megabytes + in size. + </note> + + </conbody> + + </concept> + + <concept id="log_format"> + + <title>Understanding Impala Log Contents</title> + + <conbody> + + <p> + The logs store information about Impala startup options. This information appears once for each time Impala + is started and may include: + </p> + + <ul> + <li> + Machine name. + </li> + + <li> + Impala version number. + </li> + + <li> + Flags used to start Impala. + </li> + + <li> + CPU information. + </li> + + <li> + The number of available disks. + </li> + </ul> + + <p> + There is information about each job Impala has run. Because each Impala job creates an additional set of + data about queries, the amount of job-specific data may be very large. Logs may contain detailed + information on jobs. These detailed log entries may include: + </p> + + <ul> + <li> + The composition of the query. + </li> + + <li> + The degree of data locality. + </li> + + <li> + Statistics on data throughput and response times. + </li> + </ul> + + </conbody> + + </concept> + + <concept id="log_levels"> + + <title>Setting Logging Levels</title> + + <conbody> + + <p> + Impala uses the GLOG system, which supports three logging levels. You can adjust the logging levels using + the Cloudera Manager Admin Console, or without Cloudera Manager by exporting variable settings.
To change logging settings manually, use a command + similar to the following on each node before starting <codeph>impalad</codeph>: + </p> + +<codeblock>export GLOG_v=1</codeblock> + + <note> + For performance reasons, Cloudera strongly recommends against enabling level 3, the most verbose logging level. + </note> + + <p> + For more information on how to configure GLOG, including how to set variable logging levels for different + system components, see + <xref href="http://google-glog.googlecode.com/svn/trunk/doc/glog.html" scope="external" format="html">How + To Use Google Logging Library (glog)</xref>. + </p> + + <section id="loglevels_details"> + + <title>Understanding What Is Logged at Different Logging Levels</title> + + <p> + As logging levels increase, the categories of information logged are cumulative. For example, GLOG_v=2 + records everything GLOG_v=1 records, as well as additional information. + </p> + + <p> + Increasing logging levels imposes performance overhead and increases log size. Cloudera recommends using + GLOG_v=1 for most cases: this level has minimal performance impact but still captures useful + troubleshooting information. + </p> + + <p> + Additional information logged at each level is as follows: + </p> + + <ul> + <li> + GLOG_v=1 - The default level. Logs information about each connection and query that is initiated against an + <codeph>impalad</codeph> instance, including runtime profiles. + </li> + + <li> + GLOG_v=2 - Everything from the previous level plus information for each RPC initiated. This level also + records query execution progress information, including details on each file that is read. + </li> + + <li> + GLOG_v=3 - Everything from the previous level plus logging of every row that is read. This level is + applicable only to the most serious troubleshooting and tuning scenarios, because it can produce + exceptionally large and detailed log files, potentially leading to its own set of performance and + capacity problems. + </li> + </ul> + + </section> + + </conbody> + + </concept> + + <concept id="redaction" rev="2.2.0"> + + <title>Redacting Sensitive Information from Impala Log Files</title> + <prolog> + <metadata> + <data name="Category" value="Redaction"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">redaction</indexterm> + <term>Log redaction</term> is a security feature that prevents sensitive information from being displayed in + locations used by administrators for monitoring and troubleshooting, such as log files, the Cloudera Manager + user interface, and the Impala debug web user interface. You configure regular expressions that match + sensitive types of information processed by your system, such as credit card numbers or tax IDs, and literals + matching these patterns are obfuscated wherever they would normally be recorded in log files or displayed in + administration or debugging user interfaces. + </p> + + <p> + In a security context, the log redaction feature is complementary to the Sentry authorization framework. + Sentry prevents unauthorized users from being able to directly access table data. Redaction prevents + administrators or support personnel from seeing the smaller amounts of sensitive or personally identifying + information (PII) that might appear in queries issued by those authorized users.
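+ </p> + + <p> + For example, a rule might use a regular expression such as the following to match 16-digit credit card + numbers (a hypothetical pattern shown only for illustration; the exact rule syntax depends on how redaction + is configured in your deployment): + </p> + +<codeblock>\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}</codeblock> + + <p> + Matched literals are replaced with a placeholder before the statement text is written to logs or displayed + in monitoring interfaces.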
+ </p> + + <p> + See + <xref audience="integrated" href="sg_redaction.xml#log_redact"/><xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/sg_redaction.html" scope="external" format="html"/> + for details about how to enable this feature and set + up the regular expressions to detect and redact sensitive information within SQL statement text. + </p> + + </conbody> + + </concept> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_map.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_map.xml b/docs/topics/impala_map.xml index 41e4754..64851e9 100644 --- a/docs/topics/impala_map.xml +++ b/docs/topics/impala_map.xml @@ -7,6 +7,9 @@ <metadata> <data name="Category" value="Impala"/> <data name="Category" value="Impala Data Types"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> </metadata> </prolog> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_math_functions.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_math_functions.xml b/docs/topics/impala_math_functions.xml index fd16b37..c82a29b 100644 --- a/docs/topics/impala_math_functions.xml +++ b/docs/topics/impala_math_functions.xml @@ -2,7 +2,7 @@ <concept id="math_functions"> <title>Impala Mathematical Functions</title> - <titlealts><navtitle>Mathematical Functions</navtitle></titlealts> + <titlealts audience="PDF"><navtitle>Mathematical Functions</navtitle></titlealts> <prolog> <metadata> <data name="Category" value="Impala"/> @@ -53,12 +53,12 @@ <dl> <dlentry rev="1.4.0" id="abs"> - <dt rev="2.0.1"> + <dt rev="1.4.0 2.0.1"> <codeph>abs(numeric_type a)</codeph> <!-- <codeph>abs(double a), abs(decimal(p,s) a)</codeph> --> </dt> - <dd> + <dd rev="1.4.0"> <indexterm audience="Cloudera">abs() function</indexterm> <b>Purpose:</b> Returns the absolute value of the argument. <p rev="2.0.1" conref="../shared/impala_common.xml#common/return_type_same"/> @@ -119,6 +119,23 @@ </dlentry> + <dlentry id="atan2" rev="2.3.0 IMPALA-1771"> + + <dt rev="2.3.0 IMPALA-1771"> + <codeph>atan2(double a, double b)</codeph> + </dt> + + <dd rev="2.3.0 IMPALA-1771"> + <indexterm audience="Cloudera">atan2() function</indexterm> + <b>Purpose:</b> Returns the arctangent of the two arguments, with the signs of the arguments used to determine the + quadrant of the result. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + <dlentry id="bin"> <dt> @@ -138,7 +155,7 @@ <dlentry rev="1.4.0" id="ceil"> - <dt> + <dt rev="1.4.0"> <codeph>ceil(double a)</codeph>, <codeph>ceil(decimal(p,s) a)</codeph>, <codeph id="ceiling">ceiling(double a)</codeph>, @@ -147,7 +164,7 @@ <codeph rev="2.3.0">dceil(decimal(p,s) a)</codeph> </dt> - <dd> + <dd rev="1.4.0"> <indexterm audience="Cloudera">ceil() function</indexterm> <b>Purpose:</b> Returns the smallest integer that is greater than or equal to the argument. <p> @@ -194,13 +211,29 @@ </dlentry> - <dlentry id="cot" rev="2.3.0"> + <dlentry id="cosh" rev="2.3.0 IMPALA-1771"> - <dt> + <dt rev="2.3.0 IMPALA-1771"> + <codeph>cosh(double a)</codeph> + </dt> + + <dd rev="2.3.0 IMPALA-1771"> + <indexterm audience="Cloudera">cosh() function</indexterm> + <b>Purpose:</b> Returns the hyperbolic cosine of the argument.
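+ For example, <codeph>cosh(0)</codeph> returns 1.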
+ <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + + <dlentry id="cot" rev="2.3.0 IMPALA-1771"> + + <dt rev="2.3.0 IMPALA-1771"> <codeph>cot(double a)</codeph> </dt> - <dd> + <dd rev="2.3.0 IMPALA-1771"> <indexterm audience="Cloudera">cot() function</indexterm> <b>Purpose:</b> Returns the cotangent of the argument. <p> @@ -236,7 +269,7 @@ <dd> <indexterm audience="Cloudera">e() function</indexterm> <b>Purpose:</b> Returns the - <xref href="http://en.wikipedia.org/wiki/E_(mathematical_constant)" scope="external" format="html">mathematical + <xref href="https://en.wikipedia.org/wiki/E_(mathematical_constant)" scope="external" format="html">mathematical constant e</xref>. <p> <b>Return type:</b> <codeph>double</codeph> @@ -255,7 +288,7 @@ <dd> <indexterm audience="Cloudera">exp() function</indexterm> <b>Purpose:</b> Returns the - <xref href="http://en.wikipedia.org/wiki/E_(mathematical_constant)" scope="external" format="html">mathematical + <xref href="https://en.wikipedia.org/wiki/E_(mathematical_constant)" scope="external" format="html">mathematical constant e</xref> raised to the power of the argument. <p> <b>Return type:</b> <codeph>double</codeph> @@ -266,10 +299,10 @@ <dlentry rev="2.3.0" id="factorial"> - <dt> + <dt rev="2.3.0"> <codeph>factorial(integer_type a)</codeph> </dt> - <dd> + <dd rev="2.3.0"> <indexterm audience="Cloudera">factorial() function</indexterm> <b>Purpose:</b> Computes the <xref href="https://en.wikipedia.org/wiki/Factorial" scope="external" format="html">factorial</xref> of an integer value. It works with any integer type. @@ -421,11 +454,11 @@ select fmod(9.9,3.3); <dlentry rev="1.2.2" id="fnv_hash"> - <dt> + <dt rev="1.2.2"> <codeph>fnv_hash(type v)</codeph>, </dt> - <dd> + <dd rev="1.2.2"> <indexterm audience="Cloudera">fnv_hash() function</indexterm> <b>Purpose:</b> Returns a consistent 64-bit value derived from the input argument, for convenience of implementing hashing logic in an application. @@ -509,13 +542,13 @@ select fmod(9.9,3.3); <dlentry rev="1.4.0" id="greatest"> - <dt> + <dt rev="1.4.0"> <codeph>greatest(bigint a[, bigint b ...])</codeph>, <codeph>greatest(double a[, double b ...])</codeph>, <codeph>greatest(decimal(p,s) a[, decimal(p,s) b ...])</codeph>, <codeph>greatest(string a[, string b ...])</codeph>, <codeph>greatest(timestamp a[, timestamp b ...])</codeph> </dt> - <dd> + <dd rev="1.4.0"> <indexterm audience="Cloudera">greatest() function</indexterm> <b>Purpose:</b> Returns the largest value from a list of expressions. <p conref="../shared/impala_common.xml#common/return_same_type"/> @@ -542,35 +575,29 @@ select fmod(9.9,3.3); <dlentry rev="1.4.0" id="is_inf"> - <dt> + <dt rev="1.4.0"> <codeph>is_inf(double a)</codeph>, </dt> - <dd> + <dd rev="1.4.0"> <indexterm audience="Cloudera">is_inf() function</indexterm> <b>Purpose:</b> Tests whether a value is equal to the special value <q>inf</q>, signifying infinity. <p> <b>Return type:</b> <codeph>boolean</codeph> </p> <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> - <p> - Infinity and NaN can be specified in text data files as <codeph>inf</codeph> and <codeph>nan</codeph> - respectively, and Impala interprets them as these special values. They can also be produced by certain - arithmetic expressions; for example, <codeph>pow(-1, 0.5)</codeph> returns infinity and - <codeph>1/0</codeph> returns NaN. Or you can cast the literal values, such as <codeph>CAST('nan' AS - DOUBLE)</codeph> or <codeph>CAST('inf' AS DOUBLE)</codeph>.
- </p> + <p conref="../shared/impala_common.xml#common/infinity_and_nan"/> </dd> </dlentry> <dlentry rev="1.4.0" id="is_nan"> - <dt> + <dt rev="1.4.0"> <codeph>is_nan(double a)</codeph>, </dt> - <dd> + <dd rev="1.4.0"> <indexterm audience="Cloudera">is_nan() function</indexterm> <b>Purpose:</b> Tests whether a value is equal to the special value <q>NaN</q>, signifying <q>not a number</q>. @@ -578,26 +605,20 @@ select fmod(9.9,3.3); <b>Return type:</b> <codeph>boolean</codeph> </p> <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> - <p> - Infinity and NaN can be specified in text data files as <codeph>inf</codeph> and <codeph>nan</codeph> - respectively, and Impala interprets them as these special values. They can also be produced by certain - arithmetic expressions; for example, <codeph>pow(-1, 0.5)</codeph> returns infinity and - <codeph>1/0</codeph> returns NaN. Or you can cast the literal values, such as <codeph>CAST('nan' AS - DOUBLE)</codeph> or <codeph>CAST('inf' AS DOUBLE)</codeph>. - </p> + <p conref="../shared/impala_common.xml#common/infinity_and_nan"/> </dd> </dlentry> <dlentry rev="1.4.0" id="least"> - <dt> + <dt rev="1.4.0"> <codeph>least(bigint a[, bigint b ...])</codeph>, <codeph>least(double a[, double b ...])</codeph>, <codeph>least(decimal(p,s) a[, decimal(p,s) b ...])</codeph>, <codeph>least(string a[, string b ...])</codeph>, <codeph>least(timestamp a[, timestamp b ...])</codeph> </dt> - <dd> + <dd rev="1.4.0"> <indexterm audience="Cloudera">least() function</indexterm> <b>Purpose:</b> Returns the smallest value from a list of expressions. <p conref="../shared/impala_common.xml#common/return_same_type"/> @@ -677,12 +698,12 @@ select fmod(9.9,3.3); <dlentry rev="1.4.0" id="max_int"> - <dt> + <dt rev="1.4.0"> <codeph>max_int(), <ph id="max_tinyint">max_tinyint()</ph>, <ph id="max_smallint">max_smallint()</ph>, <ph id="max_bigint">max_bigint()</ph></codeph> </dt> - <dd> + <dd rev="1.4.0"> <indexterm audience="Cloudera">max_int() function</indexterm> <indexterm audience="Cloudera">max_tinyint() function</indexterm> <indexterm audience="Cloudera">max_smallint() function</indexterm> @@ -704,12 +725,12 @@ select fmod(9.9,3.3); <dlentry rev="1.4.0" id="min_int"> - <dt> + <dt rev="1.4.0"> <codeph>min_int(), <ph id="min_tinyint">min_tinyint()</ph>, <ph id="min_smallint">min_smallint()</ph>, <ph id="min_bigint">min_bigint()</ph></codeph> </dt> - <dd> + <dd rev="1.4.0"> <indexterm audience="Cloudera">min_int() function</indexterm> <indexterm audience="Cloudera">min_tinyint() function</indexterm> <indexterm audience="Cloudera">min_smallint() function</indexterm> @@ -730,11 +751,11 @@ select fmod(9.9,3.3); <dlentry id="mod" rev="2.2.0"> - <dt> + <dt rev="2.2.0"> <codeph>mod(<varname>numeric_type</varname> a, <varname>same_type</varname> b)</codeph> </dt> - <dd> + <dd rev="2.2.0"> <indexterm audience="Cloudera">mod() function</indexterm> <b>Purpose:</b> Returns the modulus of a number. Equivalent to the <codeph>%</codeph> arithmetic operator. Works with any size integer type, any size floating-point type, and <codeph>DECIMAL</codeph> @@ -848,11 +869,11 @@ select mod(9.9,3.0); <dlentry id="pi"> - <dt> + <dt rev="1.4.0"> <codeph>pi()</codeph> </dt> - <dd> + <dd rev="1.4.0"> <indexterm audience="Cloudera">pi() function</indexterm> <b>Purpose:</b> Returns the constant pi.
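For example, <codeph>pi()</codeph> returns 3.141592653589793, the closest <codeph>double</codeph> value to pi.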
<p> @@ -954,14 +975,14 @@ select pmod(5,-2); <dlentry id="pow"> - <dt> + <dt rev="1.4.0"> <codeph>pow(double a, double p)</codeph>, <codeph id="power">power(double a, double p)</codeph>, <codeph rev="2.3.0" id="dpow">dpow(double a, double p)</codeph>, <codeph rev="2.3.0" id="fpow">fpow(double a, double p)</codeph> </dt> - <dd> + <dd rev="1.4.0"> <indexterm audience="Cloudera">pow() function</indexterm> <indexterm audience="Cloudera">power() function</indexterm> <indexterm audience="Cloudera">dpow() function</indexterm> @@ -976,11 +997,11 @@ select pmod(5,-2); <dlentry rev="1.4.0" id="precision"> - <dt> + <dt rev="1.4.0"> <codeph>precision(<varname>numeric_expression</varname>)</codeph> </dt> - <dd> + <dd rev="1.4.0"> <indexterm audience="Cloudera">precision() function</indexterm> <b>Purpose:</b> Computes the precision (number of decimal digits) needed to represent the type of the argument expression as a <codeph>DECIMAL</codeph> value. @@ -1160,11 +1181,11 @@ select x, unix_timestamp(now()), rand(unix_timestamp(now())) <dlentry rev="1.4.0" id="scale"> - <dt> + <dt rev="1.4.0"> <codeph>scale(<varname>numeric_expression</varname>)</codeph> </dt> - <dd> + <dd rev="1.4.0"> <indexterm audience="Cloudera">scale() function</indexterm> <b>Purpose:</b> Computes the scale (number of decimal digits to the right of the decimal point) needed to represent the type of the argument expression as a <codeph>DECIMAL</codeph> value. @@ -1215,6 +1236,22 @@ select x, unix_timestamp(now()), rand(unix_timestamp(now())) </dlentry> + <dlentry id="sinh" rev="2.3.0 IMPALA-1771"> + + <dt rev="2.3.0 IMPALA-1771"> + <codeph>sinh(double a)</codeph> + </dt> + + <dd rev="2.3.0 IMPALA-1771"> + <indexterm audience="Cloudera">sinh() function</indexterm> + <b>Purpose:</b> Returns the hyperbolic sine of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + <dlentry id="sqrt"> <dt> @@ -1249,14 +1286,30 @@ select x, unix_timestamp(now()), rand(unix_timestamp(now())) </dlentry> + <dlentry id="tanh" rev="2.3.0 IMPALA-1771"> + + <dt rev="2.3.0 IMPALA-1771"> + <codeph>tanh(double a)</codeph> + </dt> + + <dd rev="2.3.0 IMPALA-1771"> + <indexterm audience="Cloudera">tanh() function</indexterm> + <b>Purpose:</b> Returns the hyperbolic tangent of the argument. + <p> + <b>Return type:</b> <codeph>double</codeph> + </p> + </dd> + + </dlentry> + <dlentry rev="2.3.0" id="truncate"> - <dt> + <dt rev="2.3.0"> <codeph>truncate(double_or_decimal a[, digits_to_leave])</codeph>, <ph id="dtrunc"><codeph>dtrunc(double_or_decimal a[, digits_to_leave])</codeph></ph> </dt> - <dd> + <dd rev="2.3.0"> <indexterm audience="Cloudera">truncate() function</indexterm> <indexterm audience="Cloudera">dtrunc() function</indexterm> <b>Purpose:</b> Removes some or all fractional digits from a numeric value.
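For example, based on the behavior described here, <codeph>truncate(3.456, 1)</codeph> would return 3.4, removing the remaining fractional digits without rounding.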
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_max.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_max.xml b/docs/topics/impala_max.xml index b989785..3f7b827 100644 --- a/docs/topics/impala_max.xml +++ b/docs/topics/impala_max.xml @@ -2,7 +2,7 @@ <concept id="max"> <title>MAX Function</title> - <titlealts><navtitle>MAX</navtitle></titlealts> + <titlealts audience="PDF"><navtitle>MAX</navtitle></titlealts> <prolog> <metadata> <data name="Category" value="Impala"/> @@ -11,6 +11,8 @@ <data name="Category" value="Analytic Functions"/> <data name="Category" value="Aggregate Functions"/> <data name="Category" value="Querying"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> </metadata> </prolog> @@ -38,10 +40,14 @@ <p conref="../shared/impala_common.xml#common/return_type_same_except_string"/> + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p conref="../shared/impala_common.xml#common/partition_key_optimization"/> + <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> <p conref="../shared/impala_common.xml#common/complex_types_aggregation_explanation"/> - + <p conref="../shared/impala_common.xml#common/complex_types_aggregation_example"/> <p conref="../shared/impala_common.xml#common/example_blurb"/> @@ -111,7 +117,7 @@ select x, property, ( <b>order by property, x desc</b> <b>rows between unbounded preceding and current row</b> - ) as 'maximum to this point' + ) as 'maximum to this point' from int_t where property in ('prime','square'); +---+----------+-----------------------+ | x | property | maximum to this point | @@ -130,7 +136,7 @@ select x, property, ( <b>order by property, x desc</b> <b>range between unbounded preceding and current row</b> - ) as 'maximum to this point' + ) as 'maximum to this point' from int_t where property in ('prime','square'); +---+----------+-----------------------+ | x | property | maximum to this point | @@ -156,7 +162,7 @@ analytic context, the lower bound must be <codeph>UNBOUNDED PRECEDING</codeph>. ( <b>order by property, x</b> <b>rows between unbounded preceding and 1 following</b> - ) as 'local maximum' + ) as 'local maximum' from int_t where property in ('prime','square'); +---+----------+---------------+ | x | property | local maximum | http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3c2c8f12/docs/topics/impala_max_errors.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_max_errors.xml b/docs/topics/impala_max_errors.xml index 86f3618..c6eb971 100644 --- a/docs/topics/impala_max_errors.xml +++ b/docs/topics/impala_max_errors.xml @@ -3,12 +3,15 @@ <concept id="max_errors"> <title>MAX_ERRORS Query Option</title> + <titlealts audience="PDF"><navtitle>MAX_ERRORS</navtitle></titlealts> <prolog> <metadata> <data name="Category" value="Impala"/> <data name="Category" value="Impala Query Options"/> <data name="Category" value="Troubleshooting"/> <data name="Category" value="Logs"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> </metadata> </prolog>