http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_security_guidelines.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_security_guidelines.html b/docs/build/html/topics/impala_security_guidelines.html new file mode 100644 index 0000000..4b1a738 --- /dev/null +++ b/docs/build/html/topics/impala_security_guidelines.html @@ -0,0 +1,99 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security_guidelines"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Security Guidelines for Impala</title></head><body id="security_guidelines"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Security Guidelines for Impala</h1> + + + <div class="body conbody"> + + <p class="p"> + The following are the major steps to harden a cluster running Impala against accidents and mistakes, or + malicious attackers trying to access sensitive data: + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + Secure the <code class="ph codeph">root</code> account. 
The <code class="ph codeph">root</code> user can tamper with the + <span class="keyword cmdname">impalad</span> daemon, read and write the data files in HDFS, log into other user accounts, and + access other system services that are beyond the control of Impala. + </p> + </li> + + <li class="li"> + <p class="p"> + Restrict membership in the <code class="ph codeph">sudoers</code> list (in the <span class="ph filepath">/etc/sudoers</span> file). + The users who can run the <code class="ph codeph">sudo</code> command can do many of the same things as the + <code class="ph codeph">root</code> user. + </p> + </li> + + <li class="li"> + <p class="p"> + Ensure the Hadoop ownership and permissions for Impala data files are restricted. + </p> + </li> + + <li class="li"> + <p class="p"> + Ensure the Hadoop ownership and permissions for Impala log files are restricted. + </p> + </li> + + <li class="li"> + <p class="p"> + Ensure that the Impala web UI (available by default on port 25000 on each Impala node) is + password-protected. See <a class="xref" href="impala_webui.html#webui">Impala Web User Interface for Debugging</a> for details. + </p> + </li> + + <li class="li"> + <p class="p"> + Create a policy file that specifies which Impala privileges are available to users in particular Hadoop + groups (which by default map to Linux OS groups). Create the associated Linux groups using the + <span class="keyword cmdname">groupadd</span> command if necessary. + </p> + </li> + + <li class="li"> + <p class="p"> + The Impala authorization feature makes use of the HDFS file ownership and permissions mechanism; for + background information, see the + <a class="xref" href="https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html" target="_blank">HDFS Permissions Guide</a>. 
+ Set up users and assign them to groups at the OS level, corresponding to the + different categories of users with different access levels for various databases, tables, and HDFS + locations (URIs). Create the associated Linux users using the <span class="keyword cmdname">useradd</span> command if + necessary, and add them to the appropriate groups with the <span class="keyword cmdname">usermod</span> command. + </p> + </li> + + <li class="li"> + <p class="p"> + Design your database and table structure so that the policy rules can stay + simple and consistent. For example, if all tables related to an application are inside a single + database, you can assign privileges for that database and use the <code class="ph codeph">*</code> wildcard for the table + name. If you are creating views with different privileges than the underlying base tables, you might put + the views in a separate database so that you can use the <code class="ph codeph">*</code> wildcard for the database + containing the base tables, while specifying the precise names of the individual views. (A + table or database name in a rule is either the exact name, or <code class="ph codeph">*</code> to mean all the databases + on a server or all the tables and views in a database.) + </p> + </li> + + <li class="li"> + <p class="p"> + Enable authorization by running the <code class="ph codeph">impalad</code> daemons with the <code class="ph codeph">-server_name</code> + and <code class="ph codeph">-authorization_policy_file</code> options on all nodes. (The authorization feature does not + apply to the <span class="keyword cmdname">statestored</span> daemon, which has no access to schema objects or data files.) + </p> + </li> + + <li class="li"> + <p class="p"> + Set up authentication using Kerberos, to make sure users really are who they say they are. 
+ </p> + </li> + </ul> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html> \ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_security_install.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_security_install.html b/docs/build/html/topics/impala_security_install.html new file mode 100644 index 0000000..f9724ef --- /dev/null +++ b/docs/build/html/topics/impala_security_install.html @@ -0,0 +1,17 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security_install"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Installation Considerations for Impala Security</title></head><body id="security_install"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Installation Considerations for Impala Security</h1> + + + <div class="body conbody"> + + <p class="p"> + Impala 1.1 comes set up with all the software and settings needed to enable security when you run the + <span class="keyword cmdname">impalad</span> daemon with the new security-related options (<code class="ph codeph">-server_name</code> and + <code class="ph codeph">-authorization_policy_file</code>). You do not need to change any environment variables or install + any additional JAR files. 
+ </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_security_metastore.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_security_metastore.html b/docs/build/html/topics/impala_security_metastore.html new file mode 100644 index 0000000..cc852ad --- /dev/null +++ b/docs/build/html/topics/impala_security_metastore.html @@ -0,0 +1,30 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security_metastore"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Securing the Hive Metastore Database</title></head><body id="security_metastore"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Securing the Hive Metastore Database</h1> + + + <div class="body conbody"> + + + + <p class="p"> + It is important to secure the Hive metastore, so that users cannot access the names or other information + about databases and tables through the Hive client or by querying the metastore database. 
Do this by + turning on Hive metastore security, using the instructions in + <span class="xref">the documentation for your Apache Hadoop distribution</span> for securing different Hive components: + </p> + + <ul class="ul"> + <li class="li"> + Secure the Hive Metastore. + </li> + + <li class="li"> + In addition, allow access to the metastore only from the HiveServer2 server, and then disable local access + to the HiveServer2 server. + </li> + </ul> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_security_webui.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_security_webui.html b/docs/build/html/topics/impala_security_webui.html new file mode 100644 index 0000000..6286012 --- /dev/null +++ b/docs/build/html/topics/impala_security_webui.html @@ -0,0 +1,57 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security_webui"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Securing the Impala Web User Interface</title></head><body id="security_webui"><main role="main"><article 
role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Securing the Impala Web User Interface</h1> + + + <div class="body conbody"> + + <p class="p"> + The instructions in this section presume you are familiar with the + <a class="xref" href="http://en.wikipedia.org/wiki/.htpasswd" target="_blank"> + <span class="ph filepath">.htpasswd</span> mechanism</a> commonly used to password-protect pages on web servers. + </p> + + <p class="p"> + Password-protect the Impala web UI that listens on port 25000 by default. Set up a + <span class="ph filepath">.htpasswd</span> file in the <code class="ph codeph">$IMPALA_HOME</code> directory, or start both the + <span class="keyword cmdname">impalad</span> and <span class="keyword cmdname">statestored</span> daemons with the + <code class="ph codeph">--webserver_password_file</code> option to specify a different location (including the filename). + </p> + + <p class="p"> + This file should only be readable by the Impala process and machine administrators, because it contains + (hashed) versions of passwords. The username / password pairs are not derived from Unix usernames, Kerberos + users, or any other system. The <code class="ph codeph">domain</code> field in the password file must match the domain + supplied to Impala by the new command-line option <code class="ph codeph">--webserver_authentication_domain</code>. The + default is <code class="ph codeph">mydomain.com</code>. + + </p> + + <p class="p"> + Impala also supports using HTTPS for secure web traffic. To do so, set + <code class="ph codeph">--webserver_certificate_file</code> to refer to a valid <code class="ph codeph">.pem</code> TLS/SSL certificate file. + Impala will automatically start using HTTPS once the TLS/SSL certificate has been read and validated. 
A + <code class="ph codeph">.pem</code> file is basically a private key, followed by a signed TLS/SSL certificate; make sure to + concatenate both parts when constructing the <code class="ph codeph">.pem</code> file. + + </p> + + <p class="p"> + If Impala cannot find or parse the <code class="ph codeph">.pem</code> file, it prints an error message and quits. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + If the private key is encrypted using a passphrase, Impala will ask for that passphrase on startup, which + is not useful for a large cluster. In that case, remove the passphrase and make the <code class="ph codeph">.pem</code> + file readable only by Impala and administrators. + </p> + <p class="p"> + When you turn on TLS/SSL for the Impala web UI, the associated URLs change from <code class="ph codeph">http://</code> + prefixes to <code class="ph codeph">https://</code>. Adjust any bookmarks or application code that refers to those URLs. 
+ </p> + </div> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_select.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_select.html b/docs/build/html/topics/impala_select.html new file mode 100644 index 0000000..7a12c42 --- /dev/null +++ b/docs/build/html/topics/impala_select.html @@ -0,0 +1,227 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_joins.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_order_by.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_group_by.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_having.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_offset.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_union.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_subqueries.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_with.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_distinct.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hints.html"><meta name="prodname" content="Impala"><meta 
name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="select"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SELECT Statement</title></head><body id="select"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">SELECT Statement</h1> + + + + <div class="body conbody"> + + <p class="p"> + + The <code class="ph codeph">SELECT</code> statement performs queries, retrieving data from one or more tables and producing + result sets consisting of rows and columns. + </p> + + <p class="p"> + The Impala <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> statement also typically ends + with a <code class="ph codeph">SELECT</code> statement, to define data to copy from one table to another. + </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + +<pre class="pre codeblock"><code>[WITH <em class="ph i">name</em> AS (<em class="ph i">select_expression</em>) [, ...] ] +SELECT + [ALL | DISTINCT] + [STRAIGHT_JOIN] + <em class="ph i">expression</em> [, <em class="ph i">expression</em> ...] +FROM <em class="ph i">table_reference</em> [, <em class="ph i">table_reference</em> ...] +[[FULL | [LEFT | RIGHT] INNER | [LEFT | RIGHT] OUTER | [LEFT | RIGHT] SEMI | [LEFT | RIGHT] ANTI | CROSS] + JOIN <em class="ph i">table_reference</em> + [ON <em class="ph i">join_equality_clauses</em> | USING (<var class="keyword varname">col1</var>[, <var class="keyword varname">col2</var> ...])]] ... +WHERE <em class="ph i">conditions</em> +GROUP BY { <em class="ph i">column</em> | <em class="ph i">expression</em> [, ...] } +HAVING <em class="ph i">conditions</em> +ORDER BY { <em class="ph i">column</em> | <em class="ph i">expression</em> [ASC | DESC] [NULLS FIRST | NULLS LAST] [, ...] 
} +LIMIT <em class="ph i">expression</em> [OFFSET <em class="ph i">expression</em>] +[UNION [ALL] <em class="ph i">select_statement</em>] ... +</code></pre> + + <p class="p"> + Impala <code class="ph codeph">SELECT</code> queries support: + </p> + + <ul class="ul"> + <li class="li"> + SQL scalar data types: <code class="ph codeph"><a class="xref" href="impala_boolean.html#boolean">BOOLEAN</a></code>, + <code class="ph codeph"><a class="xref" href="impala_tinyint.html#tinyint">TINYINT</a></code>, + <code class="ph codeph"><a class="xref" href="impala_smallint.html#smallint">SMALLINT</a></code>, + <code class="ph codeph"><a class="xref" href="impala_int.html#int">INT</a></code>, + <code class="ph codeph"><a class="xref" href="impala_bigint.html#bigint">BIGINT</a></code>, + <code class="ph codeph"><a class="xref" href="impala_decimal.html#decimal">DECIMAL</a></code>, + <code class="ph codeph"><a class="xref" href="impala_float.html#float">FLOAT</a></code>, + <code class="ph codeph"><a class="xref" href="impala_double.html#double">DOUBLE</a></code>, + <code class="ph codeph"><a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP</a></code>, + <code class="ph codeph"><a class="xref" href="impala_string.html#string">STRING</a></code>, + <code class="ph codeph"><a class="xref" href="impala_varchar.html#varchar">VARCHAR</a></code>, + <code class="ph codeph"><a class="xref" href="impala_char.html#char">CHAR</a></code>. + </li> + + + <li class="li"> + The complex data types <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code> + are available in <span class="keyword">Impala 2.3</span> and higher. + Queries involving these types typically use special qualified names + with dot notation for referring to the complex column fields, + and join clauses for bringing the complex columns into the result set. 
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details. + </li> + + <li class="li"> + An optional <a class="xref" href="impala_with.html#with"><code class="ph codeph">WITH</code> clause</a> before the + <code class="ph codeph">SELECT</code> keyword, to define a subquery whose name or column names can be referenced + later in the main query. This clause lets you abstract repeated clauses, such as aggregation functions, + that are referenced multiple times in the same query. + </li> + + <li class="li"> + By default, one <code class="ph codeph">DISTINCT</code> clause per query. See <a class="xref" href="impala_distinct.html#distinct">DISTINCT Operator</a> + for details. See <a class="xref" href="impala_appx_count_distinct.html#appx_count_distinct">APPX_COUNT_DISTINCT Query Option (Impala 2.0 or higher only)</a> for a query option to + allow multiple <code class="ph codeph">COUNT(DISTINCT)</code> expressions in the same query. + </li> + + <li class="li"> + Subqueries in a <code class="ph codeph">FROM</code> clause. In <span class="keyword">Impala 2.0</span> and higher, + subqueries can also go in the <code class="ph codeph">WHERE</code> clause, for example with the + <code class="ph codeph">IN()</code>, <code class="ph codeph">EXISTS</code>, and <code class="ph codeph">NOT EXISTS</code> operators. + </li> + + <li class="li"> + <code class="ph codeph">WHERE</code>, <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">HAVING</code> clauses. + </li> + + <li class="li"> + <code class="ph codeph"><a class="xref" href="impala_order_by.html#order_by">ORDER BY</a></code>. Prior to Impala 1.4.0, Impala + required that queries using an <code class="ph codeph">ORDER BY</code> clause also include a + <code class="ph codeph"><a class="xref" href="impala_limit.html#limit">LIMIT</a></code> clause. 
In Impala 1.4.0 and higher, this + restriction is lifted; sort operations that would exceed the Impala memory limit automatically use a + temporary disk work area to perform the sort. + </li> + + <li class="li"> + <p class="p"> + Impala supports a wide variety of <code class="ph codeph">JOIN</code> clauses. Left, right, semi, full, and outer joins + are supported in all Impala versions. The <code class="ph codeph">CROSS JOIN</code> operator is available in Impala 1.2.2 + and higher. During performance tuning, you can override the reordering of join clauses that Impala does + internally by including the keyword <code class="ph codeph">STRAIGHT_JOIN</code> immediately after the + <code class="ph codeph">SELECT</code> keyword. + </p> + <p class="p"> + See <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a> for details and examples of join queries. + </p> + </li> + + <li class="li"> + <code class="ph codeph">UNION ALL</code>. + </li> + + <li class="li"> + <code class="ph codeph">LIMIT</code>. + </li> + + <li class="li"> + External tables. + </li> + + <li class="li"> + Relational operators such as greater than, less than, or equal to. + </li> + + <li class="li"> + Arithmetic operators such as addition or subtraction. + </li> + + <li class="li"> + Logical/Boolean operators <code class="ph codeph">AND</code>, <code class="ph codeph">OR</code>, and <code class="ph codeph">NOT</code>. Impala does + not support the corresponding symbols <code class="ph codeph">&amp;&amp;</code>, <code class="ph codeph">||</code>, and + <code class="ph codeph">!</code>. + </li> + + <li class="li"> + Common SQL built-in functions and operators such as <code class="ph codeph">COUNT</code>, <code class="ph codeph">SUM</code>, <code class="ph codeph">CAST</code>, + <code class="ph codeph">LIKE</code>, <code class="ph codeph">IN</code>, <code class="ph codeph">BETWEEN</code>, and <code class="ph codeph">COALESCE</code>. 
Impala + specifically supports built-ins described in <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>. + </li> + </ul> + + <p class="p"> + Impala queries ignore files with extensions commonly used for temporary work files by Hadoop tools. Any + files with extensions <code class="ph codeph">.tmp</code> or <code class="ph codeph">.copying</code> are not considered part of the + Impala table. The suffix matching is case-insensitive, so for example Impala ignores both + <code class="ph codeph">.copying</code> and <code class="ph codeph">.COPYING</code> suffixes. + </p> + + <p class="p"> + <strong class="ph b">Security considerations:</strong> + </p> + <p class="p"> + If these statements in your environment contain sensitive literal values such as credit card numbers or tax + identifiers, Impala can redact this sensitive information when displaying the statements in log files and + other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details. + </p> + + <p class="p"> + <strong class="ph b">Amazon S3 considerations:</strong> + </p> + <p class="p"> + In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3. + For Impala tables that use the file formats Parquet, RCFile, SequenceFile, + Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code> + in the <span class="ph filepath">core-site.xml</span> configuration file determines + how Impala divides the I/O work of reading the data files. This configuration + setting is specified in bytes. By default, this + value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files + as if they were made up of 32 MB blocks. 
For example, if your S3 queries primarily access + Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code> + to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve + Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code> + to 268435456 (256 MB) to match the row group size produced by Impala. + </p> + + <p class="p"> + <strong class="ph b">Cancellation:</strong> Can be cancelled. To cancel this statement, use Ctrl-C from the + <span class="keyword cmdname">impala-shell</span> interpreter, the <span class="ph uicontrol">Cancel</span> button from the + <span class="ph uicontrol">Watch</span> page in Hue, or <span class="ph uicontrol">Cancel</span> from the list of + in-flight queries (for a particular node) on the <span class="ph uicontrol">Queries</span> tab in the Impala web UI + (port 25000). + </p> + + <p class="p"> + <strong class="ph b">HDFS permissions:</strong> + </p> + <p class="p"> + The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under, + typically the <code class="ph codeph">impala</code> user, must have read + permissions for the files in all applicable directories in all source tables, + and read and execute permissions for the relevant data directories. + (A <code class="ph codeph">SELECT</code> operation could read files from multiple different HDFS directories + if the source table is partitioned.) + If a query attempts to read a data file and is unable to because of an HDFS permission error, + the query halts and does not return any further results. + </p> + + <p class="p toc"></p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + The <code class="ph codeph">SELECT</code> syntax is so extensive that it forms its own category of statements: queries. 
The + other major classifications of SQL statements are data definition language (see + <a class="xref" href="impala_ddl.html#ddl">DDL Statements</a>) and data manipulation language (see <a class="xref" href="impala_dml.html#dml">DML Statements</a>). + </p> + + <p class="p"> + Because the focus of Impala is on fast queries with interactive response times over huge data sets, query + performance and scalability are important considerations. See + <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> and <a class="xref" href="impala_scalability.html#scalability">Scalability Considerations for Impala</a> for + details. + </p> + </div> + + +<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_joins.html">Joins in Impala SELECT Statements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_order_by.html">ORDER BY Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_group_by.html">GROUP BY Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_having.html">HAVING Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_limit.html">LIMIT Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_offset.html">OFFSET Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_union.html">UNION Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_subqueries.html">Subqueries in Impala SELECT Statements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_with.html">WITH Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_distinct.html">DISTINCT Operator</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hints.html">Query 
Hints in Impala SELECT Statements</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_seqfile.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_seqfile.html b/docs/build/html/topics/impala_seqfile.html new file mode 100644 index 0000000..53a0eaf --- /dev/null +++ b/docs/build/html/topics/impala_seqfile.html @@ -0,0 +1,240 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="seqfile"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the SequenceFile File Format with Impala Tables</title></head><body id="seqfile"><main role="main"><article role="article" aria-labelledby="seqfile__sequencefile"> + + <h1 class="title topictitle1" id="seqfile__sequencefile">Using the SequenceFile File Format with Impala Tables</h1> + + + + <div class="body conbody"> + + <p class="p"> + + Impala supports using SequenceFile data files. + </p> + + <table class="table"><caption><span class="table--title-label">Table 1. 
</span><span class="title">SequenceFile Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead"> + <tr class="row"> + <th class="entry nocellnorowborder" id="seqfile__entry__1"> + File Type + </th> + <th class="entry nocellnorowborder" id="seqfile__entry__2"> + Format + </th> + <th class="entry nocellnorowborder" id="seqfile__entry__3"> + Compression Codecs + </th> + <th class="entry nocellnorowborder" id="seqfile__entry__4"> + Impala Can CREATE? + </th> + <th class="entry nocellnorowborder" id="seqfile__entry__5"> + Impala Can INSERT? + </th> + </tr> + </thead><tbody class="tbody"> + <tr class="row"> + <td class="entry nocellnorowborder" headers="seqfile__entry__1 "> + <a class="xref" href="impala_seqfile.html#seqfile">SequenceFile</a> + </td> + <td class="entry nocellnorowborder" headers="seqfile__entry__2 "> + Structured + </td> + <td class="entry nocellnorowborder" headers="seqfile__entry__3 "> + Snappy, gzip, deflate, bzip2 + </td> + <td class="entry nocellnorowborder" headers="seqfile__entry__4 ">Yes.</td> + <td class="entry nocellnorowborder" headers="seqfile__entry__5 "> + No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use + <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala. 
+ </td> + + </tr> + </tbody></table> + + <p class="p toc inpage"></p> + </div> + + <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="seqfile__seqfile_create"> + + <h2 class="title topictitle2" id="ariaid-title2">Creating SequenceFile Tables and Loading Data</h2> + + + <div class="body conbody"> + + <p class="p"> + If you do not have an existing data file to use, begin by creating one in the appropriate format. + </p> + + <p class="p"> + <strong class="ph b">To create a SequenceFile table:</strong> + </p> + + <p class="p"> + In the <code class="ph codeph">impala-shell</code> interpreter, issue a command similar to: + </p> + +<pre class="pre codeblock"><code>create table sequencefile_table (<var class="keyword varname">column_specs</var>) stored as sequencefile;</code></pre> + + <p class="p"> + Because Impala can query some kinds of tables that it cannot currently write to, you might use the Hive + shell to load the data after creating tables of certain file formats. See + <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details. After loading data into a table through + Hive or another mechanism outside of Impala, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> + statement the next time you connect to the Impala node, before querying the table, to make Impala recognize + the new data. 
+ </p> + + <p class="p"> + For example, here is how you might create some SequenceFile tables in Impala (by specifying the columns + explicitly, or cloning the structure of another table), load data through Hive, and query them through + Impala: + </p> + +<pre class="pre codeblock"><code>$ impala-shell -i localhost +[localhost:21000] > create table seqfile_table (x int) stored as sequencefile; +[localhost:21000] > create table seqfile_clone like some_other_table stored as sequencefile; +[localhost:21000] > quit; + +$ hive +hive> insert into table seqfile_table select x from some_other_table; +3 Rows loaded to seqfile_table +Time taken: 19.047 seconds +hive> quit; + +$ impala-shell -i localhost +[localhost:21000] > select * from seqfile_table; +Returned 0 row(s) in 0.23s +[localhost:21000] > -- Make Impala recognize the data loaded through Hive; +[localhost:21000] > refresh seqfile_table; +[localhost:21000] > select * from seqfile_table; ++---+ +| x | ++---+ +| 1 | +| 2 | +| 3 | ++---+ +Returned 3 row(s) in 0.23s</code></pre> + + <p class="p"> + <strong class="ph b">Complex type considerations:</strong> + Although you can create tables in this file format using + the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, + and <code class="ph codeph">MAP</code>) available in <span class="keyword">Impala 2.3</span> and higher, + currently, Impala can query these types only in Parquet tables. + <span class="ph"> + The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on RCFile tables that include complex types. + Such queries are allowed in <span class="keyword">Impala 2.6</span> and higher. 
+ </span> + </p> + + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="seqfile__seqfile_compression"> + + <h2 class="title topictitle2" id="ariaid-title3">Enabling Compression for SequenceFile Tables</h2> + + + <div class="body conbody"> + + <p class="p"> + + You may want to enable compression on existing tables. Enabling compression provides performance gains in + most cases and is supported for SequenceFile tables. For example, to enable Snappy compression, you would + specify the following additional settings when loading data through the Hive shell: + </p> + +<pre class="pre codeblock"><code>hive> SET hive.exec.compress.output=true; +hive> SET mapred.max.split.size=256000000; +hive> SET mapred.output.compression.type=BLOCK; +hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec; +hive> insert overwrite table <var class="keyword varname">new_table</var> select * from <var class="keyword varname">old_table</var>;</code></pre> + + <p class="p"> + If you are converting partitioned tables, you must complete additional steps. In such a case, specify + additional settings similar to the following: + </p> + +<pre class="pre codeblock"><code>hive> create table <var class="keyword varname">new_table</var> (<var class="keyword varname">your_cols</var>) partitioned by (<var class="keyword varname">partition_cols</var>) stored as <var class="keyword varname">new_format</var>; +hive> SET hive.exec.dynamic.partition.mode=nonstrict; +hive> SET hive.exec.dynamic.partition=true; +hive> insert overwrite table <var class="keyword varname">new_table</var> partition(<var class="keyword varname">comma_separated_partition_cols</var>) select * from <var class="keyword varname">old_table</var>;</code></pre> + + <p class="p"> + Remember that Hive does not require that you specify a source format for it. 
Consider the case of + converting a table to a Snappy-compressed SequenceFile table. Combining the components outlined previously, + for a table without partitions you would specify settings similar to the following: + </p> + +<pre class="pre codeblock"><code>hive> CREATE TABLE tbl_seq (int_col INT, string_col STRING) STORED AS SEQUENCEFILE; +hive> SET hive.exec.compress.output=true; +hive> SET mapred.max.split.size=256000000; +hive> SET mapred.output.compression.type=BLOCK; +hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec; +hive> SET hive.exec.dynamic.partition.mode=nonstrict; +hive> SET hive.exec.dynamic.partition=true; +hive> INSERT OVERWRITE TABLE tbl_seq SELECT * FROM tbl;</code></pre> + + <p class="p"> + To complete a similar process for a partitioned table, in this case one partitioned by a + <code class="ph codeph">year</code> column, you would specify settings similar to the following: + </p> + +<pre class="pre codeblock"><code>hive> CREATE TABLE tbl_seq (int_col INT, string_col STRING) PARTITIONED BY (year INT) STORED AS SEQUENCEFILE; +hive> SET hive.exec.compress.output=true; +hive> SET mapred.max.split.size=256000000; +hive> SET mapred.output.compression.type=BLOCK; +hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec; +hive> SET hive.exec.dynamic.partition.mode=nonstrict; +hive> SET hive.exec.dynamic.partition=true; +hive> INSERT OVERWRITE TABLE tbl_seq PARTITION(year) SELECT * FROM tbl;</code></pre> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + The compression codec is specified in the following command: + </p> +<pre class="pre codeblock"><code>SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;</code></pre> + <p class="p"> + You could elect to specify alternative codecs such as <code class="ph codeph">GzipCodec</code> here. 
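+ </p>
+ <p class="p">
+ As a minimal sketch of that substitution, assuming the other Hive session settings shown earlier in this section stay unchanged, selecting gzip would mean naming the standard Hadoop <code class="ph codeph">org.apache.hadoop.io.compress.GzipCodec</code> class instead:
+ </p>
+<pre class="pre codeblock"><code>-- Sketch only: swaps the codec class; the remaining SET statements are assumed to stay as shown earlier.
+SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;</code></pre>
+ <p class="p">
+ Gzip is one of the codecs listed as supported for SequenceFile tables in the table at the start of this topic.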
+ </p> + </div> + </div> + </article> + + + + <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="seqfile__seqfile_performance"> + + <h2 class="title topictitle2" id="ariaid-title4">Query Performance for Impala SequenceFile Tables</h2> + + <div class="body conbody"> + + <p class="p"> + In general, expect query performance with SequenceFile tables to be + faster than with tables using text data, but slower than with + Parquet tables. See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> + for information about using the Parquet file format for + high-performance analytic queries. + </p> + + <p class="p"> + In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3. + For Impala tables that use the file formats Parquet, RCFile, SequenceFile, + Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code> + in the <span class="ph filepath">core-site.xml</span> configuration file determines + how Impala divides the I/O work of reading the data files. This configuration + setting is specified in bytes. By default, this + value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files + as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access + Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code> + to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve + Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code> + to 268435456 (256 MB) to match the row group size produced by Impala. 
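+ </p>
+
+ <p class="p">
+ As a hedged illustration of the MapReduce or Hive case, a <span class="ph filepath">core-site.xml</span> entry raising the value to 128 MB might look like the following. This is a sketch that uses the standard Hadoop property syntax; where the file lives and how the change is rolled out to the cluster are assumptions that depend on your deployment:
+ </p>
+
+<pre class="pre codeblock"><code>&lt;!-- Sketch: 134217728 bytes = 128 * 1024 * 1024, matching the row group size of Parquet files written by MapReduce or Hive. --&gt;
+&lt;property&gt;
+  &lt;name&gt;fs.s3a.block.size&lt;/name&gt;
+  &lt;value&gt;134217728&lt;/value&gt;
+&lt;/property&gt;</code></pre>
+
+ <p class="p">
+ For Parquet files written by Impala, the same entry would carry 268435456 (256 * 1024 * 1024 bytes) instead.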
+ </p> + + </div> + </article> + +</article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_set.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_set.html b/docs/build/html/topics/impala_set.html new file mode 100644 index 0000000..b16ff7b --- /dev/null +++ b/docs/build/html/topics/impala_set.html @@ -0,0 +1,200 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="set"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SET Statement</title></head><body id="set"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">SET Statement</h1> + + + + <div class="body conbody"> + + <p class="p"> + + Specifies values for query options that control the runtime behavior of other statements within the same + session. + </p> + + <p class="p"> + In <span class="keyword">Impala 2.5</span> and higher, <code class="ph codeph">SET</code> also defines user-specified substitution variables for + the <span class="keyword cmdname">impala-shell</span> interpreter. 
This feature uses the <code class="ph codeph">SET</code> command + built into <span class="keyword cmdname">impala-shell</span> instead of the SQL <code class="ph codeph">SET</code> statement. + Therefore the substitution mechanism only works with queries processed by <span class="keyword cmdname">impala-shell</span>, + not with queries submitted through JDBC or ODBC. + </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + +<pre class="pre codeblock"><code>SET [<var class="keyword varname">query_option</var>=<var class="keyword varname">option_value</var>] +</code></pre> + + <p class="p"> + <code class="ph codeph">SET</code> with no arguments returns a result set consisting of all available query options and + their current values. + </p> + + <p class="p"> + The query option name and any string argument values are case-insensitive. + </p> + + <p class="p"> + Each query option has a specific allowed notation for its arguments. Boolean options can be enabled and + disabled by assigning values of either <code class="ph codeph">true</code> and <code class="ph codeph">false</code>, or + <code class="ph codeph">1</code> and <code class="ph codeph">0</code>. Some numeric options accept a final character signifying the unit, + such as <code class="ph codeph">2g</code> for 2 gigabytes or <code class="ph codeph">100m</code> for 100 megabytes. See + <a class="xref" href="impala_query_options.html#query_options">Query Options for the SET Statement</a> for the details of each query option. + </p> + + <p class="p"> + <strong class="ph b">User-specified substitution variables:</strong> + </p> + + <p class="p"> + In <span class="keyword">Impala 2.5</span> and higher, you can specify your own names and string substitution values + within the <span class="keyword cmdname">impala-shell</span> interpreter. 
Once a substitution variable is set up, + its value is inserted into any SQL statement in that same <span class="keyword cmdname">impala-shell</span> session + that contains the notation <code class="ph codeph">${var:<var class="keyword varname">varname</var>}</code>. + Using <code class="ph codeph">SET</code> in an interactive <span class="keyword cmdname">impala-shell</span> session overrides + any value for that same variable passed in through the <code class="ph codeph">--var=<var class="keyword varname">varname</var>=<var class="keyword varname">value</var></code> + command-line option. + </p> + + <p class="p"> + For example, to set up some default parameters for report queries, but then override those defaults + within an <span class="keyword cmdname">impala-shell</span> session, you might issue commands and statements such as + the following: + </p> + +<pre class="pre codeblock"><code> +-- Initial setup for this example. +create table staging_table (s string); +insert into staging_table values ('foo'), ('bar'), ('bletch'); + +create table production_table (s string); +insert into production_table values ('North America'), ('EMEA'), ('Asia'); +quit; + +-- Start impala-shell with user-specified substitution variables, +-- run a query, then override the variables with SET and run the query again. +$ impala-shell --var=table_name=staging_table --var=cutoff=2 +... <var class="keyword varname">banner message</var> ... 
+[localhost:21000] > select s from ${var:table_name} order by s limit ${var:cutoff}; +Query: select s from staging_table order by s limit 2 ++--------+ +| s | ++--------+ +| bar | +| bletch | ++--------+ +Fetched 2 row(s) in 1.06s + +[localhost:21000] > set var:table_name=production_table; +Variable TABLE_NAME set to production_table +[localhost:21000] > set var:cutoff=3; +Variable CUTOFF set to 3 + +[localhost:21000] > select s from ${var:table_name} order by s limit ${var:cutoff}; +Query: select s from production_table order by s limit 3 ++---------------+ +| s | ++---------------+ +| Asia | +| EMEA | +| North America | ++---------------+ +</code></pre> + + <p class="p"> + The following example shows how <code class="ph codeph">SET</code> with no parameters displays + all user-specified substitution variables, and how <code class="ph codeph">UNSET</code> removes + the substitution variable entirely: + </p> + +<pre class="pre codeblock"><code> +[localhost:21000] > set; +Query options (defaults shown in []): + ABORT_ON_DEFAULT_LIMIT_EXCEEDED: [0] + ... + V_CPU_CORES: [0] + +Shell Options + LIVE_PROGRESS: False + LIVE_SUMMARY: False + +Variables: + CUTOFF: 3 + TABLE_NAME: staging_table + +[localhost:21000] > unset var:cutoff; +Unsetting variable CUTOFF +[localhost:21000] > select s from ${var:table_name} order by s limit ${var:cutoff}; +Error: Unknown variable CUTOFF +</code></pre> + + <p class="p"> + See <a class="xref" href="impala_shell_running_commands.html">Running Commands and SQL Statements in impala-shell</a> for more examples of using the + <code class="ph codeph">--var</code>, <code class="ph codeph">SET</code>, and <code class="ph codeph">${var:<var class="keyword varname">varname</var>}</code> + substitution technique in <span class="keyword cmdname">impala-shell</span>. + </p> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + <code class="ph codeph">MEM_LIMIT</code> is probably the most commonly used query option. 
You can specify a high value to + allow a resource-intensive query to complete. For testing how queries would work on memory-constrained + systems, you might specify an artificially low value. + </p> + + <p class="p"> + <strong class="ph b">Complex type considerations:</strong> + </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <p class="p"> + The following example sets some numeric and some Boolean query options to control usage of memory, disk + space, and timeout periods, then runs a query whose success could depend on the options in effect: + </p> + +<pre class="pre codeblock"><code>set mem_limit=64g; +set DISABLE_UNSAFE_SPILLS=true; +set parquet_file_size=400m; +set RESERVATION_REQUEST_TIMEOUT=900000; +insert overwrite parquet_table select c1, c2, count(c3) from text_table group by c1, c2, c3; +</code></pre> + + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span> + </p> + + <p class="p"> + <code class="ph codeph">SET</code> has always been available as an <span class="keyword cmdname">impala-shell</span> command. Promoting it to + a SQL statement lets you use this feature in client applications through the JDBC and ODBC APIs. + </p> + + + + <p class="p"> + <strong class="ph b">Cancellation:</strong> Cannot be cancelled. + </p> + + <p class="p"> + <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories, + therefore no HDFS permissions are required. + </p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + See <a class="xref" href="impala_query_options.html#query_options">Query Options for the SET Statement</a> for the query options you can adjust using this + statement. 
+ </p> + </div> +<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_query_options.html">Query Options for the SET Statement</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_shell_commands.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_shell_commands.html b/docs/build/html/topics/impala_shell_commands.html new file mode 100644 index 0000000..d2bee6c --- /dev/null +++ b/docs/build/html/topics/impala_shell_commands.html @@ -0,0 +1,392 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_impala_shell.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="shell_commands"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>impala-shell Command Reference</title></head><body id="shell_commands"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">impala-shell Command Reference</h1> + + + + <div class="body conbody"> + + <p class="p"> + + Use the following commands within <code class="ph codeph">impala-shell</code> to 
pass requests to the + <code class="ph codeph">impalad</code> daemon that the shell is connected to. You can enter a command interactively at the + prompt, or pass it as the argument to the <code class="ph codeph">-q</code> option of <code class="ph codeph">impala-shell</code>. Most + of these commands are passed to the Impala daemon as SQL statements; refer to the corresponding + <a class="xref" href="impala_langref_sql.html#langref_sql">SQL language reference sections</a> for full syntax + details. + </p> + + <table class="table"><caption></caption><colgroup><col style="width:20%"><col style="width:80%"></colgroup><thead class="thead"> + <tr class="row"> + <th class="entry nocellnorowborder" id="shell_commands__entry__1"> + Command + </th> + <th class="entry nocellnorowborder" id="shell_commands__entry__2"> + Explanation + </th> + </tr> + </thead><tbody class="tbody"> + <tr class="row" id="shell_commands__alter_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">alter</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Changes the underlying structure or settings of an Impala table, or a table shared between Impala + and Hive. See <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> and + <a class="xref" href="impala_alter_view.html#alter_view">ALTER VIEW Statement</a> for details. + </p> + </td> + </tr> + <tr class="row" id="shell_commands__compute_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">compute stats</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Gathers important performance-related information for a table, used by Impala to optimize queries. 
+ See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for details. + </p> + </td> + </tr> + <tr class="row" id="shell_commands__connect_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">connect</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Connects to the specified instance of <code class="ph codeph">impalad</code>. The default port of 21000 is + assumed unless you provide another value. You can connect to any host in your cluster that is + running <code class="ph codeph">impalad</code>. If you connect to an instance of <code class="ph codeph">impalad</code> that + was started with an alternate port specified by the <code class="ph codeph">--fe_port</code> flag, you must + provide that alternate port. See <a class="xref" href="impala_connecting.html#connecting">Connecting to impalad through impala-shell</a> for examples. + </p> + + <p class="p"> + The <code class="ph codeph">SET</code> statement has no effect until the <span class="keyword cmdname">impala-shell</span> interpreter is + connected to an Impala server. Once you are connected, any query options you set remain in effect as you + issue a subsequent <code class="ph codeph">CONNECT</code> command to connect to a different Impala host. + </p> + </td> + </tr> + <tr class="row" id="shell_commands__describe_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">describe</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Shows the columns, column data types, and any column comments for a specified table. + <code class="ph codeph">DESCRIBE FORMATTED</code> shows additional information such as the HDFS data directory, + partitions, and internal properties for the table. 
See <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a> + for details about the basic <code class="ph codeph">DESCRIBE</code> output and the <code class="ph codeph">DESCRIBE + FORMATTED</code> variant. You can use <code class="ph codeph">DESC</code> as shorthand for the + <code class="ph codeph">DESCRIBE</code> command. + </p> + </td> + </tr> + <tr class="row" id="shell_commands__drop_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">drop</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Removes a schema object, and in some cases its associated data files. See + <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>, <a class="xref" href="impala_drop_view.html#drop_view">DROP VIEW Statement</a>, + <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>, and + <a class="xref" href="impala_drop_function.html#drop_function">DROP FUNCTION Statement</a> for details. + </p> + </td> + </tr> + <tr class="row" id="shell_commands__explain_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">explain</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Provides the execution plan for a query. <code class="ph codeph">EXPLAIN</code> represents a query as a series of + steps. For example, these steps might be map/reduce stages, metastore operations, or file system + operations such as move or rename. See <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> and + <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for details. 
+ </p> + </td> + </tr> + <tr class="row" id="shell_commands__help_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">help</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Help provides a list of all available commands and options. + </p> + </td> + </tr> + <tr class="row" id="shell_commands__history_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">history</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Maintains an enumerated cross-session command history. This history is stored in the + <span class="ph filepath">~/.impalahistory</span> file. + </p> + </td> + </tr> + <tr class="row" id="shell_commands__insert_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">insert</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Writes the results of a query to a specified table. This either overwrites table data or appends + data to the existing table content. See <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> for details. + </p> + </td> + </tr> + <tr class="row" id="shell_commands__invalidate_metadata_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">invalidate metadata</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Updates <span class="keyword cmdname">impalad</span> metadata for table existence and structure. Use this command + after creating, dropping, or altering databases, tables, or partitions in Hive. 
See + <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for details. + </p> + </td> + </tr> + <tr class="row" id="shell_commands__profile_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">profile</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Displays low-level information about the most recent query. Used for performance diagnosis and + tuning. <span class="ph"> The report starts with the same information as produced by the + <code class="ph codeph">EXPLAIN</code> statement and the <code class="ph codeph">SUMMARY</code> command.</span> See + <a class="xref" href="impala_explain_plan.html#perf_profile">Using the Query Profile for Performance Tuning</a> for details. + </p> + </td> + </tr> + <tr class="row" id="shell_commands__quit_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">quit</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Exits the shell. Remember to include the final semicolon so that the shell recognizes the end of + the command. + </p> + </td> + </tr> + <tr class="row" id="shell_commands__refresh_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">refresh</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Refreshes <span class="keyword cmdname">impalad</span> metadata for the locations of HDFS blocks corresponding to + Impala data files. Use this command after loading new data files into an Impala table through Hive + or through HDFS commands. See <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> for details. 
+ </p> + </td> + </tr> + <tr class="row" id="shell_commands__select_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">select</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Specifies the data set on which to complete some action. All information returned from + <code class="ph codeph">select</code> can be sent to some output, such as the console or a file, or can be used to + complete some other element of a query. See <a class="xref" href="impala_select.html#select">SELECT Statement</a> for details. + </p> + </td> + </tr> + <tr class="row" id="shell_commands__set_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">set</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Manages query options for an <span class="keyword cmdname">impala-shell</span> session. The available options are the + ones listed in <a class="xref" href="impala_query_options.html#query_options">Query Options for the SET Statement</a>. These options are used for + query tuning and troubleshooting. Issue <code class="ph codeph">SET</code> with no arguments to see the current + query options, either based on the <span class="keyword cmdname">impalad</span> defaults, as specified by you at + <span class="keyword cmdname">impalad</span> startup, or based on earlier <code class="ph codeph">SET</code> statements in the same + session. To modify option values, issue commands with the syntax <code class="ph codeph">set + <var class="keyword varname">option</var>=<var class="keyword varname">value</var></code>. To restore an option to its default, + use the <code class="ph codeph">unset</code> command. Some options take Boolean values of <code class="ph codeph">true</code> + and <code class="ph codeph">false</code>. 
Others take numeric arguments, or quoted string values. + </p> + + <p class="p"> + The <code class="ph codeph">SET</code> statement has no effect until the <span class="keyword cmdname">impala-shell</span> interpreter is + connected to an Impala server. Once you are connected, any query options you set remain in effect as you + issue a subsequent <code class="ph codeph">CONNECT</code> command to connect to a different Impala host. + </p> + + <p class="p"> + In Impala 2.0 and later, <code class="ph codeph">SET</code> is available as a SQL statement for any kind of + application, not only through <span class="keyword cmdname">impala-shell</span>. See + <a class="xref" href="impala_set.html#set">SET Statement</a> for details. + </p> + + <p class="p"> + In Impala 2.5 and later, you can use <code class="ph codeph">SET</code> to define your own substitution variables + within an <span class="keyword cmdname">impala-shell</span> session. + Within a SQL statement, you substitute the value by using the notation <code class="ph codeph">${var:<var class="keyword varname">variable_name</var>}</code>. + </p> + </td> + </tr> + <tr class="row" id="shell_commands__shell_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">shell</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Executes the specified command in the operating system shell without exiting + <code class="ph codeph">impala-shell</code>. You can use the <code class="ph codeph">!</code> character as shorthand for the + <code class="ph codeph">shell</code> command. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + Quote any instances of the <code class="ph codeph">--</code> or <code class="ph codeph">/*</code> tokens to avoid them being + interpreted as the start of a comment. 
To embed comments within <code class="ph codeph">source</code> or + <code class="ph codeph">!</code> commands, use the shell comment character <code class="ph codeph">#</code> before the comment + portion of the line. + </div> + </td> + </tr> + <tr class="row" id="shell_commands__show_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">show</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Displays metastore data for schema objects created and accessed through Impala, Hive, or both. + <code class="ph codeph">show</code> can be used to gather information about objects such as databases, tables, and functions. + See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details. + </p> + </td> + </tr> + <tr class="row" id="shell_commands__source_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">source</code> or <code class="ph codeph">src</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Executes one or more statements residing in a specified file from the local filesystem. + Allows you to perform the same kinds of batch operations as with the <code class="ph codeph">-f</code> option, + but interactively within the interpreter. The file can contain SQL statements and other + <span class="keyword cmdname">impala-shell</span> commands, including additional <code class="ph codeph">SOURCE</code> commands + to perform a flexible sequence of actions. Each command or statement, except the last one in the file, + must end with a semicolon. + See <a class="xref" href="impala_shell_running_commands.html#shell_running_commands">Running Commands and SQL Statements in impala-shell</a> for examples. 
+ </p> + </td> + </tr> + <tr class="row" id="shell_commands__summary_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">summary</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Summarizes the work performed in various stages of a query. It provides a higher-level view of the + information displayed by the <code class="ph codeph">EXPLAIN</code> command. Added in Impala 1.4.0. See + <a class="xref" href="impala_explain_plan.html#perf_summary">Using the SUMMARY Report for Performance Tuning</a> for details about the report format + and how to interpret it. + </p> + <p class="p"> + In <span class="keyword">Impala 2.3</span> and higher, you can see a continuously updated report of + the summary information while a query is in progress. + See <a class="xref" href="impala_live_summary.html#live_summary">LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</a> for details. + </p> + </td> + </tr> + <tr class="row" id="shell_commands__unset_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">unset</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Removes any user-specified value for a query option and returns the option to its default value. + See <a class="xref" href="impala_query_options.html#query_options">Query Options for the SET Statement</a> for the available query options. + </p> + <p class="p"> + In <span class="keyword">Impala 2.5</span> and higher, it can also remove user-specified substitution variables + using the notation <code class="ph codeph">UNSET VAR:<var class="keyword varname">variable_name</var></code>. 
+ </p> + </td> + </tr> + <tr class="row" id="shell_commands__use_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">use</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Indicates the database against which to execute subsequent commands. Lets you avoid using fully + qualified names when referring to tables in databases other than <code class="ph codeph">default</code>. See + <a class="xref" href="impala_use.html#use">USE Statement</a> for details. Not effective with the <code class="ph codeph">-q</code> option, + because that option only allows a single statement in the argument. + </p> + </td> + </tr> + <tr class="row" id="shell_commands__version_cmd"> + <td class="entry nocellnorowborder" headers="shell_commands__entry__1 "> + <p class="p"> + <code class="ph codeph">version</code> + </p> + </td> + <td class="entry nocellnorowborder" headers="shell_commands__entry__2 "> + <p class="p"> + Returns Impala version information. + </p> + </td> + </tr> + </tbody></table> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_impala_shell.html">Using the Impala Shell (impala-shell Command)</a></div></div></nav></article></main></body></html> \ No newline at end of file
