http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_config_options.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_config_options.html b/docs/build/html/topics/impala_config_options.html new file mode 100644 index 0000000..5bf3ff2 --- /dev/null +++ b/docs/build/html/topics/impala_config_options.html @@ -0,0 +1,361 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_processes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="config_options"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Modifying Impala Startup Options</title></head><body id="config_options"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Modifying Impala Startup Options</h1> + + + + <div class="body conbody"> + + <p class="p"> + + + + + + + + + + + + + + + + + + + + + + + + The configuration options for the Impala-related daemons let you choose which hosts and + ports to use for the services that run on a single host, specify directories for logging, + control resource usage and security, and specify other aspects of the Impala software. + </p> + + <p class="p toc inpage"></p> + + </div> + + <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_processes.html">Starting Impala</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="config_options__config_options_noncm"> + + <h2 class="title topictitle2" id="ariaid-title2">Configuring Impala Startup Options through the Command Line</h2> + + <div class="body conbody"> + + <p class="p"> The Impala server, statestore, and catalog services start up using values provided in a + defaults file, <span class="ph filepath">/etc/default/impala</span>. </p> + + <p class="p"> + This file includes information about many resources used by Impala. Most of the defaults + included in this file should be effective in most cases. For example, typically you + would not change the definition of the <code class="ph codeph">CLASSPATH</code> variable, but you + would always set the address used by the statestore server. Some of the content you + might modify includes: + </p> + + + +<pre class="pre codeblock"><code>IMPALA_STATE_STORE_HOST=127.0.0.1 +IMPALA_STATE_STORE_PORT=24000 +IMPALA_BACKEND_PORT=22000 +IMPALA_LOG_DIR=/var/log/impala +IMPALA_CATALOG_SERVICE_HOST=... +IMPALA_STATE_STORE_HOST=... + +export IMPALA_STATE_STORE_ARGS=${IMPALA_STATE_STORE_ARGS:- \ + -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}} +IMPALA_SERVER_ARGS=" \ +-log_dir=${IMPALA_LOG_DIR} \ +-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \ +-state_store_port=${IMPALA_STATE_STORE_PORT} \ +-use_statestore \ +-state_store_host=${IMPALA_STATE_STORE_HOST} \ +-be_port=${IMPALA_BACKEND_PORT}" +export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</code></pre> + + <p class="p"> + To use alternate values, edit the defaults file, then restart all the Impala-related + services so that the changes take effect. Restart the Impala server using the following + commands: + </p> + +<pre class="pre codeblock"><code>$ sudo service impala-server restart +Stopping Impala Server: [ OK ] +Starting Impala Server: [ OK ]</code></pre> + + <p class="p"> + Restart the Impala statestore using the following commands: + </p> + +<pre class="pre codeblock"><code>$ sudo service impala-state-store restart +Stopping Impala State Store Server: [ OK ] +Starting Impala State Store Server: [ OK ]</code></pre> + + <p class="p"> + Restart the Impala catalog service using the following commands: + </p> + +<pre class="pre codeblock"><code>$ sudo service impala-catalog restart +Stopping Impala Catalog Server: [ OK ] +Starting Impala Catalog Server: [ OK ]</code></pre> + + <p class="p"> + Some common settings to change include: + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + Statestore address. Where practical, put the statestore on a separate host not + running the <span class="keyword cmdname">impalad</span> daemon. In that recommended configuration, + the <span class="keyword cmdname">impalad</span> daemon cannot refer to the statestore server using + the loopback address. If the statestore is hosted on a machine with an IP address of + 192.168.0.27, change: + </p> +<pre class="pre codeblock"><code>IMPALA_STATE_STORE_HOST=127.0.0.1</code></pre> + <p class="p"> + to: + </p> +<pre class="pre codeblock"><code>IMPALA_STATE_STORE_HOST=192.168.0.27</code></pre> + </li> + + <li class="li"> + <p class="p"> + Catalog server address (including both the hostname and the port number). Update the + value of the <code class="ph codeph">IMPALA_CATALOG_SERVICE_HOST</code> variable. Where + practical, run the catalog server on the same host as the statestore. In that + recommended configuration, the <span class="keyword cmdname">impalad</span> daemon cannot refer to the + catalog server using the loopback address. If the catalog service is hosted on a + machine with an IP address of 192.168.0.27, add the following line: + </p> +<pre class="pre codeblock"><code>IMPALA_CATALOG_SERVICE_HOST=192.168.0.27:26000</code></pre> + <p class="p"> + The <span class="ph filepath">/etc/default/impala</span> defaults file currently does not define + an <code class="ph codeph">IMPALA_CATALOG_ARGS</code> environment variable, but if you add one it + will be recognized by the service startup/shutdown script. Add a definition for this + variable to <span class="ph filepath">/etc/default/impala</span> and add the option + <code class="ph codeph">-catalog_service_host=<var class="keyword varname">hostname</var></code>. If the port is + different than the default 26000, also add the option + <code class="ph codeph">-catalog_service_port=<var class="keyword varname">port</var></code>. + </p> + </li> + + <li class="li" id="config_options_noncm__mem_limit"> + <p class="p"> + Memory limits. You can limit the amount of memory available to Impala. For example, + to allow Impala to use no more than 70% of system memory, change: + </p> + +<pre class="pre codeblock"><code>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \ + -log_dir=${IMPALA_LOG_DIR} \ + -state_store_port=${IMPALA_STATE_STORE_PORT} \ + -use_statestore -state_store_host=${IMPALA_STATE_STORE_HOST} \ + -be_port=${IMPALA_BACKEND_PORT}}</code></pre> + <p class="p"> + to: + </p> +<pre class="pre codeblock"><code>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \ + -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT} \ + -use_statestore -state_store_host=${IMPALA_STATE_STORE_HOST} \ + -be_port=${IMPALA_BACKEND_PORT} -mem_limit=70%}</code></pre> + <p class="p"> + You can specify the memory limit using absolute notation such as + <code class="ph codeph">500m</code> or <code class="ph codeph">2G</code>, or as a percentage of physical memory + such as <code class="ph codeph">60%</code>. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + Queries that exceed the specified memory limit are aborted. Percentage limits are + based on the physical memory of the machine and do not consider cgroups. + </div> + </li> + + <li class="li"> + <p class="p"> Core dump enablement. To enable core dumps, change: </p> +<pre class="pre codeblock"><code>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</code></pre> + <p class="p"> + to: + </p> +<pre class="pre codeblock"><code>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-true}</code></pre> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <ul class="ul"> + <li class="li"> + <p class="p"> + The location of core dump files may vary according to your operating system configuration. + </p> + </li> + <li class="li"> + <p class="p"> + Other security settings may prevent Impala from writing core dumps even when this option is enabled. + </p> + </li> + </ul> + </div> + </li> + + <li class="li"> + <p class="p"> + Authorization using the open source Sentry plugin. Specify the + <code class="ph codeph">-server_name</code> and <code class="ph codeph">-authorization_policy_file</code> + options as part of the <code class="ph codeph">IMPALA_SERVER_ARGS</code> and + <code class="ph codeph">IMPALA_STATE_STORE_ARGS</code> settings to enable the core Impala support + for authentication. See <a class="xref" href="impala_authorization.html#secure_startup">Starting the impalad Daemon with Sentry Authorization Enabled</a> for + details. + </p> + </li> + + <li class="li"> + <p class="p"> + Auditing for successful or blocked Impala queries, another aspect of security. + Specify the <code class="ph codeph">-audit_event_log_dir=<var class="keyword varname">directory_path</var></code> + option and optionally the + <code class="ph codeph">-max_audit_event_log_file_size=<var class="keyword varname">number_of_queries</var></code> + and <code class="ph codeph">-abort_on_failed_audit_event</code> options as part of the + <code class="ph codeph">IMPALA_SERVER_ARGS</code> settings, for each Impala node, to enable and + customize auditing. See <a class="xref" href="impala_auditing.html#auditing">Auditing Impala Operations</a> for details. + </p> + </li> + + <li class="li"> + <p class="p"> + Password protection for the Impala web UI, which listens on port 25000 by default. + This feature involves adding some or all of the + <code class="ph codeph">--webserver_password_file</code>, + <code class="ph codeph">--webserver_authentication_domain</code>, and + <code class="ph codeph">--webserver_certificate_file</code> options to the + <code class="ph codeph">IMPALA_SERVER_ARGS</code> and <code class="ph codeph">IMPALA_STATE_STORE_ARGS</code> + settings. See <a class="xref" href="impala_security_guidelines.html#security_guidelines">Security Guidelines for Impala</a> for + details. + </p> + </li> + + <li class="li" id="config_options_noncm__default_query_options"> + <div class="p"> + Another setting you might add to <code class="ph codeph">IMPALA_SERVER_ARGS</code> is a + comma-separated list of query options and values: +<pre class="pre codeblock"><code>-default_query_options='<var class="keyword varname">option</var>=<var class="keyword varname">value</var>,<var class="keyword varname">option</var>=<var class="keyword varname">value</var>,...' +</code></pre> + These options control the behavior of queries performed by this + <span class="keyword cmdname">impalad</span> instance. The option values you specify here override the + default values for <a class="xref" href="impala_query_options.html#query_options">Impala query + options</a>, as shown by the <code class="ph codeph">SET</code> statement in + <span class="keyword cmdname">impala-shell</span>. + </div> + </li> + + + + <li class="li"> + <p class="p"> + During troubleshooting, <span class="keyword">the appropriate support channel</span> might direct you to change other values, + particularly for <code class="ph codeph">IMPALA_SERVER_ARGS</code>, to work around issues or + gather debugging information. + </p> + </li> + </ul> + + + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + These startup options for the <span class="keyword cmdname">impalad</span> daemon are different from the + command-line options for the <span class="keyword cmdname">impala-shell</span> command. For the + <span class="keyword cmdname">impala-shell</span> options, see + <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a>. + </p> + </div> + + + + </div> + + + + + + + + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="config_options__config_options_checking"> + + <h2 class="title topictitle2" id="ariaid-title3">Checking the Values of Impala Configuration Options</h2> + + <div class="body conbody"> + + <p class="p"> + You can check the current runtime value of all these settings through the Impala web + interface, available by default at + <code class="ph codeph">http://<var class="keyword varname">impala_hostname</var>:25000/varz</code> for the + <span class="keyword cmdname">impalad</span> daemon, + <code class="ph codeph">http://<var class="keyword varname">impala_hostname</var>:25010/varz</code> for the + <span class="keyword cmdname">statestored</span> daemon, or + <code class="ph codeph">http://<var class="keyword varname">impala_hostname</var>:25020/varz</code> for the + <span class="keyword cmdname">catalogd</span> daemon. + </p> + + </div> + + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="config_options__config_options_impalad"> + + <h2 class="title topictitle2" id="ariaid-title4">Startup Options for impalad Daemon</h2> + + <div class="body conbody"> + + <p class="p"> + The <code class="ph codeph">impalad</code> daemon implements the main Impala service, which performs + query processing and reads and writes the data files. + </p> + + </div> + + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="config_options__config_options_statestored"> + + <h2 class="title topictitle2" id="ariaid-title5">Startup Options for statestored Daemon</h2> + + <div class="body conbody"> + + <p class="p"> + The <span class="keyword cmdname">statestored</span> daemon implements the Impala statestore service, + which monitors the availability of Impala services across the cluster, and handles + situations such as nodes becoming unavailable or becoming available again. + </p> + + </div> + + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="config_options__config_options_catalogd"> + + <h2 class="title topictitle2" id="ariaid-title6">Startup Options for catalogd Daemon</h2> + + <div class="body conbody"> + + <p class="p"> + The <span class="keyword cmdname">catalogd</span> daemon implements the Impala catalog service, which + broadcasts metadata changes to all the Impala nodes when Impala creates a table, inserts + data, or performs other kinds of DDL and DML operations. + </p> + + <p class="p"> + By default, the metadata loading and caching on startup happens asynchronously, so Impala can begin + accepting requests promptly. To enable the original behavior, where Impala waited until all metadata was + loaded before accepting any requests, set the <span class="keyword cmdname">catalogd</span> configuration option + <code class="ph codeph">--load_catalog_in_background=false</code>. + </p> + + </div> + + </article> + +</article></main></body></html> \ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_config_performance.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_config_performance.html b/docs/build/html/topics/impala_config_performance.html new file mode 100644 index 0000000..61de174 --- /dev/null +++ b/docs/build/html/topics/impala_config_performance.html @@ -0,0 +1,149 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="config_performance"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Post-Installation Configuration for Impala</title></head><body id="config_performance"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Post-Installation Configuration for Impala</h1> + + + <div class="body conbody"> + + <p class="p" id="config_performance__p_24"> + This section describes the mandatory and recommended configuration settings for Impala. If Impala is + installed using cluster management software, some of these configurations might be completed automatically; you must still + configure short-circuit reads manually. If you want to customize your environment, consider making the changes described in this topic. + </p> + + <ul class="ul"> + <li class="li"> + You must enable short-circuit reads, whether or not Impala was installed with cluster + management software. This setting goes in the Impala configuration settings, not the Hadoop-wide settings. + </li> + + <li class="li"> + You must enable block location tracking, and you can optionally enable native checksumming for optimal performance. + </li> + </ul> + + <section class="section" id="config_performance__section_fhq_wyv_ls"><h2 class="title sectiontitle">Mandatory: Short-Circuit Reads</h2> + + <p class="p"> Enabling short-circuit reads allows Impala to read local data directly + from the file system. This removes the need to communicate through the + DataNodes, improving performance. This setting also minimizes the number + of additional copies of data. Short-circuit reads requires + <code class="ph codeph">libhadoop.so</code> + (the Hadoop Native Library) to be accessible to both the server and the + client. <code class="ph codeph">libhadoop.so</code> is not available if you have + installed from a tarball. You must install from an + <code class="ph codeph">.rpm</code>, <code class="ph codeph">.deb</code>, or parcel to use + short-circuit local reads. + </p> + <p class="p"> + <strong class="ph b">To configure DataNodes for short-circuit reads:</strong> + </p> + <ol class="ol" id="config_performance__ol_qlq_wyv_ls"> + <li class="li" id="config_performance__copy_config_files"> Copy the client + <code class="ph codeph">core-site.xml</code> and <code class="ph codeph">hdfs-site.xml</code> + configuration files from the Hadoop configuration directory to the + Impala configuration directory. The default Impala configuration + location is <code class="ph codeph">/etc/impala/conf</code>. </li> + <li class="li"> + + + + On all Impala nodes, configure the following properties in + + Impala's copy of <code class="ph codeph">hdfs-site.xml</code> as shown: <pre class="pre codeblock"><code><property> + <name>dfs.client.read.shortcircuit</name> + <value>true</value> +</property> + +<property> + <name>dfs.domain.socket.path</name> + <value>/var/run/hdfs-sockets/dn</value> +</property> + +<property> + <name>dfs.client.file-block-storage-locations.timeout.millis</name> + <value>10000</value> +</property></code></pre> + + + </li> + <li class="li"> + <p class="p"> If <code class="ph codeph">/var/run/hadoop-hdfs/</code> is group-writable, make + sure its group is <code class="ph codeph">root</code>. </p> + <div class="note note note_note"><span class="note__title notetitle">Note:</span> If you are also going to enable block location tracking, you + can skip copying configuration files and restarting DataNodes and go + straight to <a class="xref" href="#config_performance__block_location_tracking">Optional: Block Location Tracking</a>. + Configuring short-circuit reads and block location tracking require + the same process of copying files and restarting services, so you + can complete that process once when you have completed all + configuration changes. Whether you copy files and restart services + now or during configuring block location tracking, short-circuit + reads are not enabled until you complete those final steps. </div> + </li> + <li class="li" id="config_performance__restart_all_datanodes"> After applying these changes, restart + all DataNodes. </li> + </ol> + </section> + + <section class="section" id="config_performance__block_location_tracking"><h2 class="title sectiontitle">Mandatory: Block Location Tracking</h2> + + + + <p class="p"> + Enabling block location metadata allows Impala to know which disk data blocks are located on, allowing + better utilization of the underlying disks. Impala will not start unless this setting is enabled. + </p> + + <p class="p"> + <strong class="ph b">To enable block location tracking:</strong> + </p> + + <ol class="ol"> + <li class="li"> + For each DataNode, adding the following to the <code class="ph codeph">hdfs-site.xml</code> file: +<pre class="pre codeblock"><code><property> + <name>dfs.datanode.hdfs-blocks-metadata.enabled</name> + <value>true</value> +</property> </code></pre> + </li> + + <li class="li"> Copy the client + <code class="ph codeph">core-site.xml</code> and <code class="ph codeph">hdfs-site.xml</code> + configuration files from the Hadoop configuration directory to the + Impala configuration directory. The default Impala configuration + location is <code class="ph codeph">/etc/impala/conf</code>. </li> + + <li class="li"> After applying these changes, restart + all DataNodes. </li> + </ol> + </section> + + <section class="section" id="config_performance__native_checksumming"><h2 class="title sectiontitle">Optional: Native Checksumming</h2> + + + + <p class="p"> + Enabling native checksumming causes Impala to use an optimized native library for computing checksums, if + that library is available. + </p> + + <p class="p" id="config_performance__p_29"> + <strong class="ph b">To enable native checksumming:</strong> + </p> + + <p class="p"> + If you installed <span class="keyword"></span> from packages, the native checksumming library is installed and setup correctly. In + such a case, no additional steps are required. Conversely, if you installed by other means, such as with + tarballs, native checksumming may not be available due to missing shared objects. Finding the message + "<code class="ph codeph">Unable to load native-hadoop library for your platform... using builtin-java classes where + applicable</code>" in the Impala logs indicates native checksumming may be unavailable. To enable native + checksumming, you must build and install <code class="ph codeph">libhadoop.so</code> (the + + + Hadoop Native Library). + </p> + </section> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_config.html">Managing Impala</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_connecting.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_connecting.html b/docs/build/html/topics/impala_connecting.html new file mode 100644 index 0000000..e48d850 --- /dev/null +++ b/docs/build/html/topics/impala_connecting.html @@ -0,0 +1,190 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_impala_shell.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="connecting"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Connecting to impalad through impala-shell</title></head><body id="connecting"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Connecting to impalad through impala-shell</h1> + + + + <div class="body conbody"> + + + + <div class="p"> + Within an <span class="keyword cmdname">impala-shell</span> session, you can only issue queries while connected to an instance + of the <span class="keyword cmdname">impalad</span> daemon. You can specify the connection information: + <ul class="ul"> + <li class="li"> + Through command-line options when you run the <span class="keyword cmdname">impala-shell</span> command. + </li> + <li class="li"> + Through a configuration file that is read when you run the <span class="keyword cmdname">impala-shell</span> command. + </li> + <li class="li"> + During an <span class="keyword cmdname">impala-shell</span> session, by issuing a <code class="ph codeph">CONNECT</code> command. + </li> + </ul> + See <a class="xref" href="impala_shell_options.html">impala-shell Configuration Options</a> for the command-line and configuration file options you can use. + </div> + + <p class="p"> + You can connect to any DataNode where an instance of <span class="keyword cmdname">impalad</span> is running, + and that host coordinates the execution of all queries sent to it. + </p> + + <p class="p"> + For simplicity during development, you might always connect to the same host, perhaps running <span class="keyword cmdname">impala-shell</span> on + the same host as <span class="keyword cmdname">impalad</span> and specifying the hostname as <code class="ph codeph">localhost</code>. + </p> + + <p class="p"> + In a production environment, you might enable load balancing, in which you connect to specific host/port combination + but queries are forwarded to arbitrary hosts. This technique spreads the overhead of acting as the coordinator + node among all the DataNodes in the cluster. See <a class="xref" href="impala_proxy.html">Using Impala through a Proxy for High Availability</a> for details. + </p> + + <p class="p"> + <strong class="ph b">To connect the Impala shell during shell startup:</strong> + </p> + + <ol class="ol"> + <li class="li"> + Locate the hostname of a DataNode within the cluster that is running an instance of the + <span class="keyword cmdname">impalad</span> daemon. If that DataNode uses a non-default port (something + other than port 21000) for <span class="keyword cmdname">impala-shell</span> connections, find out the + port number also. + </li> + + <li class="li"> + Use the <code class="ph codeph">-i</code> option to the + <span class="keyword cmdname">impala-shell</span> interpreter to specify the connection information for + that instance of <span class="keyword cmdname">impalad</span>: +<pre class="pre codeblock"><code> +# When you are logged into the same machine running impalad. +# The prompt will reflect the current hostname. +$ impala-shell + +# When you are logged into the same machine running impalad. +# The host will reflect the hostname 'localhost'. +$ impala-shell -i localhost + +# When you are logged onto a different host, perhaps a client machine +# outside the Hadoop cluster. +$ impala-shell -i <var class="keyword varname">some.other.hostname</var> + +# When you are logged onto a different host, and impalad is listening +# on a non-default port. Perhaps a load balancer is forwarding requests +# to a different host/port combination behind the scenes. +$ impala-shell -i <var class="keyword varname">some.other.hostname</var>:<var class="keyword varname">port_number</var> +</code></pre> + </li> + </ol> + + <p class="p"> + <strong class="ph b">To connect the Impala shell after shell startup:</strong> + </p> + + <ol class="ol"> + <li class="li"> + Start the Impala shell with no connection: +<pre class="pre codeblock"><code>$ impala-shell</code></pre> + <p class="p"> + You should see a prompt like the following: + </p> +<pre class="pre codeblock"><code>Welcome to the Impala shell. Press TAB twice to see a list of available commands. +... +<span class="ph">(Shell + build version: Impala Shell v2.8.x (<var class="keyword varname">hash</var>) built on + <var class="keyword varname">date</var>)</span> +[Not connected] > </code></pre> + </li> + + <li class="li"> + Locate the hostname of a DataNode within the cluster that is running an instance of the + <span class="keyword cmdname">impalad</span> daemon. If that DataNode uses a non-default port (something + other than port 21000) for <span class="keyword cmdname">impala-shell</span> connections, find out the + port number also. + </li> + + <li class="li"> + Use the <code class="ph codeph">connect</code> command to connect to an Impala instance. Enter a command of the form: +<pre class="pre codeblock"><code>[Not connected] > connect <var class="keyword varname">impalad-host</var> +[<var class="keyword varname">impalad-host</var>:21000] ></code></pre> + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + Replace <var class="keyword varname">impalad-host</var> with the hostname you have configured for any DataNode running + Impala in your environment. The changed prompt indicates a successful connection. + </div> + </li> + </ol> + + <p class="p"> + <strong class="ph b">To start <span class="keyword cmdname">impala-shell</span> in a specific database:</strong> + </p> + + <p class="p"> + You can use all the same connection options as in previous examples. + For simplicity, these examples assume that you are logged into one of + the DataNodes that is running the <span class="keyword cmdname">impalad</span> daemon. + </p> + + <ol class="ol"> + <li class="li"> + Find the name of the database containing the relevant tables, views, and so + on that you want to operate on. + </li> + + <li class="li"> + Use the <code class="ph codeph">-d</code> option to the + <span class="keyword cmdname">impala-shell</span> interpreter to connect and immediately + switch to the specified database, without the need for a <code class="ph codeph">USE</code> + statement or fully qualified names: +<pre class="pre codeblock"><code> +# Subsequent queries with unqualified names operate on +# tables, views, and so on inside the database named 'staging'. +$ impala-shell -i localhost -d staging + +# It is common during development, ETL, benchmarking, and so on +# to have different databases containing the same table names +# but with different contents or layouts. +$ impala-shell -i localhost -d parquet_snappy_compression +$ impala-shell -i localhost -d parquet_gzip_compression +</code></pre> + </li> + </ol> + + <p class="p"> + <strong class="ph b">To run one or several statements in non-interactive mode:</strong> + </p> + + <p class="p"> + You can use all the same connection options as in previous examples. + For simplicity, these examples assume that you are logged into one of + the DataNodes that is running the <span class="keyword cmdname">impalad</span> daemon. + </p> + + <ol class="ol"> + <li class="li"> + Construct a statement, or a file containing a sequence of statements, + that you want to run in an automated way, without typing or copying + and pasting each time. + </li> + + <li class="li"> + Invoke <span class="keyword cmdname">impala-shell</span> with the <code class="ph codeph">-q</code> option to run a single statement, or + the <code class="ph codeph">-f</code> option to run a sequence of statements from a file. + The <span class="keyword cmdname">impala-shell</span> command returns immediately, without going into + the interactive interpreter. +<pre class="pre codeblock"><code> +# A utility command that you might run while developing shell scripts +# to manipulate HDFS files. +$ impala-shell -i localhost -d database_of_interest -q 'show tables' + +# A sequence of CREATE TABLE, CREATE VIEW, and similar DDL statements +# can go into a file to make the setup process repeatable. +$ impala-shell -i localhost -d database_of_interest -f recreate_tables.sql +</code></pre> + </li> + </ol> + + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_impala_shell.html">Using the Impala Shell (impala-shell Command)</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_conversion_functions.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_conversion_functions.html b/docs/build/html/topics/impala_conversion_functions.html new file mode 100644 index 0000000..d49cef8 --- /dev/null +++ b/docs/build/html/topics/impala_conversion_functions.html @@ -0,0 +1,288 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="conversion_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Type Conversion Functions</title></head><body id="conversion_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Impala Type Conversion Functions</h1> + + + + <div class="body conbody"> + + <p class="p"> + Conversion functions are usually used in combination with other functions, to explicitly pass the expected + data types. Impala has strict rules regarding data types for function parameters. For example, Impala does + not automatically convert a <code class="ph codeph">DOUBLE</code> value to <code class="ph codeph">FLOAT</code>, a + <code class="ph codeph">BIGINT</code> value to <code class="ph codeph">INT</code>, or other conversion where precision could be lost or + overflow could occur. Also, for reporting or dealing with loosely defined schemas in big data contexts, + you might frequently need to convert values to or from the <code class="ph codeph">STRING</code> type. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + Although in <span class="keyword">Impala 2.3</span>, the <code class="ph codeph">SHOW FUNCTIONS</code> output for + database <code class="ph codeph">_IMPALA_BUILTINS</code> contains some function signatures + matching the pattern <code class="ph codeph">castto*</code>, these functions are not intended + for public use and are expected to be hidden in future. + </div> + + <p class="p"> + <strong class="ph b">Function reference:</strong> + </p> + + <p class="p"> + Impala supports the following type conversion functions: + </p> + +<dl class="dl"> + + +<dt class="dt dlterm" id="conversion_functions__cast"> +<code class="ph codeph">cast(<var class="keyword varname">expr</var> AS <var class="keyword varname">type</var>)</code> +</dt> + +<dd class="dd"> + +<strong class="ph b">Purpose:</strong> Converts the value of an expression to any other type. +If the expression value is of a type that cannot be converted to the target type, the result is <code class="ph codeph">NULL</code>. +<p class="p"><strong class="ph b">Usage notes:</strong> +Use <code class="ph codeph">CAST</code> when passing a column value or literal to a function that +expects a parameter with a different type. +Frequently used in SQL operations such as <code class="ph codeph">CREATE TABLE AS SELECT</code> +and <code class="ph codeph">INSERT ... VALUES</code> to ensure that values from various sources +are of the appropriate type for the destination columns. +Where practical, do a one-time <code class="ph codeph">CAST()</code> operation during the ingestion process +to make each column into the appropriate type, rather than using many <code class="ph codeph">CAST()</code> +operations in each query; doing type conversions for each row during each query can be expensive +for tables with millions or billions of rows. +</p> + <p class="p"> + The way this function deals with time zones when converting to or from <code class="ph codeph">TIMESTAMP</code> + values is affected by the <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions</code> startup flag for the + <span class="keyword cmdname">impalad</span> daemon. See <a class="xref" href="../shared/../topics/impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details about + how Impala handles time zone considerations for the <code class="ph codeph">TIMESTAMP</code> data type. + </p> + +<p class="p"> + <strong class="ph b">Examples:</strong> + </p> +<pre class="pre codeblock"><code>select concat('Here are the first ',10,' results.'); -- Fails +select concat('Here are the first ',cast(10 as string),' results.'); -- Succeeds +</code></pre> +<p class="p"> +The following example starts with a text table where every column has a type of <code class="ph codeph">STRING</code>, +which might be how you ingest data of unknown schema until you can verify the cleanliness of the underly values. +Then it uses <code class="ph codeph">CAST()</code> to create a new Parquet table with the same data, but using specific +numeric data types for the columns with numeric data. Using numeric types of appropriate sizes can result in +substantial space savings on disk and in memory, and performance improvements in queries, +over using strings or larger-than-necessary numeric types. +</p> +<pre class="pre codeblock"><code>create table t1 (name string, x string, y string, z string); + +create table t2 stored as parquet +as select + name, + cast(x as bigint) x, + cast(y as timestamp) y, + cast(z as smallint) z +from t1; + +describe t2; ++------+----------+---------+ +| name | type | comment | ++------+----------+---------+ +| name | string | | +| x | bigint | | +| y | smallint | | +| z | tinyint | | ++------+----------+---------+ +</code></pre> +<p class="p"> + <strong class="ph b">Related information:</strong> + </p> +<p class="p"> + + For details of casts from each kind of data type, see the description of + the appropriate type: + <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>, + <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, + <a class="xref" href="impala_int.html#int">INT Data Type</a>, + <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, + <a class="xref" href="impala_float.html#float">FLOAT Data Type</a>, + <a class="xref" href="impala_double.html#double">DOUBLE Data Type</a>, + <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 1.4 or higher only)</a>, + <a class="xref" href="impala_string.html#string">STRING Data Type</a>, + <a class="xref" href="impala_char.html#char">CHAR Data Type (Impala 2.0 or higher only)</a>, + <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>, + <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>, + <a class="xref" href="impala_boolean.html#boolean">BOOLEAN Data Type</a> +</p> +</dd> + + + + + + + + + + + + + + + + + + + + + + + + + + + +<dt class="dt dlterm" id="conversion_functions__typeof"> +<code class="ph codeph">typeof(type value)</code> +</dt> +<dd class="dd"> + +<strong class="ph b">Purpose:</strong> Returns the name of the data type corresponding to an expression. For types with +extra attributes, such as length for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>, +or precision and scale for <code class="ph codeph">DECIMAL</code>, includes the full specification of the type. + +<p class="p"><strong class="ph b">Return type:</strong> <code class="ph codeph">string</code></p> +<p class="p"><strong class="ph b">Usage notes:</strong> Typically used in interactive exploration of a schema, or in application code that programmatically generates schema definitions such as <code class="ph codeph">CREATE TABLE</code> statements. +For example, previously, to understand the type of an expression such as +<code class="ph codeph">col1 / col2</code> or <code class="ph codeph">concat(col1, col2, col3)</code>, +you might have created a dummy table with a single row, using syntax such as <code class="ph codeph">CREATE TABLE foo AS SELECT 5 / 3.0</code>, +and then doing a <code class="ph codeph">DESCRIBE</code> to see the type of the row. +Or you might have done a <code class="ph codeph">CREATE TABLE AS SELECT</code> operation to create a table and +copy data into it, only learning the types of the columns by doing a <code class="ph codeph">DESCRIBE</code> afterward. +This technique is especially useful for arithmetic expressions involving <code class="ph codeph">DECIMAL</code> types, +because the precision and scale of the result is typically different than that of the operands. +</p> +<p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span> + </p> +<p class="p"> + <strong class="ph b">Examples:</strong> + </p> +<p class="p"> +These examples show how to check the type of a simple literal or function value. +Notice how adding even tiny integers together changes the data type of the result to +avoid overflow, and how the results of arithmetic operations on <code class="ph codeph">DECIMAL</code> values +have specific precision and scale attributes. +</p> +<pre class="pre codeblock"><code>select typeof(2) ++-----------+ +| typeof(2) | ++-----------+ +| TINYINT | ++-----------+ + +select typeof(2+2) ++---------------+ +| typeof(2 + 2) | ++---------------+ +| SMALLINT | ++---------------+ + +select typeof('xyz') ++---------------+ +| typeof('xyz') | ++---------------+ +| STRING | ++---------------+ + +select typeof(now()) ++---------------+ +| typeof(now()) | ++---------------+ +| TIMESTAMP | ++---------------+ + +select typeof(5.3 / 2.1) ++-------------------+ +| typeof(5.3 / 2.1) | ++-------------------+ +| DECIMAL(6,4) | ++-------------------+ + +select typeof(5.30001 / 2342.1); ++--------------------------+ +| typeof(5.30001 / 2342.1) | ++--------------------------+ +| DECIMAL(13,11) | ++--------------------------+ + +select typeof(typeof(2+2)) ++-----------------------+ +| typeof(typeof(2 + 2)) | ++-----------------------+ +| STRING | ++-----------------------+ +</code></pre> + +<p class="p"> +This example shows how even if you do not have a record of the type of a column, +for example because the type was changed by <code class="ph codeph">ALTER TABLE</code> after the +original <code class="ph codeph">CREATE TABLE</code>, you can still find out the type in a +more compact form than examining the full <code class="ph codeph">DESCRIBE</code> output. +Remember to use <code class="ph codeph">LIMIT 1</code> in such cases, to avoid an identical +result value for every row in the table. +</p> +<pre class="pre codeblock"><code>create table typeof_example (a int, b tinyint, c smallint, d bigint); + +/* Empty result set if there is no data in the table. */ +select typeof(a) from typeof_example; + +/* OK, now we have some data but the type of column A is being changed. */ +insert into typeof_example values (1, 2, 3, 4); +alter table typeof_example change a a bigint; + +/* We can always find out the current type of that column without doing a full DESCRIBE. */ +select typeof(a) from typeof_example limit 1; ++-----------+ +| typeof(a) | ++-----------+ +| BIGINT | ++-----------+ +</code></pre> +<p class="p"> +This example shows how you might programmatically generate a <code class="ph codeph">CREATE TABLE</code> statement +with the appropriate column definitions to hold the result values of arbitrary expressions. +The <code class="ph codeph">typeof()</code> function lets you construct a detailed <code class="ph codeph">CREATE TABLE</code> statement +without actually creating the table, as opposed to <code class="ph codeph">CREATE TABLE AS SELECT</code> operations +where you create the destination table but only learn the column data types afterward through <code class="ph codeph">DESCRIBE</code>. +</p> +<pre class="pre codeblock"><code>describe typeof_example; ++------+----------+---------+ +| name | type | comment | ++------+----------+---------+ +| a | bigint | | +| b | tinyint | | +| c | smallint | | +| d | bigint | | ++------+----------+---------+ + +/* An ETL or business intelligence tool might create variations on a table with different file formats, + different sets of columns, and so on. TYPEOF() lets an application introspect the types of the original columns. */ +select concat('create table derived_table (a ', typeof(a), ', b ', typeof(b), ', c ', + typeof(c), ', d ', typeof(d), ') stored as parquet;') + as 'create table statement' +from typeof_example limit 1; ++-------------------------------------------------------------------------------------------+ +| create table statement | ++-------------------------------------------------------------------------------------------+ +| create table derived_table (a BIGINT, b TINYINT, c SMALLINT, d BIGINT) stored as parquet; | ++-------------------------------------------------------------------------------------------+ +</code></pre> +</dd> + + +</dl> + + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_count.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_count.html b/docs/build/html/topics/impala_count.html new file mode 100644 index 0000000..c1b961a --- /dev/null +++ b/docs/build/html/topics/impala_count.html @@ -0,0 +1,353 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="count"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>COUNT Function</title></head><body id="count"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">COUNT Function</h1> + + + + <div class="body conbody"> + + <p class="p"> + + An aggregate function that returns the number of rows, or the number of non-<code class="ph codeph">NULL</code> rows. + </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + +<pre class="pre codeblock"><code>COUNT([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]</code></pre> + + <p class="p"> + Depending on the argument, <code class="ph codeph">COUNT()</code> considers rows that meet certain conditions: + </p> + + <ul class="ul"> + <li class="li"> + The notation <code class="ph codeph">COUNT(*)</code> includes <code class="ph codeph">NULL</code> values in the total. + </li> + + <li class="li"> + The notation <code class="ph codeph">COUNT(<var class="keyword varname">column_name</var>)</code> only considers rows where the column + contains a non-<code class="ph codeph">NULL</code> value. + </li> + + <li class="li"> + You can also combine <code class="ph codeph">COUNT</code> with the <code class="ph codeph">DISTINCT</code> operator to eliminate + duplicates before counting, and to count the combinations of values across multiple columns. + </li> + </ul> + + <p class="p"> + When the query contains a <code class="ph codeph">GROUP BY</code> clause, returns one value for each combination of + grouping values. + </p> + + <p class="p"> + <strong class="ph b">Return type:</strong> <code class="ph codeph">BIGINT</code> + </p> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + If you frequently run aggregate functions such as <code class="ph codeph">MIN()</code>, <code class="ph codeph">MAX()</code>, and + <code class="ph codeph">COUNT(DISTINCT)</code> on partition key columns, consider enabling the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code> + query option, which optimizes such queries. This feature is available in <span class="keyword">Impala 2.5</span> and higher. + See <a class="xref" href="../shared/../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a> + for the kinds of queries that this option applies to, and slight differences in how partitions are + evaluated when this query option is enabled. + </p> + + <p class="p"> + <strong class="ph b">Complex type considerations:</strong> + </p> + + <p class="p"> + To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>) + in an aggregation function, you unpack the individual elements using join notation in the query, + and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column. + See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala. + </p> + + <div class="p"> +The following example demonstrates calls to several aggregation functions +using values from a column containing nested complex types +(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items). +The array is unpacked inside the query using join notation. +The array elements are referenced using the <code class="ph codeph">ITEM</code> +pseudocolumn, and the structure fields inside the array elements +are referenced using dot notation. +Numeric values such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code> +are computed using the numeric <code class="ph codeph">R_NATIONKEY</code> field, and +the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> +values are computed from the string <code class="ph codeph">N_NAME</code> field. +<pre class="pre codeblock"><code>describe region; ++-------------+-------------------------+---------+ +| name | type | comment | ++-------------+-------------------------+---------+ +| r_regionkey | smallint | | +| r_name | string | | +| r_comment | string | | +| r_nations | array<struct< | | +| | n_nationkey:smallint, | | +| | n_name:string, | | +| | n_comment:string | | +| | >> | | ++-------------+-------------------------+---------+ + +select r_name, r_nations.item.n_nationkey + from region, region.r_nations as r_nations +order by r_name, r_nations.item.n_nationkey; ++-------------+------------------+ +| r_name | item.n_nationkey | ++-------------+------------------+ +| AFRICA | 0 | +| AFRICA | 5 | +| AFRICA | 14 | +| AFRICA | 15 | +| AFRICA | 16 | +| AMERICA | 1 | +| AMERICA | 2 | +| AMERICA | 3 | +| AMERICA | 17 | +| AMERICA | 24 | +| ASIA | 8 | +| ASIA | 9 | +| ASIA | 12 | +| ASIA | 18 | +| ASIA | 21 | +| EUROPE | 6 | +| EUROPE | 7 | +| EUROPE | 19 | +| EUROPE | 22 | +| EUROPE | 23 | +| MIDDLE EAST | 4 | +| MIDDLE EAST | 10 | +| MIDDLE EAST | 11 | +| MIDDLE EAST | 13 | +| MIDDLE EAST | 20 | ++-------------+------------------+ + +select + r_name, + count(r_nations.item.n_nationkey) as count, + sum(r_nations.item.n_nationkey) as sum, + avg(r_nations.item.n_nationkey) as avg, + min(r_nations.item.n_name) as minimum, + max(r_nations.item.n_name) as maximum, + ndv(r_nations.item.n_nationkey) as distinct_vals +from + region, region.r_nations as r_nations +group by r_name +order by r_name; ++-------------+-------+-----+------+-----------+----------------+---------------+ +| r_name | count | sum | avg | minimum | maximum | distinct_vals | ++-------------+-------+-----+------+-----------+----------------+---------------+ +| AFRICA | 5 | 50 | 10 | ALGERIA | MOZAMBIQUE | 5 | +| AMERICA | 5 | 47 | 9.4 | ARGENTINA | UNITED STATES | 5 | +| ASIA | 5 | 68 | 13.6 | CHINA | VIETNAM | 5 | +| EUROPE | 5 | 77 | 15.4 | FRANCE | UNITED KINGDOM | 5 | +| MIDDLE EAST | 5 | 58 | 11.6 | EGYPT | SAUDI ARABIA | 5 | ++-------------+-------+-----+------+-----------+----------------+---------------+ +</code></pre> +</div> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + +<pre class="pre codeblock"><code>-- How many rows total are in the table, regardless of NULL values? +select count(*) from t1; +-- How many rows are in the table with non-NULL values for a column? +select count(c1) from t1; +-- Count the rows that meet certain conditions. +-- Again, * includes NULLs, so COUNT(*) might be greater than COUNT(col). +select count(*) from t1 where x > 10; +select count(c1) from t1 where x > 10; +-- Can also be used in combination with DISTINCT and/or GROUP BY. +-- Combine COUNT and DISTINCT to find the number of unique values. +-- Must use column names rather than * with COUNT(DISTINCT ...) syntax. +-- Rows with NULL values are not counted. +select count(distinct c1) from t1; +-- Rows with a NULL value in _either_ column are not counted. +select count(distinct c1, c2) from t1; +-- Return more than one result. +select month, year, count(distinct visitor_id) from web_stats group by month, year; +</code></pre> + + <div class="p"> + The following examples show how to use <code class="ph codeph">COUNT()</code> in an analytic context. They use a table + containing integers from 1 to 10. Notice how the <code class="ph codeph">COUNT()</code> is reported for each input value, as + opposed to the <code class="ph codeph">GROUP BY</code> clause which condenses the result set. +<pre class="pre codeblock"><code>select x, property, count(x) over (partition by property) as count from int_t where property in ('odd','even'); ++----+----------+-------+ +| x | property | count | ++----+----------+-------+ +| 2 | even | 5 | +| 4 | even | 5 | +| 6 | even | 5 | +| 8 | even | 5 | +| 10 | even | 5 | +| 1 | odd | 5 | +| 3 | odd | 5 | +| 5 | odd | 5 | +| 7 | odd | 5 | +| 9 | odd | 5 | ++----+----------+-------+ +</code></pre> + +Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving +set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">COUNT()</code> in an analytic context +(that is, with an <code class="ph codeph">OVER()</code> clause) to produce a running count of all the even values, +then a running count of all the odd values. The basic <code class="ph codeph">ORDER BY x</code> clause implicitly +activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>, +which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>, +therefore all of these examples produce the same results: +<pre class="pre codeblock"><code>select x, property, + count(x) over (partition by property <strong class="ph b">order by x</strong>) as 'cumulative count' + from int_t where property in ('odd','even'); ++----+----------+------------------+ +| x | property | cumulative count | ++----+----------+------------------+ +| 2 | even | 1 | +| 4 | even | 2 | +| 6 | even | 3 | +| 8 | even | 4 | +| 10 | even | 5 | +| 1 | odd | 1 | +| 3 | odd | 2 | +| 5 | odd | 3 | +| 7 | odd | 4 | +| 9 | odd | 5 | ++----+----------+------------------+ + +select x, property, + count(x) over + ( + partition by property + <strong class="ph b">order by x</strong> + <strong class="ph b">range between unbounded preceding and current row</strong> + ) as 'cumulative total' +from int_t where property in ('odd','even'); ++----+----------+------------------+ +| x | property | cumulative count | ++----+----------+------------------+ +| 2 | even | 1 | +| 4 | even | 2 | +| 6 | even | 3 | +| 8 | even | 4 | +| 10 | even | 5 | +| 1 | odd | 1 | +| 3 | odd | 2 | +| 5 | odd | 3 | +| 7 | odd | 4 | +| 9 | odd | 5 | ++----+----------+------------------+ + +select x, property, + count(x) over + ( + partition by property + <strong class="ph b">order by x</strong> + <strong class="ph b">rows between unbounded preceding and current row</strong> + ) as 'cumulative total' + from int_t where property in ('odd','even'); ++----+----------+------------------+ +| x | property | cumulative count | ++----+----------+------------------+ +| 2 | even | 1 | +| 4 | even | 2 | +| 6 | even | 3 | +| 8 | even | 4 | +| 10 | even | 5 | +| 1 | odd | 1 | +| 3 | odd | 2 | +| 5 | odd | 3 | +| 7 | odd | 4 | +| 9 | odd | 5 | ++----+----------+------------------+ +</code></pre> + +The following examples show how to construct a moving window, with a running count taking into account 1 row before +and 1 row after the current row, within the same partition (all the even values or all the odd values). +Therefore, the count is consistently 3 for rows in the middle of the window, and 2 for +rows near the ends of the window, where there is no preceding or no following row in the partition. +Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of +moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code> +clause: +<pre class="pre codeblock"><code>select x, property, + count(x) over + ( + partition by property + <strong class="ph b">order by x</strong> + <strong class="ph b">rows between 1 preceding and 1 following</strong> + ) as 'moving total' + from int_t where property in ('odd','even'); ++----+----------+--------------+ +| x | property | moving total | ++----+----------+--------------+ +| 2 | even | 2 | +| 4 | even | 3 | +| 6 | even | 3 | +| 8 | even | 3 | +| 10 | even | 2 | +| 1 | odd | 2 | +| 3 | odd | 3 | +| 5 | odd | 3 | +| 7 | odd | 3 | +| 9 | odd | 2 | ++----+----------+--------------+ + +-- Doesn't work because of syntax restriction on RANGE clause. +select x, property, + count(x) over + ( + partition by property + <strong class="ph b">order by x</strong> + <strong class="ph b">range between 1 preceding and 1 following</strong> + ) as 'moving total' +from int_t where property in ('odd','even'); +ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW. +</code></pre> + </div> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + By default, Impala only allows a single <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">columns</var>)</code> + expression in each query. + </p> + <p class="p"> + If you do not need precise accuracy, you can produce an estimate of the distinct values for a column by + specifying <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>; a query can contain multiple instances of + <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>. To make Impala automatically rewrite + <code class="ph codeph">COUNT(DISTINCT)</code> expressions to <code class="ph codeph">NDV()</code>, enable the + <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option. + </p> + <p class="p"> + To produce the same result as multiple <code class="ph codeph">COUNT(DISTINCT)</code> expressions, you can use the + following technique for queries involving a single table: + </p> +<pre class="pre codeblock"><code>select v1.c1 result1, v2.c1 result2 from + (select count(distinct col1) as c1 from t1) v1 + cross join + (select count(distinct col2) as c1 from t1) v2; +</code></pre> + <p class="p"> + Because <code class="ph codeph">CROSS JOIN</code> is an expensive operation, prefer to use the <code class="ph codeph">NDV()</code> + technique wherever practical. + </p> + </div> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a> + </p> + + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_create_database.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_create_database.html b/docs/build/html/topics/impala_create_database.html new file mode 100644 index 0000000..bfee172 --- /dev/null +++ b/docs/build/html/topics/impala_create_database.html @@ -0,0 +1,209 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="create_database"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CREATE DATABASE Statement</title></head><body id="create_database"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">CREATE DATABASE Statement</h1> + + + + <div class="body conbody"> + + <p class="p"> + + Creates a new database. + </p> + + <p class="p"> + In Impala, a database is both: + </p> + + <ul class="ul"> + <li class="li"> + A logical construct for grouping together related tables, views, and functions within their own namespace. + You might use a separate database for each application, set of related tables, or round of experimentation. + </li> + + <li class="li"> + A physical construct represented by a directory tree in HDFS. Tables (internal tables), partitions, and + data files are all located under this directory. You can perform HDFS-level operations such as backing it up and measuring space usage, + or remove it with a <code class="ph codeph">DROP DATABASE</code> statement. + </li> + </ul> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + +<pre class="pre codeblock"><code>CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] <var class="keyword varname">database_name</var>[COMMENT '<var class="keyword varname">database_comment</var>'] + [LOCATION <var class="keyword varname">hdfs_path</var>];</code></pre> + + <p class="p"> + <strong class="ph b">Statement type:</strong> DDL + </p> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + A database is physically represented as a directory in HDFS, with a filename extension <code class="ph codeph">.db</code>, + under the main Impala data directory. If the associated HDFS directory does not exist, it is created for you. + All databases and their associated directories are top-level objects, with no physical or logical nesting. + </p> + + <p class="p"> + After creating a database, to make it the current database within an <span class="keyword cmdname">impala-shell</span> session, + use the <code class="ph codeph">USE</code> statement. You can refer to tables in the current database without prepending + any qualifier to their names. + </p> + + <p class="p"> + When you first connect to Impala through <span class="keyword cmdname">impala-shell</span>, the database you start in (before + issuing any <code class="ph codeph">CREATE DATABASE</code> or <code class="ph codeph">USE</code> statements) is named + <code class="ph codeph">default</code>. + </p> + + <div class="p"> + Impala includes another predefined database, <code class="ph codeph">_impala_builtins</code>, that serves as the location + for the <a class="xref" href="../shared/../topics/impala_functions.html#builtins">built-in functions</a>. To see the built-in + functions, use a statement like the following: +<pre class="pre codeblock"><code>show functions in _impala_builtins; +show functions in _impala_builtins like '*<var class="keyword varname">substring</var>*'; +</code></pre> + </div> + + <p class="p"> + After creating a database, your <span class="keyword cmdname">impala-shell</span> session or another + <span class="keyword cmdname">impala-shell</span> connected to the same node can immediately access that database. To access + the database through the Impala daemon on a different node, issue the <code class="ph codeph">INVALIDATE METADATA</code> + statement first while connected to that other node. + </p> + + <p class="p"> + Setting the <code class="ph codeph">LOCATION</code> attribute for a new database is a way to work with sets of files in an + HDFS directory structure outside the default Impala data directory, as opposed to setting the + <code class="ph codeph">LOCATION</code> attribute for each individual table. + </p> + + <p class="p"> + If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for + load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL + statement wait before returning, until the new or changed metadata has been received by all the Impala + nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details. + </p> + + <p class="p"> + <strong class="ph b">Hive considerations:</strong> + </p> + + <p class="p"> + When you create a database in Impala, the database can also be used by Hive. + When you create a database in Hive, issue an <code class="ph codeph">INVALIDATE METADATA</code> + statement in Impala to make Impala permanently aware of the new database. + </p> + + <p class="p"> + The <code class="ph codeph">SHOW DATABASES</code> statement lists all databases, or the databases whose name + matches a wildcard pattern. <span class="ph">In <span class="keyword">Impala 2.5</span> and higher, the + <code class="ph codeph">SHOW DATABASES</code> output includes a second column that displays the associated + comment, if any, for each database.</span> + </p> + + <p class="p"> + <strong class="ph b">Amazon S3 considerations:</strong> + </p> + + <p class="p"> + To specify that any tables created within a database reside on the Amazon S3 system, + you can include an <code class="ph codeph">s3a://</code> prefix on the <code class="ph codeph">LOCATION</code> + attribute. In <span class="keyword">Impala 2.6</span> and higher, Impala automatically creates any + required folders as the databases, tables, and partitions are created, and removes + them when they are dropped. + </p> + + <p class="p"> + In <span class="keyword">Impala 2.6</span> and higher, Impala DDL statements such as + <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP DATABASE CASCADE</code>, + <code class="ph codeph">DROP TABLE</code>, and <code class="ph codeph">ALTER TABLE [ADD|DROP] PARTITION</code> can create or remove folders + as needed in the Amazon S3 system. Prior to <span class="keyword">Impala 2.6</span>, you had to create folders yourself and point + Impala database, tables, or partitions at them, and manually remove folders when no longer needed. + See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala. + </p> + + <p class="p"> + <strong class="ph b">Cancellation:</strong> Cannot be cancelled. + </p> + + <p class="p"> + <strong class="ph b">HDFS permissions:</strong> + </p> + <p class="p"> + The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under, + typically the <code class="ph codeph">impala</code> user, must have write + permission for the parent HDFS directory under which the database + is located. + </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <pre class="pre codeblock"><code>create database first_db; +use first_db; +create table t1 (x int); + +create database second_db; +use second_db; +-- Each database has its own namespace for tables. +-- You can reuse the same table names in each database. +create table t1 (s string); + +create database temp; + +-- You can either USE a database after creating it, +-- or qualify all references to the table name with the name of the database. +-- Here, tables T2 and T3 are both created in the TEMP database. + +create table temp.t2 (x int, y int); +use database temp; +create table t3 (s string); + +-- You cannot drop a database while it is selected by the USE statement. +drop database temp; +<em class="ph i">ERROR: AnalysisException: Cannot drop current default database: temp</em> + +-- The always-available database 'default' is a convenient one to USE +-- before dropping a database you created. +use default; + +-- Before dropping a database, first drop all the tables inside it, +<span class="ph">-- or in <span class="keyword">Impala 2.3</span> and higher use the CASCADE clause.</span> +drop database temp; +ERROR: ImpalaRuntimeException: Error making 'dropDatabase' RPC to Hive Metastore: +CAUSED BY: InvalidOperationException: Database temp is not empty +show tables in temp; ++------+ +| name | ++------+ +| t3 | ++------+ + +<span class="ph">-- <span class="keyword">Impala 2.3</span> and higher:</span> +<span class="ph">drop database temp cascade;</span> + +-- Earlier releases: +drop table temp.t3; +drop database temp; +</code></pre> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_databases.html#databases">Overview of Impala Databases</a>, <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>, + <a class="xref" href="impala_use.html#use">USE Statement</a>, <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a>, + <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a> + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html> \ No newline at end of file
