http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_proxy.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_proxy.html b/docs/build/html/topics/impala_proxy.html new file mode 100644 index 0000000..d29dfc6 --- /dev/null +++ b/docs/build/html/topics/impala_proxy.html @@ -0,0 +1,396 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="proxy"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala through a Proxy for High Availability</title></head><body id="proxy"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Using Impala through a Proxy for High Availability</h1> + + + + <div class="body conbody"> + + <p class="p"> + For most clusters that have multiple users and production availability requirements, you might set up a proxy + server to relay requests to and from Impala. + </p> + + <p class="p"> + Currently, the Impala statestore mechanism does not include such proxying and load-balancing features. Set up + a software package of your choice to perform these functions. 
+ </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + Most considerations for load balancing and high availability apply to the <span class="keyword cmdname">impalad</span> daemon. + The <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span> daemons do not have special + requirements for high availability, because problems with those daemons do not result in data loss. + If those daemons become unavailable due to an outage on a particular + host, you can stop the Impala service, delete the <span class="ph uicontrol">Impala StateStore</span> and + <span class="ph uicontrol">Impala Catalog Server</span> roles, add the roles on a different host, and restart the + Impala service. + </p> + </div> + + <p class="p toc inpage"></p> + + </div> + + <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="proxy__proxy_overview"> + + <h2 class="title topictitle2" id="ariaid-title2">Overview of Proxy Usage and Load Balancing for Impala</h2> + + + <div class="body conbody"> + + <p class="p"> + Using a load-balancing proxy server for Impala has the following advantages: + </p> + + <ul class="ul"> + <li class="li"> + Applications connect to a single well-known host and port, rather than keeping track of the hosts where + the <span class="keyword cmdname">impalad</span> daemon is running. + </li> + + <li class="li"> + If any host running the <span class="keyword cmdname">impalad</span> daemon becomes unavailable, application connection + requests still succeed because you always connect to the proxy server rather than a specific host running + the <span class="keyword cmdname">impalad</span> daemon. 
+ </li> + + <li class="li"> + The coordinator node for each Impala query potentially requires more memory and CPU cycles than the other + nodes that process the query. The proxy server can issue queries using round-robin scheduling, so that + each connection uses a different coordinator node. This load-balancing technique lets the Impala nodes + share this additional work, rather than concentrating it on a single machine. + </li> + </ul> + + <p class="p"> + The following setup steps are a general outline that applies to any load-balancing proxy software: + </p> + + <ol class="ol"> + <li class="li"> + Download the load-balancing proxy software. It should only need to be installed and configured on a + single host. Pick a host other than the DataNodes where <span class="keyword cmdname">impalad</span> is running, + because the intention is to protect against the possibility of one or more of these DataNodes becoming unavailable. + </li> + + <li class="li"> + Configure the load balancer (typically by editing a configuration file). + In particular: + <ul class="ul"> + <li class="li"> + <p class="p"> + Set up a port that the load balancer will listen on to relay Impala requests back and forth. + </p> + </li> + <li class="li"> + <p class="p"> + Consider enabling <span class="q">"sticky sessions"</span>. Where practical, enable this setting + so that stateless client applications such as <span class="keyword cmdname">impalad</span> and Hue + are not disconnected from long-running queries. Evaluate whether this setting is + appropriate for your combination of workload and client applications. + </p> + </li> + <li class="li"> + <p class="p"> + For Kerberized clusters, follow the instructions in <a class="xref" href="impala_proxy.html#proxy_kerberos">Special Proxy Considerations for Clusters Using Kerberos</a>. + </p> + </li> + </ul> + </li> + + <li class="li"> + Specify the host and port settings for each Impala node. 
These are the hosts that the load balancer will + choose from when relaying each Impala query. See <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a> for when to use + port 21000, 21050, or another value depending on what type of connections you are load balancing. + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + In particular, if you are using Hue or JDBC-based applications, + you typically set up load balancing for both ports 21000 and 21050, because + these client applications connect through port 21050 while the <span class="keyword cmdname">impala-shell</span> + command connects through port 21000. + </p> + </div> + </li> + + <li class="li"> + Run the load-balancing proxy server, pointing it at the configuration file that you set up. + </li> + + <li class="li"> + For any scripts, jobs, or configuration settings for applications that formerly connected to a specific + datanode to run Impala SQL statements, change the connection information (such as the <code class="ph codeph">-i</code> + option in <span class="keyword cmdname">impala-shell</span>) to point to the load balancer instead. + </li> + </ol> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + The following sections use the HAProxy software as a representative example of a load balancer + that you can use with Impala. + </div> + + </div> + + </article> + + + + <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="proxy__proxy_kerberos"> + + <h2 class="title topictitle2" id="ariaid-title3">Special Proxy Considerations for Clusters Using Kerberos</h2> + + + <div class="body conbody"> + + <p class="p"> + In a cluster using Kerberos, applications check host credentials to verify that the host they are + connecting to is the same one that is actually processing the request, to prevent man-in-the-middle + attacks. 
To clarify that the load-balancing proxy server is legitimate, perform these extra Kerberos setup + steps: + </p> + + <ol class="ol"> + <li class="li"> + This section assumes you are starting with a Kerberos-enabled cluster. See + <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a> for instructions for setting up Impala with Kerberos. See + <span class="xref">the documentation for your Apache Hadoop distribution</span> for general steps to set up Kerberos. + </li> + + <li class="li"> + Choose the host you will use for the proxy server. Based on the Kerberos setup procedure, it should + already have an entry <code class="ph codeph">impala/<var class="keyword varname">proxy_host</var>@<var class="keyword varname">realm</var></code> in + its keytab. If not, go back over the initial Kerberos configuration steps for the keytab on each host + running the <span class="keyword cmdname">impalad</span> daemon. + </li> + + <li class="li"> + Copy the keytab file from the proxy host to all other hosts in the cluster that run the + <span class="keyword cmdname">impalad</span> daemon. (For optimal performance, <span class="keyword cmdname">impalad</span> should be running + on all DataNodes in the cluster.) Put the keytab file in a secure location on each of these other hosts. + </li> + + <li class="li"> + Add an entry <code class="ph codeph">impala/<var class="keyword varname">actual_hostname</var>@<var class="keyword varname">realm</var></code> to the keytab on each + host running the <span class="keyword cmdname">impalad</span> daemon. + </li> + + <li class="li"> + + For each impalad node, merge the existing keytab with the proxy's keytab using + <span class="keyword cmdname">ktutil</span>, producing a new keytab file. 
For example: + <pre class="pre codeblock"><code>$ ktutil + ktutil: read_kt proxy.keytab + ktutil: read_kt impala.keytab + ktutil: write_kt proxy_impala.keytab + ktutil: quit</code></pre> + + </li> + + <li class="li"> + + To verify that the keytabs are merged, run the command: +<pre class="pre codeblock"><code> +klist -k <var class="keyword varname">keytabfile</var> +</code></pre> + which lists the credentials for both <code class="ph codeph">principal</code> and <code class="ph codeph">be_principal</code> on + all nodes. + </li> + + + <li class="li"> + + Make sure that the <code class="ph codeph">impala</code> user has permission to read this merged keytab file. + + </li> + + <li class="li"> + Change the following configuration settings for each host in the cluster that participates + in the load balancing: + <ul class="ul"> + <li class="li"> + In the <span class="keyword cmdname">impalad</span> option definition, add: +<pre class="pre codeblock"><code> +--principal=impala/<em class="ph i">proxy_host@realm</em> + --be_principal=impala/<em class="ph i">actual_host@realm</em> + --keytab_file=<em class="ph i">path_to_merged_keytab</em> +</code></pre> + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + Every host has different <code class="ph codeph">--be_principal</code> because the actual hostname + is different on each host. + + Specify the fully qualified domain name (FQDN) for the proxy host, not the IP + address. Use the exact FQDN as returned by a reverse DNS lookup for the associated + IP address. + + </div> + </li> + + <li class="li"> + Modify the startup options. See <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for the procedure to modify the startup + options. + </li> + </ul> + </li> + + <li class="li"> + Restart Impala to make the changes take effect. 
Restart the <span class="keyword cmdname">impalad</span> daemons on all + hosts in the cluster, as well as the <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span> + daemons. + </li> + + </ol> + + + + </div> + + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="proxy__tut_proxy"> + + <h2 class="title topictitle2" id="ariaid-title4">Example of Configuring HAProxy Load Balancer for Impala</h2> + + + <div class="body conbody"> + + <p class="p"> + If you are not already using a load-balancing proxy, you can experiment with + <a class="xref" href="http://haproxy.1wt.eu/" target="_blank">HAProxy</a>, a free, open source load + balancer. This example shows how you might install and configure that load balancer on a Red Hat Enterprise + Linux system. + </p> + + <ul class="ul"> + <li class="li"> + <p class="p"> + Install the load balancer: <code class="ph codeph">yum install haproxy</code> + </p> + </li> + + <li class="li"> + <p class="p"> + Set up the configuration file: <span class="ph filepath">/etc/haproxy/haproxy.cfg</span>. See the following section + for a sample configuration file. + </p> + </li> + + <li class="li"> + <p class="p"> + Run the load balancer (on a single host, preferably one not running <span class="keyword cmdname">impalad</span>): + </p> +<pre class="pre codeblock"><code>/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg</code></pre> + </li> + + <li class="li"> + <p class="p"> + In <span class="keyword cmdname">impala-shell</span>, JDBC applications, or ODBC applications, connect to the listener + port of the proxy host, rather than port 21000 or 21050 on a host actually running <span class="keyword cmdname">impalad</span>. + The sample configuration file sets haproxy to listen on port 25003, so you would send all + requests to <code class="ph codeph"><var class="keyword varname">haproxy_host</var>:25003</code>. 
+ </p> + </li> + </ul> + + <p class="p"> + This is the sample <span class="ph filepath">haproxy.cfg</span> used in this example: + </p> + +<pre class="pre codeblock"><code>global + # To have these messages end up in /var/log/haproxy.log you will + # need to: + # + # 1) configure syslog to accept network log events. This is done + # by adding the '-r' option to the SYSLOGD_OPTIONS in + # /etc/sysconfig/syslog + # + # 2) configure local2 events to go to the /var/log/haproxy.log + # file. A line like the following can be added to + # /etc/sysconfig/syslog + # + # local2.* /var/log/haproxy.log + # + log 127.0.0.1 local0 + log 127.0.0.1 local1 notice + chroot /var/lib/haproxy + pidfile /var/run/haproxy.pid + maxconn 4000 + user haproxy + group haproxy + daemon + + # turn on stats unix socket + #stats socket /var/lib/haproxy/stats + +#--------------------------------------------------------------------- +# common defaults that all the 'listen' and 'backend' sections will +# use if not designated in their block +# +# You might need to adjust timing values to prevent timeouts. +#--------------------------------------------------------------------- +defaults + mode http + log global + option httplog + option dontlognull + option http-server-close + option forwardfor except 127.0.0.0/8 + option redispatch + retries 3 + maxconn 3000 + contimeout 5000 + clitimeout 50000 + srvtimeout 50000 + +# +# This sets up the admin page for HAProxy at port 25002. +# +listen stats :25002 + balance + mode http + stats enable + stats auth <var class="keyword varname">username</var>:<var class="keyword varname">password</var> + +# This is the setup for Impala. Impala clients connect to load_balancer_host:25003. +# HAProxy will balance connections among the servers listed below. +# The impalad daemons listen on port 21000 for Beeswax (impala-shell) or the original ODBC driver. +# For JDBC or the ODBC version 2.x driver, use port 21050 instead of 21000. 
+listen impala :25003 + mode tcp + option tcplog + balance leastconn + + server <var class="keyword varname">symbolic_name_1</var> impala-host-1.example.com:21000 + server <var class="keyword varname">symbolic_name_2</var> impala-host-2.example.com:21000 + server <var class="keyword varname">symbolic_name_3</var> impala-host-3.example.com:21000 + server <var class="keyword varname">symbolic_name_4</var> impala-host-4.example.com:21000 + +# Setup for Hue or other JDBC-enabled applications. +# In particular, Hue requires sticky sessions. +# The application connects to load_balancer_host:21051, and HAProxy balances +# connections to the associated hosts, where Impala listens for JDBC +# requests on port 21050. +listen impalajdbc :21051 + mode tcp + option tcplog + balance source + server <var class="keyword varname">symbolic_name_5</var> impala-host-1.example.com:21050 + server <var class="keyword varname">symbolic_name_6</var> impala-host-2.example.com:21050 + server <var class="keyword varname">symbolic_name_7</var> impala-host-3.example.com:21050 + server <var class="keyword varname">symbolic_name_8</var> impala-host-4.example.com:21050 +</code></pre> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + If your JDBC or ODBC application connects to Impala through a load balancer such as + <code class="ph codeph">haproxy</code>, be cautious about reusing the connections. If the load balancer has set up + connection timeout values, either check the connection frequently so that it never sits idle longer than + the load balancer timeout value, or check the connection validity before using it and create a new one if + the connection has been closed. + </div> + + </div> + + </article> + +</article></main></body></html> \ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_query_options.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_query_options.html b/docs/build/html/topics/impala_query_options.html new file mode 100644 index 0000000..ee27d90 --- /dev/null +++ b/docs/build/html/topics/impala_query_options.html @@ -0,0 +1,49 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_set.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_abort_on_default_limit_exceeded.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_abort_on_error.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_allow_unsupported_formats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_appx_count_distinct.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_batch_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_compression_codec.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_debug_action.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_default_order_by_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_codegen.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_row_runtime_filtering.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_streaming_preaggregations.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_unsafe_spills.html"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_exec_single_node_rows_threshold.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_explain_level.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hbase_cache_blocks.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hbase_caching.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_live_progress.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_live_summary.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_errors.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_io_buffers.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_scan_range_length.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_num_runtime_filters.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mem_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mt_dop.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_num_nodes.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_num_scanner_threads.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_optimize_partition_key_scans.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_compression_codec.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_annotate_strings_utf8.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_fallback_schema_resolution.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_file_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_prefetch_mode.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_timeout_s.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_request_pool.html"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_replica_preference.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_reservation_request_timeout.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_bloom_filter_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_max_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_min_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_mode.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_wait_time_ms.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_s3_skip_insert_staging.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_scan_node_codegen_threshold.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_scratch_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schedule_random_replica.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_support_start_over.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_sync_ddl.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_v_cpu_cores.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="query_options"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Query Options for the SET Statement</title></head><body id="query_options"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Query Options for the SET Statement</h1> + + + <div class="body conbody"> + + <p class="p"> + You can specify the following options using the <code class="ph codeph">SET</code> statement, and those settings affect all + queries 
issued from that session. + </p> + + <p class="p"> + Some query options are useful in day-to-day operations for improving usability, performance, or flexibility. + </p> + + <p class="p"> + Other query options control special-purpose aspects of Impala operation and are intended primarily for + advanced debugging or troubleshooting. + </p> + + <p class="p"> + Options with Boolean parameters can be set to 1 or <code class="ph codeph">true</code> to enable, or 0 or <code class="ph codeph">false</code> + to turn off. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + In Impala 2.0 and later, you can set query options directly through the JDBC and ODBC interfaces by using the + <code class="ph codeph">SET</code> statement. Formerly, <code class="ph codeph">SET</code> was only available as a command within the + <span class="keyword cmdname">impala-shell</span> interpreter. + </p> + </div> + + + + <p class="p toc"></p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_set.html#set">SET Statement</a> + </p> + </div> +<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_abort_on_default_limit_exceeded.html">ABORT_ON_DEFAULT_LIMIT_EXCEEDED Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_abort_on_error.html">ABORT_ON_ERROR Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_allow_unsupported_formats.html">ALLOW_UNSUPPORTED_FORMATS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_appx_count_distinct.html">APPX_COUNT_DISTINCT Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_batch_size.html">BATCH_SIZE Query Option</a></strong><br></li><li class="link 
ulchildlink"><strong><a href="../topics/impala_compression_codec.html">COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_debug_action.html">DEBUG_ACTION Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_default_order_by_limit.html">DEFAULT_ORDER_BY_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_codegen.html">DISABLE_CODEGEN Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_row_runtime_filtering.html">DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_streaming_preaggregations.html">DISABLE_STREAMING_PREAGGREGATIONS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_unsafe_spills.html">DISABLE_UNSAFE_SPILLS Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_exec_single_node_rows_threshold.html">EXEC_SINGLE_NODE_ROWS_THRESHOLD Query Option (Impala 2.1 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_explain_level.html">EXPLAIN_LEVEL Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hbase_cache_blocks.html">HBASE_CACHE_BLOCKS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hbase_caching.html">HBASE_CACHING Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_live_progress.html">LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_live_summary.html">LIVE_SUMMARY Query Option 
(Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_errors.html">MAX_ERRORS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_io_buffers.html">MAX_IO_BUFFERS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_scan_range_length.html">MAX_SCAN_RANGE_LENGTH Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_num_runtime_filters.html">MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mem_limit.html">MEM_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mt_dop.html">MT_DOP Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_num_nodes.html">NUM_NODES Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_num_scanner_threads.html">NUM_SCANNER_THREADS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_compression_codec.html">PARQUET_COMPRESSION_CODEC Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_annotate_strings_utf8.html">PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_fallback_schema_resolution.html">PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_file_size.html">PARQUET_FILE_SIZE Query 
Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_prefetch_mode.html">PREFETCH_MODE Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_query_timeout_s.html">QUERY_TIMEOUT_S Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_request_pool.html">REQUEST_POOL Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_replica_preference.html">REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_reservation_request_timeout.html">RESERVATION_REQUEST_TIMEOUT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_bloom_filter_size.html">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_max_size.html">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_min_size.html">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_mode.html">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_wait_time_ms.html">RUNTIME_FILTER_WAIT_TIME_MS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_s3_skip_insert_staging.html">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a 
href="../topics/impala_scan_node_codegen_threshold.html">SCAN_NODE_CODEGEN_THRESHOLD Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_scratch_limit.html">SCRATCH_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schedule_random_replica.html">SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_support_start_over.html">SUPPORT_START_OVER Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_sync_ddl.html">SYNC_DDL Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_v_cpu_cores.html">V_CPU_CORES Query Option</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_set.html">SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_query_timeout_s.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_query_timeout_s.html b/docs/build/html/topics/impala_query_timeout_s.html new file mode 100644 index 0000000..0dff374 --- /dev/null +++ b/docs/build/html/topics/impala_query_timeout_s.html @@ -0,0 +1,62 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta 
name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="query_timeout_s"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>QUERY_TIMEOUT_S Query Option (Impala 2.0 or higher only)</title></head><body id="query_timeout_s"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">QUERY_TIMEOUT_S Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1> + + + + <div class="body conbody"> + + <p class="p"> + + Sets the idle query timeout value for the session, in seconds. Queries that sit idle for longer than the + timeout value are automatically cancelled. If the system administrator specified the + <code class="ph codeph">--idle_query_timeout</code> startup option, <code class="ph codeph">QUERY_TIMEOUT_S</code> must be smaller than + or equal to the <code class="ph codeph">--idle_query_timeout</code> value. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + The timeout clock for queries and sessions only starts ticking when the query or session is idle. + For queries, this means the query has results ready but is waiting for a client to fetch the data. A + query can run for an arbitrary time without triggering a timeout, because the query is computing results + rather than sitting idle waiting for the results to be fetched. The timeout period is intended to prevent + unclosed queries from consuming resources and taking up slots in the admission count of running queries, + potentially preventing other queries from starting. + </p> + <p class="p"> + For sessions, this means that no query has been submitted for some period of time. 
+ </p> + </div> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + +<pre class="pre codeblock"><code>SET QUERY_TIMEOUT_S=<var class="keyword varname">seconds</var>;</code></pre> + + + + <p class="p"> + <strong class="ph b">Type:</strong> numeric + </p> + + <p class="p"> + <strong class="ph b">Default:</strong> 0 (no timeout if <code class="ph codeph">--idle_query_timeout</code> not in effect; otherwise, use + <code class="ph codeph">--idle_query_timeout</code> value) + </p> + + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span> + </p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + + <p class="p"> + <a class="xref" href="impala_timeouts.html#timeouts">Setting Timeout Periods for Daemons, Queries, and Sessions</a> + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_rcfile.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_rcfile.html b/docs/build/html/topics/impala_rcfile.html new file mode 100644 index 0000000..0e2668d --- /dev/null +++ b/docs/build/html/topics/impala_rcfile.html @@ -0,0 +1,246 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html"><meta name="prodname" content="Impala"><meta name="prodname" 
content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="rcfile"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the RCFile File Format with Impala Tables</title></head><body id="rcfile"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Using the RCFile File Format with Impala Tables</h1> + + + + <div class="body conbody"> + + <p class="p"> + + Impala supports using RCFile data files. + </p> + + <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">RCFile Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead"> + <tr class="row"> + <th class="entry nocellnorowborder" id="rcfile__entry__1"> + File Type + </th> + <th class="entry nocellnorowborder" id="rcfile__entry__2"> + Format + </th> + <th class="entry nocellnorowborder" id="rcfile__entry__3"> + Compression Codecs + </th> + <th class="entry nocellnorowborder" id="rcfile__entry__4"> + Impala Can CREATE? + </th> + <th class="entry nocellnorowborder" id="rcfile__entry__5"> + Impala Can INSERT? + </th> + </tr> + </thead><tbody class="tbody"> + <tr class="row"> + <td class="entry nocellnorowborder" headers="rcfile__entry__1 "> + <a class="xref" href="impala_rcfile.html#rcfile">RCFile</a> + </td> + <td class="entry nocellnorowborder" headers="rcfile__entry__2 "> + Structured + </td> + <td class="entry nocellnorowborder" headers="rcfile__entry__3 "> + Snappy, gzip, deflate, bzip2 + </td> + <td class="entry nocellnorowborder" headers="rcfile__entry__4 "> + Yes. + </td> + <td class="entry nocellnorowborder" headers="rcfile__entry__5 "> + No. 
Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use + <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala. + </td> + + </tr> + </tbody></table> + + <p class="p toc inpage"></p> + </div> + + <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="rcfile__rcfile_create"> + + <h2 class="title topictitle2" id="ariaid-title2">Creating RCFile Tables and Loading Data</h2> + + + <div class="body conbody"> + + <p class="p"> + If you do not have an existing data file to use, begin by creating one in the appropriate format. + </p> + + <p class="p"> + <strong class="ph b">To create an RCFile table:</strong> + </p> + + <p class="p"> + In the <code class="ph codeph">impala-shell</code> interpreter, issue a command similar to: + </p> + +<pre class="pre codeblock"><code>create table rcfile_table (<var class="keyword varname">column_specs</var>) stored as rcfile;</code></pre> + + <p class="p"> + Because Impala can query some kinds of tables that it cannot currently write to, after creating tables of + certain file formats, you might use the Hive shell to load the data. See + <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details. After loading data into a table through + Hive or other mechanism outside of Impala, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> + statement the next time you connect to the Impala node, before querying the table, to make Impala recognize + the new data. 
+ </p> + + <div class="note important note_important"><span class="note__title importanttitle">Important:</span> + See <a class="xref" href="impala_known_issues.html#known_issues">Known Issues and Workarounds in Impala</a> for potential compatibility issues with + RCFile tables created in Hive 0.12, due to a change in the default RCFile SerDe for Hive. + </div> + + <p class="p"> + For example, here is how you might create some RCFile tables in Impala (by specifying the columns + explicitly, or cloning the structure of another table), load data through Hive, and query them through + Impala: + </p> + +<pre class="pre codeblock"><code>$ impala-shell -i localhost +[localhost:21000] > create table rcfile_table (x int) stored as rcfile; +[localhost:21000] > create table rcfile_clone like some_other_table stored as rcfile; +[localhost:21000] > quit; + +$ hive +hive> insert into table rcfile_table select x from some_other_table; +3 Rows loaded to rcfile_table +Time taken: 19.015 seconds +hive> quit; + +$ impala-shell -i localhost +[localhost:21000] > select * from rcfile_table; +Returned 0 row(s) in 0.23s +[localhost:21000] > -- Make Impala recognize the data loaded through Hive; +[localhost:21000] > refresh rcfile_table; +[localhost:21000] > select * from rcfile_table; ++---+ +| x | ++---+ +| 1 | +| 2 | +| 3 | ++---+ +Returned 3 row(s) in 0.23s</code></pre> + + <p class="p"> + <strong class="ph b">Complex type considerations:</strong> + Although you can create tables in this file format using + the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, + and <code class="ph codeph">MAP</code>) available in <span class="keyword">Impala 2.3</span> and higher, + currently, Impala can query these types only in Parquet tables. + <span class="ph"> + The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on RCFile tables that include complex types. 
+ Such queries are allowed in <span class="keyword">Impala 2.6</span> and higher. + </span> + </p> + + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="rcfile__rcfile_compression"> + + <h2 class="title topictitle2" id="ariaid-title3">Enabling Compression for RCFile Tables</h2> + + + <div class="body conbody"> + + <p class="p"> + + You may want to enable compression on existing tables. Enabling compression provides performance gains in + most cases and is supported for RCFile tables. For example, to enable Snappy compression, you would specify + the following additional settings when loading data through the Hive shell: + </p> + +<pre class="pre codeblock"><code>hive> SET hive.exec.compress.output=true; +hive> SET mapred.max.split.size=256000000; +hive> SET mapred.output.compression.type=BLOCK; +hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec; +hive> INSERT OVERWRITE TABLE <var class="keyword varname">new_table</var> SELECT * FROM <var class="keyword varname">old_table</var>;</code></pre> + + <p class="p"> + If you are converting partitioned tables, you must complete additional steps. In such a case, specify + additional settings similar to the following: + </p> + +<pre class="pre codeblock"><code>hive> CREATE TABLE <var class="keyword varname">new_table</var> (<var class="keyword varname">your_cols</var>) PARTITIONED BY (<var class="keyword varname">partition_cols</var>) STORED AS <var class="keyword varname">new_format</var>; +hive> SET hive.exec.dynamic.partition.mode=nonstrict; +hive> SET hive.exec.dynamic.partition=true; +hive> INSERT OVERWRITE TABLE <var class="keyword varname">new_table</var> PARTITION(<var class="keyword varname">comma_separated_partition_cols</var>) SELECT * FROM <var class="keyword varname">old_table</var>;</code></pre> + + <p class="p"> + Remember that Hive does not require that you specify a source format for it. 
Consider the case of + converting a table to a + Snappy-compressed RCFile table. Combining the components outlined previously to complete this table conversion, + you would specify settings similar to the following: + </p> + +<pre class="pre codeblock"><code>hive> CREATE TABLE tbl_rc (int_col INT, string_col STRING) STORED AS RCFILE; +hive> SET hive.exec.compress.output=true; +hive> SET mapred.max.split.size=256000000; +hive> SET mapred.output.compression.type=BLOCK; +hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec; +hive> SET hive.exec.dynamic.partition.mode=nonstrict; +hive> SET hive.exec.dynamic.partition=true; +hive> INSERT OVERWRITE TABLE tbl_rc SELECT * FROM tbl;</code></pre> + + <p class="p"> + To complete a similar process for a table that includes partitions, you would specify settings similar to + the following: + </p> + +<pre class="pre codeblock"><code>hive> CREATE TABLE tbl_rc (int_col INT, string_col STRING) PARTITIONED BY (year INT) STORED AS RCFILE; +hive> SET hive.exec.compress.output=true; +hive> SET mapred.max.split.size=256000000; +hive> SET mapred.output.compression.type=BLOCK; +hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec; +hive> SET hive.exec.dynamic.partition.mode=nonstrict; +hive> SET hive.exec.dynamic.partition=true; +hive> INSERT OVERWRITE TABLE tbl_rc PARTITION(year) SELECT * FROM tbl;</code></pre> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + The compression codec is specified in the following command: + </p> +<pre class="pre codeblock"><code>SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;</code></pre> + <p class="p"> + You could elect to specify alternative codecs such as <code class="ph codeph">GzipCodec</code> here. 
+ </p> + </div> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="rcfile__rcfile_performance"> + + <h2 class="title topictitle2" id="ariaid-title4">Query Performance for Impala RCFile Tables</h2> + + <div class="body conbody"> + + <p class="p"> + In general, expect query performance with RCFile tables to be + faster than with tables using text data, but slower than with + Parquet tables. See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> + for information about using the Parquet file format for + high-performance analytic queries. + </p> + + <p class="p"> + In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3. + For Impala tables that use the file formats Parquet, RCFile, SequenceFile, + Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code> + in the <span class="ph filepath">core-site.xml</span> configuration file determines + how Impala divides the I/O work of reading the data files. This configuration + setting is specified in bytes. By default, this + value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files + as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access + Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code> + to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve + Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code> + to 268435456 (256 MB) to match the row group size produced by Impala. 
+ </p> + + </div> + </article> + + +</article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_real.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_real.html b/docs/build/html/topics/impala_real.html new file mode 100644 index 0000000..f66d313 --- /dev/null +++ b/docs/build/html/topics/impala_real.html @@ -0,0 +1,39 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="real"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REAL Data Type</title></head><body id="real"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">REAL Data Type</h1> + + + + <div class="body conbody"> + + <p class="p"> + An alias for the <code class="ph codeph">DOUBLE</code> data type. See <a class="xref" href="impala_double.html#double">DOUBLE Data Type</a> for details. + </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <p class="p"> + These examples show how you can use the type names <code class="ph codeph">REAL</code> and <code class="ph codeph">DOUBLE</code> + interchangeably, and behind the scenes Impala treats them always as <code class="ph codeph">DOUBLE</code>. 
+ </p> + +<pre class="pre codeblock"><code>[localhost:21000] > create table r1 (x real); +[localhost:21000] > describe r1; ++------+--------+---------+ +| name | type | comment | ++------+--------+---------+ +| x | double | | ++------+--------+---------+ +[localhost:21000] > insert into r1 values (1.5), (cast (2.2 as double)); +[localhost:21000] > select cast (1e6 as real); ++---------------------------+ +| cast(1000000.0 as double) | ++---------------------------+ +| 1000000 | ++---------------------------+</code></pre> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_refresh.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_refresh.html b/docs/build/html/topics/impala_refresh.html new file mode 100644 index 0000000..75ce520 --- /dev/null +++ b/docs/build/html/topics/impala_refresh.html @@ -0,0 +1,387 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="refresh"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REFRESH Statement</title></head><body id="refresh"><main 
role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">REFRESH Statement</h1> + + + + <div class="body conbody"> + + <p class="p"> + + To accurately respond to queries, the Impala node that acts as the coordinator (the node to which you are + connected through <span class="keyword cmdname">impala-shell</span>, JDBC, or ODBC) must have current metadata about those + databases and tables that are referenced in Impala queries. If you are not familiar with the way Impala uses + metadata and how it shares the same metastore database as Hive, see + <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a> for background information. + </p> + + <p class="p"> + <strong class="ph b">Syntax:</strong> + </p> + +<pre class="pre codeblock"><code>REFRESH [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> [PARTITION (<var class="keyword varname">key_col1</var>=<var class="keyword varname">val1</var> [, <var class="keyword varname">key_col2</var>=<var class="keyword varname">val2</var>...])]</code></pre> + + <p class="p"> + <strong class="ph b">Usage notes:</strong> + </p> + + <p class="p"> + Use the <code class="ph codeph">REFRESH</code> statement to load the latest metastore metadata and block location data for + a particular table in these scenarios: + </p> + + <ul class="ul"> + <li class="li"> + After loading new data files into the HDFS data directory for the table. (Once you have set up an ETL + pipeline to bring data into Impala on a regular basis, this is typically the most frequent reason why + metadata needs to be refreshed.) + </li> + + <li class="li"> + After issuing <code class="ph codeph">ALTER TABLE</code>, <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or other + table-modifying SQL statement in Hive. 
+ </li> + </ul> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + In <span class="keyword">Impala 2.3</span> and higher, the syntax <code class="ph codeph">ALTER TABLE <var class="keyword varname">table_name</var> RECOVER PARTITIONS</code> + is a faster alternative to <code class="ph codeph">REFRESH</code> when the only change to the table data is the addition of + new partition directories through Hive or manual HDFS operations. + See <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for details. + </p> + </div> + + <p class="p"> + You only need to issue the <code class="ph codeph">REFRESH</code> statement on the node to which you connect to issue + queries. The coordinator node divides the work among all the Impala nodes in a cluster, and sends read + requests for the correct HDFS blocks without relying on the metadata on the other nodes. + </p> + + <p class="p"> + <code class="ph codeph">REFRESH</code> reloads the metadata for the table from the metastore database, and does an + incremental reload of the low-level block location data to account for any new data files added to the HDFS + data directory for the table. It is a low-overhead, single-table operation, specifically tuned for the common + scenario where new data files are added to HDFS. + </p> + + <p class="p"> + Only the metadata for the specified table is flushed. The table must already exist and be known to Impala, + either because the <code class="ph codeph">CREATE TABLE</code> statement was run in Impala rather than Hive, or because a + previous <code class="ph codeph">INVALIDATE METADATA</code> statement caused Impala to reload its entire metadata catalog. 
+ </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + The catalog service broadcasts any changed metadata as a result of Impala + <code class="ph codeph">ALTER TABLE</code>, <code class="ph codeph">INSERT</code> and <code class="ph codeph">LOAD DATA</code> statements to all + Impala nodes. Thus, the <code class="ph codeph">REFRESH</code> statement is only required if you load data through Hive + or by manipulating data files in HDFS directly. See <a class="xref" href="impala_components.html#intro_catalogd">The Impala Catalog Service</a> for + more information on the catalog service. + </p> + <p class="p"> + Another way to avoid inconsistency across nodes is to enable the + <code class="ph codeph">SYNC_DDL</code> query option before performing a DDL statement or an <code class="ph codeph">INSERT</code> or + <code class="ph codeph">LOAD DATA</code>. + </p> + <p class="p"> + The table name is a required parameter. To flush the metadata for all tables, use the + <code class="ph codeph"><a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA</a></code> + command. + </p> + <p class="p"> + Because <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> only works for tables that the current + Impala node is already aware of, when you create a new table in the Hive shell, enter + <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">new_table</var></code> before you can see the new table in + <span class="keyword cmdname">impala-shell</span>. Once the table is known by Impala, you can issue <code class="ph codeph">REFRESH + <var class="keyword varname">table_name</var></code> after you add data files for that table. 
+ </p> + </div> + + <p class="p"> + <code class="ph codeph">INVALIDATE METADATA</code> and <code class="ph codeph">REFRESH</code> are counterparts: <code class="ph codeph">INVALIDATE + METADATA</code> waits to reload the metadata when needed for a subsequent query, but reloads all the + metadata for the table, which can be an expensive operation, especially for large tables with many + partitions. <code class="ph codeph">REFRESH</code> reloads the metadata immediately, but only loads the block location + data for newly added data files, making it a less expensive operation overall. If data was altered in some + more extensive way, such as being reorganized by the HDFS balancer, use <code class="ph codeph">INVALIDATE + METADATA</code> to avoid a performance penalty from reduced local reads. If you used Impala version 1.0, + the <code class="ph codeph">INVALIDATE METADATA</code> statement works just like the Impala 1.0 <code class="ph codeph">REFRESH</code> + statement did, while the Impala 1.1 <code class="ph codeph">REFRESH</code> is optimized for the common use case of adding + new data files to an existing table, thus the table name argument is now required. + </p> + + <p class="p"> + A metadata update for an <code class="ph codeph">impalad</code> instance <strong class="ph b">is</strong> required if: + </p> + + <ul class="ul"> + <li class="li"> + A metadata change occurs. + </li> + + <li class="li"> + <strong class="ph b">and</strong> the change is made through Hive. + </li> + + <li class="li"> + <strong class="ph b">and</strong> the change is made to a metastore database to which clients such as the Impala shell or ODBC directly + connect. + </li> + </ul> + + <p class="p"> + A metadata update for an Impala node is <strong class="ph b">not</strong> required after you run <code class="ph codeph">ALTER TABLE</code>, + <code class="ph codeph">INSERT</code>, or other table-modifying statement in Impala rather than Hive. 
Impala handles the + metadata synchronization automatically through the catalog service. + </p> + + <p class="p"> + Database and table metadata is typically modified by: + </p> + + <ul class="ul"> + <li class="li"> + Hive - through <code class="ph codeph">ALTER</code>, <code class="ph codeph">CREATE</code>, <code class="ph codeph">DROP</code> or + <code class="ph codeph">INSERT</code> operations. + </li> + + <li class="li"> + Impalad - through <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">ALTER TABLE</code>, and <code class="ph codeph">INSERT</code> + operations. <span class="ph">Such changes are propagated to all Impala nodes by the + Impala catalog service.</span> + </li> + </ul> + + <p class="p"> + <code class="ph codeph">REFRESH</code> causes the metadata for that table to be immediately reloaded. For a huge table, + that process could take a noticeable amount of time; but doing the refresh up front avoids an unpredictable + delay later, for example if the next reference to the table is during a benchmark test. + </p> + + <p class="p"> + <strong class="ph b">Refreshing a single partition:</strong> + </p> + + <p class="p"> + In <span class="keyword">Impala 2.7</span> and higher, the <code class="ph codeph">REFRESH</code> statement can apply to a single partition at a time, + rather than the whole table. Include the optional <code class="ph codeph">PARTITION (<var class="keyword varname">partition_spec</var>)</code> + clause and specify values for each of the partition key columns. + </p> + + <p class="p"> + The following examples show how to make Impala aware of data added to a single partition, after data is loaded into + a partition's data directory using some mechanism outside Impala, such as Hive or Spark. The partition can be one that + Impala created and is already aware of, or a new partition created through Hive. 
+ </p> + +<pre class="pre codeblock"><code> +impala> create table p (x int) partitioned by (y int); +impala> insert into p (x,y) values (1,2), (2,2), (2,1); +impala> show partitions p; ++-------+-------+--------+------+... +| y | #Rows | #Files | Size |... ++-------+-------+--------+------+... +| 1 | -1 | 1 | 2B |... +| 2 | -1 | 1 | 4B |... +| Total | -1 | 2 | 6B |... ++-------+-------+--------+------+... + +-- ... Data is inserted into one of the partitions by some external mechanism ... +beeline> insert into p partition (y = 1) values(1000); + +impala> refresh p partition (y=1); +impala> select x from p where y=1; ++------+ +| x | ++------+ +| 2 | <- Original data created by Impala +| 1000 | <- Additional data inserted through Beeline ++------+ + +</code></pre> + + <p class="p"> + The same applies for tables with more than one partition key column. + The <code class="ph codeph">PARTITION</code> clause of the <code class="ph codeph">REFRESH</code> + statement must include all the partition key columns. + </p> + +<pre class="pre codeblock"><code> +impala> create table p2 (x int) partitioned by (y int, z int); +impala> insert into p2 (x,y,z) values (0,0,0), (1,2,3), (2,2,3); +impala> show partitions p2; ++-------+---+-------+--------+------+... +| y | z | #Rows | #Files | Size |... ++-------+---+-------+--------+------+... +| 0 | 0 | -1 | 1 | 2B |... +| 2 | 3 | -1 | 1 | 4B |... +| Total | | -1 | 2 | 6B |... ++-------+---+-------+--------+------+... + +-- ... Data is inserted into one of the partitions by some external mechanism ... 
+beeline> insert into p2 partition (y = 2, z = 3) values(1000); + +impala> refresh p2 partition (y=2, z=3); +impala> select x from p2 where y=2 and z = 3; ++------+ +| x | ++------+ +| 1 | <- Original data created by Impala +| 2 | <- Original data created by Impala +| 1000 | <- Additional data inserted through Beeline ++------+ + +</code></pre> + + <p class="p"> + The following examples show how specifying a nonexistent partition does not cause any error, + and the order of the partition key columns does not have to match the column order in the table. + The partition spec must include all the partition key columns; specifying an incomplete set of + columns does cause an error. + </p> + +<pre class="pre codeblock"><code> +-- Partition doesn't exist. +refresh p2 partition (y=0, z=3); +refresh p2 partition (y=0, z=-1); +-- Key columns specified in a different order than the table definition. +refresh p2 partition (z=1, y=0); +-- Incomplete partition spec causes an error. +refresh p2 partition (y=0); +ERROR: AnalysisException: Items in partition spec must exactly match the partition columns in the table definition: default.p2 (1 vs 2) + +</code></pre> + + <p class="p"> + If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for + load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL + statement wait before returning, until the new or changed metadata has been received by all the Impala + nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details. 
+ </p> + + <p class="p"> + <strong class="ph b">Examples:</strong> + </p> + + <p class="p"> + The following example shows how you might use the <code class="ph codeph">REFRESH</code> statement after manually adding + new HDFS data files to the Impala data directory for a table: + </p> + +<pre class="pre codeblock"><code>[impalad-host:21000] > refresh t1; +[impalad-host:21000] > refresh t2; +[impalad-host:21000] > select * from t1; +... +[impalad-host:21000] > select * from t2; +... </code></pre> + + <p class="p"> + For more examples of using <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> with a + combination of Impala and Hive operations, see <a class="xref" href="impala_tutorial.html#tutorial_impala_hive">Switching Back and Forth Between Impala and Hive</a>. + </p> + + <p class="p"> + <strong class="ph b">Related impala-shell options:</strong> + </p> + + <p class="p"> + The <span class="keyword cmdname">impala-shell</span> option <code class="ph codeph">-r</code> issues an <code class="ph codeph">INVALIDATE METADATA</code> statement + when starting up the shell, effectively performing a <code class="ph codeph">REFRESH</code> of all tables. + Due to the expense of reloading the metadata for all tables, the <span class="keyword cmdname">impala-shell</span> <code class="ph codeph">-r</code> + option is not recommended for day-to-day use in a production environment. (This option was mainly intended as a workaround + for synchronization issues in very old Impala versions.) + </p> + + <p class="p"> + <strong class="ph b">HDFS permissions:</strong> + </p> + <p class="p"> + The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under, + typically the <code class="ph codeph">impala</code> user, must have execute + permissions for all the relevant directories holding table data. 
+ (A table could have data spread across multiple directories, + or in unexpected paths, if it uses partitioning or + specifies a <code class="ph codeph">LOCATION</code> attribute for + individual partitions or the entire table.) + Issues with permissions might not cause an immediate error for this statement, + but subsequent statements such as <code class="ph codeph">SELECT</code> + or <code class="ph codeph">SHOW TABLE STATS</code> could fail. + </p> + <p class="p"> + All HDFS and Sentry permissions and privileges are the same whether you refresh the entire table + or a single partition. + </p> + + <p class="p"> + <strong class="ph b">HDFS considerations:</strong> + </p> + + <p class="p"> + The <code class="ph codeph">REFRESH</code> command checks HDFS permissions of the underlying data files and directories, + caching this information so that a statement can be cancelled immediately if for example the + <code class="ph codeph">impala</code> user does not have permission to write to the data directory for the table. Impala + reports any lack of write permissions as an <code class="ph codeph">INFO</code> message in the log file, in case that + represents an oversight. If you change HDFS permissions to make data readable or writeable by the Impala + user, issue another <code class="ph codeph">REFRESH</code> to make Impala aware of the change. + </p> + + <div class="note important note_important"><span class="note__title importanttitle">Important:</span> + After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE + STATS</code> statement to make sure all statistics are up-to-date. 
Consider updating statistics for a + table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS + SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH + <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that + are very large, used in join queries, or both. + </div> + + <p class="p"> + <strong class="ph b">Amazon S3 considerations:</strong> + </p> + <p class="p"> + The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements also cache metadata + for tables where the data resides in the Amazon Simple Storage Service (S3). + In particular, issue a <code class="ph codeph">REFRESH</code> for a table after adding or removing files + in the associated S3 data directory. + See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about working with S3 tables. + </p> + + <p class="p"> + <strong class="ph b">Cancellation:</strong> Cannot be cancelled. + </p> + + <p class="p"> + <strong class="ph b">Kudu considerations:</strong> + </p> + <p class="p"> + Much of the metadata for Kudu tables is handled by the underlying + storage layer. Kudu tables have less reliance on the metastore + database, and require less metadata caching on the Impala side. + For example, information about partitions in Kudu tables is managed + by Kudu, and Impala does not cache any block locality metadata + for Kudu tables. + </p> + <p class="p"> + The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> + statements are needed less frequently for Kudu tables than for + HDFS-backed tables. 
Neither statement is needed when data is + added to, removed from, or updated in a Kudu table, even if the changes + are made directly to Kudu through a client program using the Kudu API. + Run <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> or + <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code> + for a Kudu table only after making a change to the Kudu table schema, + such as adding or dropping a column, by a mechanism other than + Impala. + </p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + <p class="p"> + <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a>, + <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_release_notes.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_release_notes.html b/docs/build/html/topics/impala_release_notes.html new file mode 100644 index 0000000..e36b70f --- /dev/null +++ b/docs/build/html/topics/impala_release_notes.html @@ -0,0 +1,26 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_relnotes.html"><meta name="DC.Relation" 
scheme="URI" content="../topics/impala_new_features.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_incompatible_changes.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_known_issues.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_fixed_issues.html"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_release_notes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Release Notes</title></head><body id="impala_release_notes"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Impala Release Notes</h1> + + + <div class="body conbody"> + + <p class="p"> + These release notes provide information on the <a class="xref" href="impala_new_features.html#new_features">new + features</a> and <a class="xref" href="impala_known_issues.html#known_issues">known issues and limitations</a> for + Impala versions up to <span class="ph">Impala 2.8.x</span>. For users + upgrading from earlier Impala releases, or using Impala in combination with specific versions of other + software, <a class="xref" href="impala_incompatible_changes.html#incompatible_changes">Incompatible Changes and Limitations in Apache Impala (incubating)</a> lists any changes to + file formats, SQL syntax, or software dependencies to take into account. + </p> + + <p class="p"> + Once you are finished reviewing these release notes, for more information about using Impala, see + <a class="xref" href="impala_concepts.html">Impala Concepts and Architecture</a>. 
+ </p> + + <p class="p toc"></p> + </div> +<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_relnotes.html">Impala Release Notes</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_new_features.html">New Features in Apache Impala (incubating)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_incompatible_changes.html">Incompatible Changes and Limitations in Apache Impala (incubating)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_known_issues.html">Known Issues and Workarounds in Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_fixed_issues.html">Fixed Issues in Apache Impala (incubating)</a></strong><br></li></ul></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_relnotes.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_relnotes.html b/docs/build/html/topics/impala_relnotes.html new file mode 100644 index 0000000..09a20c9 --- /dev/null +++ b/docs/build/html/topics/impala_relnotes.html @@ -0,0 +1,26 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_release_notes.html"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="relnotes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Release Notes</title></head><body 
id="relnotes"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Impala Release Notes</h1> + + + <div class="body conbody" id="relnotes__relnotes_intro"> + + <p class="p"> + These release notes provide information on the <a class="xref" href="impala_new_features.html#new_features">new + features</a> and <a class="xref" href="impala_known_issues.html#known_issues">known issues and limitations</a> for + Impala versions up to <span class="ph">Impala 2.8.x</span>. For users + upgrading from earlier Impala releases, or using Impala in combination with specific versions of other + software, <a class="xref" href="impala_incompatible_changes.html#incompatible_changes">Incompatible Changes and Limitations in Apache Impala (incubating)</a> lists any changes to + file formats, SQL syntax, or software dependencies to take into account. + </p> + + <p class="p"> + Once you are finished reviewing these release notes, for more information about using Impala, see + <a class="xref" href="impala_concepts.html">Impala Concepts and Architecture</a>. 
+ </p> + + <p class="p toc"></p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_release_notes.html">Impala Release Notes</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_replica_preference.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_replica_preference.html b/docs/build/html/topics/impala_replica_preference.html new file mode 100644 index 0000000..157d21c --- /dev/null +++ b/docs/build/html/topics/impala_replica_preference.html @@ -0,0 +1,45 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="replica_preference"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</title></head><body id="replica_preference"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">REPLICA_PREFERENCE Query Option (<span class="keyword">Impala 2.7</span> or higher only)</h1> + + + + <div class="body conbody"> + + <p class="p"> + + </p> + + <p class="p"> + The <code class="ph codeph">REPLICA_PREFERENCE</code> query option + lets you spread 
the load more evenly if hotspots and bottlenecks persist, by allowing hosts to do local reads, + or even remote reads, to retrieve the data for cached blocks if Impala can determine that it would be + too expensive to do all such processing on a particular host. + </p> + + <p class="p"> + <strong class="ph b">Type:</strong> numeric (0, 3, 5) + or corresponding mnemonic strings (<code class="ph codeph">CACHE_LOCAL</code>, <code class="ph codeph">DISK_LOCAL</code>, <code class="ph codeph">REMOTE</code>). + The gaps in the numeric sequence are to accommodate other intermediate + values that might be added in the future. + </p> + + <p class="p"> + <strong class="ph b">Default:</strong> 0 (equivalent to <code class="ph codeph">CACHE_LOCAL</code>) + </p> + + <p class="p"> + <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.7.0</span> + </p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + <p class="p"> + <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a>, <a class="xref" href="impala_schedule_random_replica.html#schedule_random_replica">SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)</a> + </p> + + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_request_pool.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_request_pool.html b/docs/build/html/topics/impala_request_pool.html new file mode 100644 index 0000000..7127b0c --- /dev/null +++ b/docs/build/html/topics/impala_request_pool.html @@ -0,0 +1,35 @@ 
+<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="request_pool"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REQUEST_POOL Query Option</title></head><body id="request_pool"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">REQUEST_POOL Query Option</h1> + + + + <div class="body conbody"> + + <p class="p"> + + The pool or queue name that queries should be submitted to. Only applies when you enable the Impala admission control feature. + Specifies the name of the pool used by requests from Impala to the resource manager. 
+ </p> + + <p class="p"> + <strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code> + </p> + + <p class="p"> + <strong class="ph b">Default:</strong> empty (use the user-to-pool mapping defined by an <span class="keyword cmdname">impalad</span> startup option + in the Impala configuration file) + </p> + + <p class="p"> + <strong class="ph b">Related information:</strong> + </p> + <p class="p"> + <a class="xref" href="impala_admission.html">Admission Control and Query Queuing</a> + </p> + + + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file
