http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_admission.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_admission.html b/docs/build/html/topics/impala_admission.html new file mode 100644 index 0000000..294f8ca --- /dev/null +++ b/docs/build/html/topics/impala_admission.html @@ -0,0 +1,838 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="admission_control"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Admission Control and Query Queuing</title></head><body id="admission_control"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Admission Control and Query Queuing</h1> + + + <div class="body conbody"> + + <p class="p" id="admission_control__admission_control_intro"> + Admission control is an Impala feature that imposes limits on concurrent SQL queries, to avoid resource usage + spikes and out-of-memory conditions on busy clusters. + It is a form of <span class="q">"throttling"</span>. + New queries are accepted and executed until + certain conditions are met, such as too many queries or too much + total memory used across the cluster. + When one of these thresholds is reached, + incoming queries wait to begin execution. 
These queries are + queued and are admitted (that is, begin executing) when the resources become available. + </p> + <p class="p"> + In addition to the threshold values for currently executing queries, + you can limit the maximum number of queries that are + queued (waiting) and the amount of time they might wait + before returning with an error. These queue settings let you ensure that queries do + not wait indefinitely, so that you can detect and correct <span class="q">"starvation"</span> scenarios. + </p> + <p class="p"> + Enable this feature if your cluster is + underutilized at some times and overutilized at others. Overutilization is indicated by performance + bottlenecks and queries being cancelled due to out-of-memory conditions, when those same queries + succeed and perform well during times with less concurrent load. Admission control works as a safeguard to + avoid out-of-memory conditions during heavy concurrent usage. + </p> + + <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <p class="p"> + The use of the Llama component for integrated resource management within YARN + is no longer supported in <span class="keyword">Impala 2.3</span> and higher. + The Llama support code is removed entirely in <span class="keyword">Impala 2.8</span> and higher. + </p> + <p class="p"> + For clusters running Impala alongside + other data management components, you define static service pools to set aside the resources + available to Impala and other components. Then within the area allocated for Impala, + you can create dynamic service pools, each with its own settings for the Impala admission control feature. 
+ </p> + </div> + + <p class="p toc inpage"></p> + </div> + + <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="admission_control__admission_intro"> + + <h2 class="title topictitle2" id="ariaid-title2">Overview of Impala Admission Control</h2> + + + <div class="body conbody"> + + <p class="p"> + On a busy cluster, you might find there is an optimal number of Impala queries that run concurrently. + For example, when the I/O capacity is fully utilized by I/O-intensive queries, + you might not find any throughput benefit in running more concurrent queries. + By allowing some queries to run at full speed while others wait, rather than having + all queries contend for resources and run slowly, admission control can result in higher overall throughput. + </p> + + <p class="p"> + For another example, consider a memory-bound workload such as many large joins or aggregation queries. + Each such query could briefly use many gigabytes of memory to process intermediate results. + Because Impala by default cancels queries that exceed the specified memory limit, + running multiple large-scale queries at once might require + re-running some queries that are cancelled. In this case, admission control improves the + reliability and stability of the overall workload by only allowing as many concurrent queries + as the overall memory of the cluster can accommodate. + </p> + + <p class="p"> + The admission control feature lets you set an upper limit on the number of concurrent Impala + queries and on the memory used by those queries. Any additional queries are queued until the earlier ones + finish, rather than being cancelled or running slowly and causing contention. As other queries finish, the + queued queries are allowed to proceed. 
+ </p> + + <p class="p"> + In <span class="keyword">Impala 2.5</span> and higher, you can specify these limits and thresholds for each + pool rather than globally. That way, you can balance the resource usage and throughput + between steady well-defined workloads, rare resource-intensive queries, and ad hoc + exploratory queries. + </p> + + <p class="p"> + For details on the internal workings of admission control, see + <a class="xref" href="impala_admission.html#admission_architecture">How Impala Schedules and Enforces Limits on Concurrent Queries</a>. + </p> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="admission_control__admission_concurrency"> + <h2 class="title topictitle2" id="ariaid-title3">Concurrent Queries and Admission Control</h2> + <div class="body conbody"> + <p class="p"> + One way to limit resource usage through admission control is to set an upper limit + on the number of concurrent queries. This is the initial technique you might use + when you do not have extensive information about memory usage for your workload. + This setting can be specified separately for each dynamic resource pool. + </p> + <p class="p"> + You can combine this setting with the memory-based approach described in + <a class="xref" href="impala_admission.html#admission_memory">Memory Limits and Admission Control</a>. If either the maximum number of concurrent queries + or their expected memory usage is exceeded, subsequent queries + are queued until the concurrent workload falls below the threshold again. + </p> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="admission_control__admission_memory"> + <h2 class="title topictitle2" id="ariaid-title4">Memory Limits and Admission Control</h2> + <div class="body conbody"> + <p class="p"> + Each dynamic resource pool can have an upper limit on the cluster-wide memory used by queries executing in that pool. 
+ This is the technique to use once you have a stable workload with well-understood memory requirements. + </p> + <p class="p"> + Always set the <span class="ph uicontrol">Default Query Memory Limit</span> to the expected maximum amount of RAM + that a query might require on each host, which is equivalent to setting the <code class="ph codeph">MEM_LIMIT</code> + query option for every query run in that pool. That value affects the execution of each query, preventing it + from overallocating memory on each host, and potentially activating the spill-to-disk mechanism or cancelling + the query when necessary. + </p> + <p class="p"> + Optionally, specify the <span class="ph uicontrol">Max Memory</span> setting, a cluster-wide limit that determines + how many queries can be safely run concurrently, based on the upper memory limit per host multiplied by the + number of Impala nodes in the cluster. + </p> + <div class="p"> + For example, consider the following scenario: + <ul class="ul"> + <li class="li"> The cluster is running <span class="keyword cmdname">impalad</span> daemons on five + DataNodes. </li> + <li class="li"> A dynamic resource pool has <span class="ph uicontrol">Max Memory</span> set + to 100 GB. </li> + <li class="li"> The <span class="ph uicontrol">Default Query Memory Limit</span> for the + pool is 10 GB. Therefore, any query running in this pool could use + up to 50 GB of memory (default query memory limit * number of Impala + nodes). </li> + <li class="li"> The maximum number of queries that Impala executes concurrently + within this dynamic resource pool is two, which is the most that + could be accommodated within the 100 GB <span class="ph uicontrol">Max + Memory</span> cluster-wide limit. </li> + <li class="li"> There is no memory penalty if queries use less memory than the + <span class="ph uicontrol">Default Query Memory Limit</span> per-host setting + or the <span class="ph uicontrol">Max Memory</span> cluster-wide limit. 
These + values are only used to estimate how many queries can be run + concurrently within the resource constraints for the pool. </li> + </ul> + </div> + <div class="note note note_note"><span class="note__title notetitle">Note:</span> If you specify <span class="ph uicontrol">Max + Memory</span> for an Impala dynamic resource pool, you must also + specify the <span class="ph uicontrol">Default Query Memory Limit</span>. + <span class="ph uicontrol">Max Memory</span> relies on the <span class="ph uicontrol">Default + Query Memory Limit</span> to produce a reliable estimate of + overall memory consumption for a query. </div> + <p class="p"> + You can combine the memory-based settings with the upper limit on concurrent queries described in + <a class="xref" href="impala_admission.html#admission_concurrency">Concurrent Queries and Admission Control</a>. If either the maximum number of concurrent queries + or their expected memory usage is exceeded, subsequent queries + are queued until the concurrent workload falls below the threshold again. + </p> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="admission_control__admission_yarn"> + + <h2 class="title topictitle2" id="ariaid-title5">How Impala Admission Control Relates to Other Resource Management Tools</h2> + + + <div class="body conbody"> + + <p class="p"> + The admission control feature is similar in some ways to the YARN resource management framework. These features + can be used separately or together. This section describes some similarities and differences, to help you + decide which combination of resource management features to use for Impala. + </p> + + <p class="p"> + Admission control is a lightweight, decentralized system that is suitable for workloads consisting + primarily of Impala queries and other SQL statements. 
It sets <span class="q">"soft"</span> limits that smooth out Impala + memory usage during times of heavy load, rather than taking an all-or-nothing approach that cancels jobs + that are too resource-intensive. + </p> + + <p class="p"> + Because the admission control system does not interact with other Hadoop workloads such as MapReduce jobs, you + might use YARN with static service pools on clusters where resources are shared between + Impala and other Hadoop components. This configuration is recommended when using Impala in a + <dfn class="term">multitenant</dfn> cluster. Devote a percentage of cluster resources to Impala, and allocate another + percentage for MapReduce and other batch-style workloads. Let admission control handle the concurrency and + memory usage for the Impala work within the cluster, and let YARN manage the work for other components within the + cluster. In this scenario, Impala's resources are not managed by YARN. + </p> + + <p class="p"> + The Impala admission control feature uses the same configuration mechanism as the YARN resource manager to map users to + pools and authenticate them. + </p> + + <p class="p"> + Although the Impala admission control feature uses a <code class="ph codeph">fair-scheduler.xml</code> configuration file + behind the scenes, this file does not depend on which scheduler is used for YARN. You still use this file + even when YARN is using the capacity scheduler. + </p> + + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="admission_control__admission_architecture"> + + <h2 class="title topictitle2" id="ariaid-title6">How Impala Schedules and Enforces Limits on Concurrent Queries</h2> + + + <div class="body conbody"> + + <p class="p"> + The admission control system is decentralized, embedded in each Impala daemon and communicating through the + statestore mechanism. 
Although the limits you set for memory usage and number of concurrent queries apply + cluster-wide, each Impala daemon makes its own decisions about whether to allow each query to run + immediately or to queue it for a less-busy time. These decisions are fast, meaning the admission control + mechanism is low-overhead, but might be imprecise during times of heavy load across many coordinators. There could be times when + more queries are queued (in aggregate across the cluster) than the specified limit, or when the number of admitted queries + exceeds the expected number. Thus, you typically err on the + high side for the size of the queue, because there is not a big penalty for having a large number of queued + queries; and you typically err on the low side for configuring memory resources, to leave some headroom in case more + queries are admitted than expected, so that queries do not run out of memory and get cancelled as a result. + </p> + + + + <p class="p"> + To avoid a large backlog of queued requests, you can set an upper limit on the size of the queue. + When the number of queued queries exceeds this limit, further queries are + cancelled rather than being queued. You can also configure a timeout period per pool, after which queued queries are + cancelled, to avoid indefinite waits. If a cluster reaches this state where queries are cancelled due to + too many concurrent requests or long waits for query execution to begin, that is a signal for an + administrator to take action, either by provisioning more resources, scheduling work on the cluster to + smooth out the load, or by doing <a class="xref" href="impala_performance.html#performance">Impala performance + tuning</a> to enable higher throughput. 
+ </p> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="admission_control__admission_jdbc_odbc"> + + <h2 class="title topictitle2" id="ariaid-title7">How Admission Control works with Impala Clients (JDBC, ODBC, HiveServer2)</h2> + + + <div class="body conbody"> + + <p class="p"> + Most aspects of admission control work transparently with client interfaces such as JDBC and ODBC: + </p> + + <ul class="ul"> + <li class="li"> + If a SQL statement is put into a queue rather than running immediately, the API call blocks until the + statement is dequeued and begins execution. At that point, the client program can request to fetch + results, which might also block until results become available. + </li> + + <li class="li"> + If a SQL statement is cancelled because it has been queued for too long or because it exceeded the memory + limit during execution, the error is returned to the client program with a descriptive error message. + </li> + + </ul> + + <p class="p"> + In Impala 2.0 and higher, you can submit + a SQL <code class="ph codeph">SET</code> statement from the client application + to change the <code class="ph codeph">REQUEST_POOL</code> query option. + This option lets you submit queries to different resource pools, + as described in <a class="xref" href="impala_request_pool.html#request_pool">REQUEST_POOL Query Option</a>. + + </p> + + <p class="p"> + At any time, the set of queued queries could include queries submitted through multiple different Impala + daemon hosts. All the queries submitted through a particular host will be executed in order, so a + <code class="ph codeph">CREATE TABLE</code> followed by an <code class="ph codeph">INSERT</code> on the same table would succeed. + Queries submitted through different hosts are not guaranteed to be executed in the order they were + received. 
Therefore, if you are using load-balancing or other round-robin scheduling where different + statements are submitted through different hosts, set up all table structures ahead of time so that the + statements controlled by the queuing system are primarily queries, where order is not significant. Or, if a + sequence of statements needs to happen in strict order (such as an <code class="ph codeph">INSERT</code> followed by a + <code class="ph codeph">SELECT</code>), submit all those statements through a single session, while connected to the same + Impala daemon host. + </p> + + <p class="p"> + Admission control has the following limitations or special behavior when used with JDBC or ODBC + applications: + </p> + + <ul class="ul"> + <li class="li"> + The other resource-related query options, + <code class="ph codeph">RESERVATION_REQUEST_TIMEOUT</code> and <code class="ph codeph">V_CPU_CORES</code>, are no longer used. Those query options only + applied to using Impala with Llama, which is no longer supported. + </li> + </ul> + </div> + </article> + + <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="admission_control__admission_schema_config"> + <h2 class="title topictitle2" id="ariaid-title8">SQL and Schema Considerations for Admission Control</h2> + <div class="body conbody"> + <p class="p"> + When queries complete quickly and are tuned for optimal memory usage, there is less chance of + performance or capacity problems during times of heavy load. Before setting up admission control, + tune your Impala queries to ensure that the query plans are efficient and the memory estimates + are accurate. Understanding the nature of your workload, and which queries are the most + resource-intensive, helps you to plan how to divide the queries into different pools and + decide what limits to define for each pool. 
+ </p> + <p class="p"> + For large tables, especially those involved in join queries, keep their statistics up to date + after loading substantial amounts of new data or adding new partitions. + Use the <code class="ph codeph">COMPUTE STATS</code> statement for unpartitioned tables, and + <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> for partitioned tables. + </p> + <p class="p"> + When you use dynamic resource pools with a <span class="ph uicontrol">Max Memory</span> setting enabled, + you typically override the memory estimates that Impala makes based on the statistics from the + <code class="ph codeph">COMPUTE STATS</code> statement. + You either set the <code class="ph codeph">MEM_LIMIT</code> query option within a particular session to + set an upper memory limit for queries within that session, or a default <code class="ph codeph">MEM_LIMIT</code> + setting for all queries processed by the <span class="keyword cmdname">impalad</span> instance, or + a default <code class="ph codeph">MEM_LIMIT</code> setting for all queries assigned to a particular + dynamic resource pool. By designating a consistent memory limit for a set of similar queries + that use the same resource pool, you avoid unnecessary query queuing or out-of-memory conditions + that can arise during high-concurrency workloads when memory estimates for some queries are inaccurate. + </p> + <p class="p"> + Follow other steps from <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> to tune your queries. 
+ </p> + </div> + </article> + + + <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="admission_control__admission_config"> + + <h2 class="title topictitle2" id="ariaid-title9">Configuring Admission Control</h2> + + + <div class="body conbody"> + + <p class="p"> + The configuration options for admission control range from the simple (a single resource pool with a single + set of options) to the complex (multiple resource pools with different options, each pool handling queries + for a different set of users and groups). + </p> + + <section class="section" id="admission_config__admission_flags"><h3 class="title sectiontitle">Impala Service Flags for Admission Control (Advanced)</h3> + + + + <p class="p"> + The following Impala configuration options let you adjust the settings of the admission control feature. When supplying the + options on the <span class="keyword cmdname">impalad</span> command line, prepend the option name with <code class="ph codeph">--</code>. + </p> + + <dl class="dl" id="admission_config__admission_control_option_list"> + + <dt class="dt dlterm" id="admission_config__queue_wait_timeout_ms"> + <code class="ph codeph">queue_wait_timeout_ms</code> + </dt> + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Maximum amount of time (in milliseconds) that a + request waits to be admitted before timing out. + <p class="p"> + <strong class="ph b">Type:</strong> <code class="ph codeph">int64</code> + </p> + <p class="p"> + <strong class="ph b">Default:</strong> <code class="ph codeph">60000</code> + </p> + </dd> + + + <dt class="dt dlterm" id="admission_config__default_pool_max_requests"> + <code class="ph codeph">default_pool_max_requests</code> + </dt> + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Maximum number of concurrent outstanding requests + allowed to run before incoming requests are queued. 
Because this + limit applies cluster-wide, but each Impala node makes independent + decisions to run queries immediately or queue them, it is a soft + limit; the overall number of concurrent queries might be slightly + higher during times of heavy load. A negative value indicates no + limit. Ignored if <code class="ph codeph">fair_scheduler_allocation_path</code> and + <code class="ph codeph">llama_site_path</code> are set. <p class="p"> + <strong class="ph b">Type:</strong> + <code class="ph codeph">int64</code> + </p> + <p class="p"> + <strong class="ph b">Default:</strong> + <span class="ph">-1, meaning unlimited (prior to <span class="keyword">Impala 2.5</span> the default was 200)</span> + </p> + </dd> + + + <dt class="dt dlterm" id="admission_config__default_pool_max_queued"> + <code class="ph codeph">default_pool_max_queued</code> + </dt> + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Maximum number of requests allowed to be queued + before rejecting requests. Because this limit applies + cluster-wide, but each Impala node makes independent decisions to + run queries immediately or queue them, it is a soft limit; the + overall number of queued queries might be slightly higher during + times of heavy load. A negative value or 0 indicates requests are + always rejected once the maximum concurrent requests are + executing. Ignored if <code class="ph codeph">fair_scheduler_allocation_path</code> + and <code class="ph codeph">llama_site_path</code> are set. 
<p class="p"> + <strong class="ph b">Type:</strong> + <code class="ph codeph">int64</code> + </p> + <p class="p"> + <strong class="ph b">Default:</strong> + <span class="ph">unlimited</span> + </p> + </dd> + + + <dt class="dt dlterm" id="admission_config__default_pool_mem_limit"> + <code class="ph codeph">default_pool_mem_limit</code> + </dt> + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Maximum amount of memory (across the entire + cluster) that all outstanding requests in this pool can use before + new requests to this pool are queued. Specified in bytes, + megabytes, or gigabytes by a number followed by the suffix + <code class="ph codeph">b</code> (optional), <code class="ph codeph">m</code>, or + <code class="ph codeph">g</code>, either uppercase or lowercase. You can + specify floating-point values for megabytes and gigabytes, to + represent fractional numbers such as <code class="ph codeph">1.5</code>. You can + also specify it as a percentage of the physical memory by + specifying the suffix <code class="ph codeph">%</code>. 0 or no setting + indicates no limit. Defaults to bytes if no unit is given. Because + this limit applies cluster-wide, but each Impala node makes + independent decisions to run queries immediately or queue them, it + is a soft limit; the overall memory used by concurrent queries + might be slightly higher during times of heavy load. Ignored if + <code class="ph codeph">fair_scheduler_allocation_path</code> and + <code class="ph codeph">llama_site_path</code> are set. <div class="note note note_note"><span class="note__title notetitle">Note:</span> + Impala relies on the statistics produced by the <code class="ph codeph">COMPUTE STATS</code> statement to estimate memory + usage for each query. See <a class="xref" href="../shared/../topics/impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for guidelines + about how and when to use this statement. 
+ </div> + <p class="p"> + <strong class="ph b">Type:</strong> string + </p> + <p class="p"> + <strong class="ph b">Default:</strong> + <code class="ph codeph">""</code> (empty string, meaning unlimited) </p> + </dd> + + + <dt class="dt dlterm" id="admission_config__disable_admission_control"> + <code class="ph codeph">disable_admission_control</code> + </dt> + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Turns off the admission control feature entirely, + regardless of other configuration option settings. + <p class="p"> + <strong class="ph b">Type:</strong> Boolean </p> + <p class="p"> + <strong class="ph b">Default:</strong> + <code class="ph codeph">false</code> + </p> + </dd> + + + <dt class="dt dlterm" id="admission_config__disable_pool_max_requests"> + <code class="ph codeph">disable_pool_max_requests</code> + </dt> + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Disables all per-pool limits on the maximum number + of running requests. <p class="p"> + <strong class="ph b">Type:</strong> Boolean </p> + <p class="p"> + <strong class="ph b">Default:</strong> + <code class="ph codeph">false</code> + </p> + </dd> + + + <dt class="dt dlterm" id="admission_config__disable_pool_mem_limits"> + <code class="ph codeph">disable_pool_mem_limits</code> + </dt> + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Disables all per-pool mem limits. <p class="p"> + <strong class="ph b">Type:</strong> Boolean </p> + <p class="p"> + <strong class="ph b">Default:</strong> + <code class="ph codeph">false</code> + </p> + </dd> + + + <dt class="dt dlterm" id="admission_config__fair_scheduler_allocation_path"> + <code class="ph codeph">fair_scheduler_allocation_path</code> + </dt> + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Path to the fair scheduler allocation file + (<code class="ph codeph">fair-scheduler.xml</code>). 
<p class="p"> + <strong class="ph b">Type:</strong> string + </p> + <p class="p"> + <strong class="ph b">Default:</strong> + <code class="ph codeph">""</code> (empty string) </p> + <p class="p"> + <strong class="ph b">Usage notes:</strong> Admission control only uses a small subset + of the settings that can go in this file, as described below. + For details about all the Fair Scheduler configuration settings, + see the <a class="xref" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration" target="_blank">Apache wiki</a>. </p> + </dd> + + + <dt class="dt dlterm" id="admission_config__llama_site_path"> + <code class="ph codeph">llama_site_path</code> + </dt> + <dd class="dd"> + + <strong class="ph b">Purpose:</strong> Path to the configuration file used by admission control + (<code class="ph codeph">llama-site.xml</code>). If set, + <code class="ph codeph">fair_scheduler_allocation_path</code> must also be set. + <p class="p"> + <strong class="ph b">Type:</strong> string + </p> + <p class="p"> + <strong class="ph b">Default:</strong> <code class="ph codeph">""</code> (empty string) </p> + <p class="p"> + <strong class="ph b">Usage notes:</strong> Admission control only uses a few + of the settings that can go in this file, as described below. + </p> + </dd> + + </dl> + </section> + </div> + + <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="admission_config__admission_config_manual"> + + <h3 class="title topictitle3" id="ariaid-title10">Configuring Admission Control Using the Command Line</h3> + + <div class="body conbody"> + + <p class="p"> + To configure admission control, use a combination of startup options for the Impala daemon and edit + or create the configuration files <span class="ph filepath">fair-scheduler.xml</span> and + <span class="ph filepath">llama-site.xml</span>. 
+ </p> + + <p class="p"> + For a straightforward configuration using a single resource pool named <code class="ph codeph">default</code>, you can + specify configuration options on the command line and skip the <span class="ph filepath">fair-scheduler.xml</span> + and <span class="ph filepath">llama-site.xml</span> configuration files. + </p> + + <p class="p"> + For an advanced configuration with multiple resource pools using different settings, set up the + <span class="ph filepath">fair-scheduler.xml</span> and <span class="ph filepath">llama-site.xml</span> configuration files + manually. Provide the paths to each one using the <span class="keyword cmdname">impalad</span> command-line options, + <code class="ph codeph">--fair_scheduler_allocation_path</code> and <code class="ph codeph">--llama_site_path</code> respectively. + </p> + + <p class="p"> + The Impala admission control feature only uses the Fair Scheduler configuration settings to determine how + to map users and groups to different resource pools. For example, you might set up different resource + pools with separate memory limits, and maximum number of concurrent and queued queries, for different + categories of users within your organization. For details about all the Fair Scheduler configuration + settings, see the + <a class="xref" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration" target="_blank">Apache + wiki</a>. 
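+ </p>
+
+ <p class="p">
+ As a minimal sketch of the single-pool case described earlier in this section, the following
+ <span class="keyword cmdname">impalad</span> startup options configure the <code class="ph codeph">default</code>
+ pool entirely from the command line, without either configuration file. The option names are the ones
+ listed under Impala Service Flags for Admission Control; the values are assumptions chosen only for
+ illustration, not recommendations, and any other startup options your deployment needs are omitted.
+ </p>
+
+<pre class="pre codeblock"><code># Hypothetical values for illustration only; size them for your own workload.
+impalad \
+  --default_pool_max_requests=10 \
+  --default_pool_max_queued=50 \
+  --default_pool_mem_limit=100g \
+  --queue_wait_timeout_ms=30000
+</code></pre>
+
+ <p class="p">
+ With these example values, roughly 10 queries run concurrently in the <code class="ph codeph">default</code> pool,
+ roughly 50 more can wait in the queue, new queries are queued once outstanding requests are expected to use
+ more than 100 GB across the cluster, and a queued query times out after 30 seconds. Because each Impala daemon
+ makes its own admission decisions, treat all of these as soft limits.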
+ </p> + + <p class="p"> + The Impala admission control feature only uses a small subset of possible settings from the + <span class="ph filepath">llama-site.xml</span> configuration file: + </p> + +<pre class="pre codeblock"><code>llama.am.throttling.maximum.placed.reservations.<var class="keyword varname">queue_name</var> +llama.am.throttling.maximum.queued.reservations.<var class="keyword varname">queue_name</var> +<span class="ph">impala.admission-control.pool-default-query-options.<var class="keyword varname">queue_name</var> +impala.admission-control.pool-queue-timeout-ms.<var class="keyword varname">queue_name</var></span> +</code></pre> + + <p class="p"> + The <code class="ph codeph">impala.admission-control.pool-queue-timeout-ms</code> + setting specifies the timeout value for this pool, in milliseconds. + The <code class="ph codeph">impala.admission-control.pool-default-query-options</code> + setting designates the default query options for all queries that run + in this pool. Its argument value is a comma-delimited string of + 'key=value' pairs, for example, <code class="ph codeph">'key1=val1,key2=val2'</code>. + This is where you might set a default memory limit + for all queries in the pool, using an argument such as <code class="ph codeph">MEM_LIMIT=5G</code>. + </p> + + <p class="p"> + The <code class="ph codeph">impala.admission-control.*</code> configuration settings are available in + <span class="keyword">Impala 2.5</span> and higher. 
+ </p> + + </div> + </article> + + <article class="topic concept nested2" aria-labelledby="ariaid-title11" id="admission_config__admission_examples"> + + <h3 class="title topictitle3" id="ariaid-title11">Example of Admission Control Configuration</h3> + + <div class="body conbody"> + + <p class="p"> Here are sample <span class="ph filepath">fair-scheduler.xml</span> and + <span class="ph filepath">llama-site.xml</span> files that define resource pools + <code class="ph codeph">root.default</code>, <code class="ph codeph">root.development</code>, and + <code class="ph codeph">root.production</code>. These sample files are stripped down: in a real + deployment they might contain other settings for use with various aspects of the YARN + component. The settings shown here are the significant ones for the Impala admission + control feature. </p> + + <p class="p"> + <strong class="ph b">fair-scheduler.xml:</strong> + </p> + + <p class="p"> + Although Impala does not use the <code class="ph codeph">vcores</code> value, you must still specify it to satisfy + YARN requirements for the file contents. + </p> + + <p class="p"> + Each <code class="ph codeph"><aclSubmitApps></code> tag (other than the one for <code class="ph codeph">root</code>) contains + a comma-separated list of users, then a space, then a comma-separated list of groups; these are the + users and groups allowed to submit Impala statements to the corresponding resource pool. + </p> + + <p class="p"> + If you leave the <code class="ph codeph"><aclSubmitApps></code> element empty for a pool, nobody can submit + directly to that pool; child pools can specify their own <code class="ph codeph"><aclSubmitApps></code> values + to authorize users and groups to submit to those pools. 
+ </p> + + <pre class="pre codeblock"><code><allocations> + + <queue name="root"> + <aclSubmitApps> </aclSubmitApps> + <queue name="default"> + <maxResources>50000 mb, 0 vcores</maxResources> + <aclSubmitApps>*</aclSubmitApps> + </queue> + <queue name="development"> + <maxResources>200000 mb, 0 vcores</maxResources> + <aclSubmitApps>user1,user2 dev,ops,admin</aclSubmitApps> + </queue> + <queue name="production"> + <maxResources>1000000 mb, 0 vcores</maxResources> + <aclSubmitApps> ops,admin</aclSubmitApps> + </queue> + </queue> + <queuePlacementPolicy> + <rule name="specified" create="false"/> + <rule name="default" /> + </queuePlacementPolicy> +</allocations> + +</code></pre> + + <p class="p"> + <strong class="ph b">llama-site.xml:</strong> + </p> + + <pre class="pre codeblock"><code> +<?xml version="1.0" encoding="UTF-8"?> +<configuration> + <property> + <name>llama.am.throttling.maximum.placed.reservations.root.default</name> + <value>10</value> + </property> + <property> + <name>llama.am.throttling.maximum.queued.reservations.root.default</name> + <value>50</value> + </property> + <property> + <name>impala.admission-control.pool-default-query-options.root.default</name> + <value>mem_limit=128m,query_timeout_s=20,max_io_buffers=10</value> + </property> + <property> + <name>impala.admission-control.pool-queue-timeout-ms.root.default</name> + <value>30000</value> + </property> + <property> + <name>llama.am.throttling.maximum.placed.reservations.root.development</name> + <value>50</value> + </property> + <property> + <name>llama.am.throttling.maximum.queued.reservations.root.development</name> + <value>100</value> + </property> + <property> + <name>impala.admission-control.pool-default-query-options.root.development</name> + <value>mem_limit=256m,query_timeout_s=30,max_io_buffers=10</value> + </property> + <property> + <name>impala.admission-control.pool-queue-timeout-ms.root.development</name> + <value>15000</value> + </property> + <property> + 
<name>llama.am.throttling.maximum.placed.reservations.root.production</name> + <value>100</value> + </property> + <property> + <name>llama.am.throttling.maximum.queued.reservations.root.production</name> + <value>200</value> + </property> +<!-- + Default query options for the 'root.production' pool. + THIS IS A NEW PARAMETER in Impala 2.5. + Note that the MEM_LIMIT query option still shows up in here even though it is a + separate box in the UI. We do that because it is the most important query option + that people will need (everything else is somewhat advanced). + + MEM_LIMIT takes a per-node memory limit which is specified using one of the following: + - '<int>[bB]?' -> bytes (default if no unit given) + - '<float>[mM(bB)]' -> megabytes + - '<float>[gG(bB)]' -> in gigabytes + E.g. 'MEM_LIMIT=12345' (no unit) means 12345 bytes, and you can append m or g + to specify megabytes or gigabytes, though that is not required. +--> + <property> + <name>impala.admission-control.pool-default-query-options.root.production</name> + <value>mem_limit=386m,query_timeout_s=30,max_io_buffers=10</value> + </property> +<!-- + Default queue timeout (ms) for the pool 'root.production'. + If this isn't set, the process-wide flag is used. + THIS IS A NEW PARAMETER in Impala 2.5. +--> + <property> + <name>impala.admission-control.pool-queue-timeout-ms.root.production</name> + <value>30000</value> + </property> +</configuration> + +</code></pre> + + </div> + </article> + + + + <article class="topic concept nested2" aria-labelledby="ariaid-title12" id="admission_config__admission_guidelines"> + + <h3 class="title topictitle3" id="ariaid-title12">Guidelines for Using Admission Control</h3> + + + <div class="body conbody"> + + <p class="p"> + To see how admission control works for particular queries, examine the profile output for the query. 
This + information is available through the <code class="ph codeph">PROFILE</code> statement in <span class="keyword cmdname">impala-shell</span> + immediately after running a query in the shell, on the <span class="ph uicontrol">queries</span> page of the Impala + debug web UI, or in the Impala log file (basic information at log level 1, more detailed information at log + level 2). The profile output contains details about the admission decision, such as whether the query was + queued or not and which resource pool it was assigned to. It also includes the estimated and actual memory + usage for the query, so you can fine-tune the configuration for the memory limits of the resource pools. + </p> + + <p class="p"> + Remember that the limits imposed by admission control are <span class="q">"soft"</span> limits. + The decentralized nature of this mechanism means that each Impala node makes its own decisions about whether + to allow queries to run immediately or to queue them. These decisions rely on information passed back and forth + between nodes by the statestore service. If a sudden surge in requests causes more queries than anticipated to run + concurrently, then throughput could decrease due to queries spilling to disk or contending for resources; + or queries could be cancelled if they exceed the <code class="ph codeph">MEM_LIMIT</code> setting while running. + </p> + + + + <p class="p"> + In <span class="keyword cmdname">impala-shell</span>, you can also specify which resource pool to direct queries to by + setting the <code class="ph codeph">REQUEST_POOL</code> query option. + </p> + + <p class="p"> + The statements affected by the admission control feature are primarily queries, but also include statements + that write data such as <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>. 
Most write + operations in Impala are not resource-intensive, but inserting into a Parquet table can require substantial + memory due to buffering intermediate data before writing out each Parquet data block. See + <a class="xref" href="impala_parquet.html#parquet_etl">Loading Data into Parquet Tables</a> for instructions about inserting data efficiently into + Parquet tables. + </p> + + <p class="p"> + Although admission control does not scrutinize memory usage for other kinds of DDL statements, if a query + is queued due to a limit on concurrent queries or memory usage, subsequent statements in the same session + are also queued so that they are processed in the correct order: + </p> + +<pre class="pre codeblock"><code>-- This query could be queued to avoid out-of-memory at times of heavy load. +select * from huge_table join enormous_table using (id); +-- If so, this subsequent statement in the same session is also queued +-- until the previous statement completes. +drop table huge_table; +</code></pre> + + <p class="p"> + If you set up different resource pools for different users and groups, consider reusing any classifications + you developed for use with Sentry security. See <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for details. + </p> + + <p class="p"> + For details about all the Fair Scheduler configuration settings, see + <a class="xref" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration" target="_blank">Fair Scheduler Configuration</a>, in particular the tags such as <code class="ph codeph"><queue></code> and + <code class="ph codeph"><aclSubmitApps></code> to map users and groups to particular resource pools (queues). + </p> + + + </div> + </article> +</article> +</article></main></body></html> \ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_aggregate_functions.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_aggregate_functions.html b/docs/build/html/topics/impala_aggregate_functions.html new file mode 100644 index 0000000..0b6ab31 --- /dev/null +++ b/docs/build/html/topics/impala_aggregate_functions.html @@ -0,0 +1,34 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_appx_median.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_avg.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_count.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_group_concat.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_min.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_ndv.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_stddev.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_sum.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_variance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="aggregate_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Aggregate Functions</title></head><body 
id="aggregate_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Impala Aggregate Functions</h1> + + + + <div class="body conbody"> + + <p class="p"> + Aggregate functions are a special category with different rules. These functions calculate a return value + across all the items in a result set, so they require a <code class="ph codeph">FROM</code> clause in the query: + </p> + +<pre class="pre codeblock"><code>select count(product_id) from product_catalog; +select max(height), avg(height) from census_data where age > 20; +</code></pre> + + <p class="p"> + Aggregate functions also ignore <code class="ph codeph">NULL</code> values rather than returning a <code class="ph codeph">NULL</code> + result. For example, if some rows have <code class="ph codeph">NULL</code> for a particular column, those rows are + ignored when computing the <code class="ph codeph">AVG()</code> for that column. Likewise, specifying + <code class="ph codeph">COUNT(<var class="keyword varname">col_name</var>)</code> in a query counts only those rows where + <var class="keyword varname">col_name</var> contains a non-<code class="ph codeph">NULL</code> value. 
+ </p> + + <p class="p"> + + </p> + + <p class="p toc"></p> + </div> +<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_appx_median.html">APPX_MEDIAN Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_avg.html">AVG Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_count.html">COUNT Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_group_concat.html">GROUP_CONCAT Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max.html">MAX Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_min.html">MIN Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_ndv.html">NDV Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_stddev.html">STDDEV, STDDEV_SAMP, STDDEV_POP Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_sum.html">SUM Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_variance.html">VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP Functions</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_aliases.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_aliases.html b/docs/build/html/topics/impala_aliases.html new file mode 100644 index 0000000..4322db3 --- /dev/null +++ b/docs/build/html/topics/impala_aliases.html @@ -0,0 +1,85 @@ 
+<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="aliases"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Overview of Impala Aliases</title></head><body id="aliases"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">Overview of Impala Aliases</h1> + + + + <div class="body conbody"> + + <p class="p"> + When you write the names of tables, columns, or column expressions in a query, you can assign an alias at the + same time. Then you can specify the alias rather than the original name when making other references to the + table or column in the same statement. You typically specify aliases that are shorter than the original names, + easier to remember, or both. The aliases are printed in the query header, making them useful for + self-documenting output. + </p> + + <p class="p"> + To set up an alias, add the <code class="ph codeph">AS <var class="keyword varname">alias</var></code> clause immediately after any table, + column, or expression name in the <code class="ph codeph">SELECT</code> list or <code class="ph codeph">FROM</code> list of a query. The + <code class="ph codeph">AS</code> keyword is optional; you can also specify the alias immediately after the original name. + </p> + +<pre class="pre codeblock"><code>-- Make the column headers of the result set easier to understand. 
+SELECT c1 AS name, c2 AS address, c3 AS phone FROM table_with_terse_columns; +SELECT SUM(ss_xyz_dollars_net) AS total_sales FROM table_with_cryptic_columns; +-- The alias can be a quoted string for extra readability. +SELECT c1 AS "Employee ID", c2 AS "Date of hire" FROM t1; +-- The AS keyword is optional. +SELECT c1 "Employee ID", c2 "Date of hire" FROM t1; + +-- The table aliases assigned in the FROM clause can be used both earlier +-- in the query (the SELECT list) and later (the WHERE clause). +SELECT one.name, two.address, three.phone + FROM census one, building_directory two, phonebook three +WHERE one.id = two.id and two.id = three.id; + +-- The aliases c1 and c2 let the query handle columns with the same names from 2 joined tables. +-- The aliases t1 and t2 let the query abbreviate references to long or cryptically named tables. +SELECT t1.column_n AS c1, t2.column_n AS c2 FROM long_name_table AS t1, very_long_name_table2 AS t2 + WHERE c1 = c2; +SELECT t1.column_n c1, t2.column_n c2 FROM table1 t1, table2 t2 + WHERE c1 = c2; +</code></pre> + + <p class="p"> + To use an alias name that matches one of the Impala reserved keywords (listed in + <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>), surround the identifier with either single or + double quotation marks, or <code class="ph codeph">``</code> characters (backticks). + </p> + + <p class="p"> + <span class="ph"> Aliases follow the same rules as identifiers when it comes to case + insensitivity. Aliases can be longer than identifiers (up to the maximum length of a Java string) and can + include additional characters such as spaces and dashes when they are quoted using backtick characters. 
+ </span> + </p> + + <p class="p"> + <strong class="ph b">Complex type considerations:</strong> + </p> + + <p class="p"> + Queries involving the complex types (<code class="ph codeph">ARRAY</code>, + <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>) typically make + extensive use of table aliases. These queries involve join clauses + where the complex type column is treated as a joined table. + To construct two-part or three-part qualified names for the + complex column elements in the <code class="ph codeph">FROM</code> list, + the syntax sometimes requires you to construct a table + alias for the complex column where it is referenced in the join clause. + See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details and examples. + </p> + + <p class="p"> + <strong class="ph b">Alternatives:</strong> + </p> + + <p class="p"> + Another way to define different names for the same tables or columns is to create views. See + <a class="xref" href="../shared/../topics/impala_views.html#views">Overview of Impala Views</a> for details. 
+ </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></div></div></nav></article></main></body></html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_allow_unsupported_formats.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_allow_unsupported_formats.html b/docs/build/html/topics/impala_allow_unsupported_formats.html new file mode 100644 index 0000000..824c555 --- /dev/null +++ b/docs/build/html/topics/impala_allow_unsupported_formats.html @@ -0,0 +1,24 @@ +<!DOCTYPE html + SYSTEM "about:legacy-compat"> +<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="allow_unsupported_formats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ALLOW_UNSUPPORTED_FORMATS Query Option</title></head><body id="allow_unsupported_formats"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + + <h1 class="title topictitle1" id="ariaid-title1">ALLOW_UNSUPPORTED_FORMATS Query Option</h1> + + + + <div class="body conbody"> + + <p class="p"> + An obsolete query option from early work on support for file formats. Do not use. Might be removed in the + future. 
+ </p> + + <p class="p"> + <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>; + any other value interpreted as <code class="ph codeph">false</code> + </p> + <p class="p"> + <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement) + </p> + </div> +<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file
