This is an automated email from the ASF dual-hosted git repository. arodoni pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git
commit 022ba2bbdb6feec0ba944847e92d5a051cedc9a1 Author: Alex Rodoni <arod...@cloudera.com> AuthorDate: Mon Feb 4 17:41:23 2019 -0800 IMPALA-8105: [DOCS] Document cache_remote_file_handles flag Change-Id: Id649e733324f55a80a0199302dfa3b627ad183cf Reviewed-on: http://gerrit.cloudera.org:8080/12362 Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com> --- docs/topics/impala_scalability.xml | 880 +++++++++++++++++++++---------------- 1 file changed, 498 insertions(+), 382 deletions(-) diff --git a/docs/topics/impala_scalability.xml b/docs/topics/impala_scalability.xml index 94c2fdb..71b8425 100644 --- a/docs/topics/impala_scalability.xml +++ b/docs/topics/impala_scalability.xml @@ -1,4 +1,5 @@ -<?xml version="1.0" encoding="UTF-8"?><!-- +<?xml version="1.0" encoding="UTF-8"?> +<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information @@ -20,7 +21,13 @@ under the License. <concept id="scalability"> <title>Scalability Considerations for Impala</title> - <titlealts audience="PDF"><navtitle>Scalability Considerations</navtitle></titlealts> + + <titlealts audience="PDF"> + + <navtitle>Scalability Considerations</navtitle> + + </titlealts> + <prolog> <metadata> <data name="Category" value="Performance"/> @@ -30,7 +37,7 @@ under the License. <data name="Category" value="Developers"/> <data name="Category" value="Memory"/> <data name="Category" value="Scalability"/> - <!-- Using domain knowledge about Impala, sizing, etc. to decide what to mark as 'Proof of Concept'. --> +<!-- Using domain knowledge about Impala, sizing, etc. to decide what to mark as 'Proof of Concept'. --> <data name="Category" value="Proof of Concept"/> </metadata> </prolog> @@ -38,10 +45,11 @@ under the License. <conbody> <p> - This section explains how the size of your cluster and the volume of data influences SQL performance and - schema design for Impala tables. Typically, adding more cluster capacity reduces problems due to memory - limits or disk throughput. On the other hand, larger clusters are more likely to have other kinds of - scalability issues, such as a single slow node that causes performance problems for queries. + This section explains how the size of your cluster and the volume of data influence SQL + performance and schema design for Impala tables. Typically, adding more cluster capacity + reduces problems due to memory limits or disk throughput. On the other hand, larger + clusters are more likely to have other kinds of scalability issues, such as a single slow + node that causes performance problems for queries. </p> <p outputclass="toc inpage"/> @@ -53,14 +61,15 @@ under the License.
<concept audience="hidden" id="scalability_memory"> <title>Overview and Guidelines for Impala Memory Usage</title> - <prolog> - <metadata> - <data name="Category" value="Memory"/> - <data name="Category" value="Concepts"/> - <data name="Category" value="Best Practices"/> - <data name="Category" value="Guidelines"/> - </metadata> - </prolog> + + <prolog> + <metadata> + <data name="Category" value="Memory"/> + <data name="Category" value="Concepts"/> + <data name="Category" value="Best Practices"/> + <data name="Category" value="Guidelines"/> + </metadata> + </prolog> <conbody> @@ -158,7 +167,9 @@ Memory Usage: Additional Notes * Review previous common issues on out-of-memory * Note: Even with disk-based joins, you'll want to review these steps to speed up queries and use memory more efficiently </codeblock> + </conbody> + </concept> <concept id="scalability_catalog"> @@ -173,24 +184,25 @@ Memory Usage: Additional Notes </p> <p> - Because Hadoop I/O is optimized for reading and writing large files, Impala is optimized for tables - containing relatively few, large data files. Schemas containing thousands of tables, or tables containing - thousands of partitions, can encounter performance issues during startup or during DDL operations such as - <codeph>ALTER TABLE</codeph> statements. + Because Hadoop I/O is optimized for reading and writing large files, Impala is optimized + for tables containing relatively few, large data files. Schemas containing thousands of + tables, or tables containing thousands of partitions, can encounter performance issues + during startup or during DDL operations such as <codeph>ALTER TABLE</codeph> statements. </p> <note type="important" rev="TSB-168"> - <p> - Because of a change in the default heap size for the <cmdname>catalogd</cmdname> daemon in - <keyword keyref="impala25_full"/> and higher, the following procedure to increase the <cmdname>catalogd</cmdname> - memory limit might be required following an upgrade to <keyword keyref="impala25_full"/> even if not - needed previously. - </p> + <p> + Because of a change in the default heap size for the <cmdname>catalogd</cmdname> + daemon in <keyword keyref="impala25_full"/> and higher, the following procedure to + increase the <cmdname>catalogd</cmdname> memory limit might be required following an + upgrade to <keyword keyref="impala25_full"/> even if not needed previously. + </p> </note> <p conref="../shared/impala_common.xml#common/increase_catalogd_heap_size"/> </conbody> + </concept> <concept rev="2.1.0" id="statestore_scalability"> @@ -200,30 +212,33 @@ Memory Usage: Additional Notes <conbody> <p> - Before <keyword keyref="impala21_full"/>, the statestore sent only one kind of message to its subscribers. This message contained all - updates for any topics that a subscriber had subscribed to. It also served to let subscribers know that the - statestore had not failed, and conversely the statestore used the success of sending a heartbeat to a + Before <keyword keyref="impala21_full"/>, the statestore sent only one kind of message + to its subscribers. This message contained all updates for any topics that a subscriber + had subscribed to. It also served to let subscribers know that the statestore had not + failed, and conversely the statestore used the success of sending a heartbeat to a subscriber to decide whether or not the subscriber had failed. 
</p> <p> - Combining topic updates and failure detection in a single message led to bottlenecks in clusters with large - numbers of tables, partitions, and HDFS data blocks. When the statestore was overloaded with metadata - updates to transmit, heartbeat messages were sent less frequently, sometimes causing subscribers to time - out their connection with the statestore. Increasing the subscriber timeout and decreasing the frequency of - statestore heartbeats worked around the problem, but reduced responsiveness when the statestore failed or - restarted. + Combining topic updates and failure detection in a single message led to bottlenecks in + clusters with large numbers of tables, partitions, and HDFS data blocks. When the + statestore was overloaded with metadata updates to transmit, heartbeat messages were + sent less frequently, sometimes causing subscribers to time out their connection with + the statestore. Increasing the subscriber timeout and decreasing the frequency of + statestore heartbeats worked around the problem, but reduced responsiveness when the + statestore failed or restarted. </p> <p> - As of <keyword keyref="impala21_full"/>, the statestore now sends topic updates and heartbeats in separate messages. This allows the - statestore to send and receive a steady stream of lightweight heartbeats, and removes the requirement to - send topic updates according to a fixed schedule, reducing statestore network overhead. + As of <keyword keyref="impala21_full"/>, the statestore now sends topic updates and + heartbeats in separate messages. This allows the statestore to send and receive a steady + stream of lightweight heartbeats, and removes the requirement to send topic updates + according to a fixed schedule, reducing statestore network overhead. </p> <p> - The statestore now has the following relevant configuration flags for the <cmdname>statestored</cmdname> - daemon: + The statestore now has the following relevant configuration flags for the + <cmdname>statestored</cmdname> daemon: </p> <dl> @@ -234,8 +249,8 @@ Memory Usage: Additional Notes </dt> <dd> - The number of threads inside the statestore dedicated to sending topic updates. You should not - typically need to change this value. + The number of threads inside the statestore dedicated to sending topic updates. You + should not typically need to change this value. <p> <b>Default:</b> 10 </p> @@ -250,9 +265,10 @@ Memory Usage: Additional Notes </dt> <dd> - The frequency, in milliseconds, with which the statestore tries to send topic updates to each - subscriber. This is a best-effort value; if the statestore is unable to meet this frequency, it sends - topic updates as fast as it can. You should not typically need to change this value. + The frequency, in milliseconds, with which the statestore tries to send topic + updates to each subscriber. This is a best-effort value; if the statestore is unable + to meet this frequency, it sends topic updates as fast as it can. You should not + typically need to change this value. <p> <b>Default:</b> 2000 </p> @@ -267,8 +283,8 @@ Memory Usage: Additional Notes </dt> <dd> - The number of threads inside the statestore dedicated to sending heartbeats. You should not typically - need to change this value. + The number of threads inside the statestore dedicated to sending heartbeats. You + should not typically need to change this value. 
<p> <b>Default:</b> 10 </p> @@ -283,9 +299,10 @@ Memory Usage: Additional Notes </dt> <dd> - The frequency, in milliseconds, with which the statestore tries to send heartbeats to each subscriber. - This value should be good for large catalogs and clusters up to approximately 150 nodes. Beyond that, - you might need to increase this value to make the interval longer between heartbeat messages. + The frequency, in milliseconds, with which the statestore tries to send heartbeats + to each subscriber. This value should be good for large catalogs and clusters up to + approximately 150 nodes. Beyond that, you might need to increase this value to make + the interval longer between heartbeat messages. <p> <b>Default:</b> 1000 (one heartbeat message every second) </p> @@ -295,46 +312,54 @@ Memory Usage: Additional Notes </dl> <p> - If it takes a very long time for a cluster to start up, and <cmdname>impala-shell</cmdname> consistently - displays <codeph>This Impala daemon is not ready to accept user requests</codeph>, the statestore might be - taking too long to send the entire catalog topic to the cluster. In this case, consider adding - <codeph>--load_catalog_in_background=false</codeph> to your catalog service configuration. This setting - stops the statestore from loading the entire catalog into memory at cluster startup. Instead, metadata for - each table is loaded when the table is accessed for the first time. + If it takes a very long time for a cluster to start up, and + <cmdname>impala-shell</cmdname> consistently displays <codeph>This Impala daemon is not + ready to accept user requests</codeph>, the statestore might be taking too long to send + the entire catalog topic to the cluster. In this case, consider adding + <codeph>--load_catalog_in_background=false</codeph> to your catalog service + configuration. This setting stops the statestore from loading the entire catalog into + memory at cluster startup. Instead, metadata for each table is loaded when the table is + accessed for the first time. </p> + </conbody> + </concept> <concept id="scalability_buffer_pool" rev="2.10.0 IMPALA-3200"> + <title>Effect of Buffer Pool on Memory Usage (<keyword keyref="impala210"/> and higher)</title> + <conbody> + <p> - The buffer pool feature, available in <keyword keyref="impala210"/> and higher, changes the - way Impala allocates memory during a query. Most of the memory needed is reserved at the - beginning of the query, avoiding cases where a query might run for a long time before failing - with an out-of-memory error. The actual memory estimates and memory buffers are typically - smaller than before, so that more queries can run concurrently or process larger volumes - of data than previously. + The buffer pool feature, available in <keyword keyref="impala210"/> and higher, changes + the way Impala allocates memory during a query. Most of the memory needed is reserved at + the beginning of the query, avoiding cases where a query might run for a long time + before failing with an out-of-memory error. The actual memory estimates and memory + buffers are typically smaller than before, so that more queries can run concurrently or + process larger volumes of data than previously. </p> + <p> The buffer pool feature includes some query options that you can fine-tune: - <xref keyref="buffer_pool_limit"/>, - <xref keyref="default_spillable_buffer_size"/>, - <xref keyref="max_row_size"/>, and - <xref keyref="min_spillable_buffer_size"/>. 
- </p> - <p> - Most of the effects of the buffer pool are transparent to you as an Impala user. - Memory use during spilling is now steadier and more predictable, instead of - increasing rapidly as more data is spilled to disk. The main change from a user - perspective is the need to increase the <codeph>MAX_ROW_SIZE</codeph> query option - setting when querying tables with columns containing long strings, many columns, - or other combinations of factors that produce very large rows. If Impala encounters - rows that are too large to process with the default query option settings, the query - fails with an error message suggesting to increase the <codeph>MAX_ROW_SIZE</codeph> - setting. + <xref keyref="buffer_pool_limit"/>, <xref keyref="default_spillable_buffer_size"/>, + <xref keyref="max_row_size"/>, and <xref keyref="min_spillable_buffer_size"/>. </p> + + <p> + Most of the effects of the buffer pool are transparent to you as an Impala user. Memory + use during spilling is now steadier and more predictable, instead of increasing rapidly + as more data is spilled to disk. The main change from a user perspective is the need to + increase the <codeph>MAX_ROW_SIZE</codeph> query option setting when querying tables + with columns containing long strings, many columns, or other combinations of factors + that produce very large rows. If Impala encounters rows that are too large to process + with the default query option settings, the query fails with an error message suggesting + to increase the <codeph>MAX_ROW_SIZE</codeph> setting. + </p> + + </conbody> + + </concept> <concept audience="hidden" id="scalability_cluster_size"> @@ -343,9 +368,10 @@ Memory Usage: Additional Notes <conbody> - <p> - </p> + <p></p> + </conbody> + </concept> <concept audience="hidden" id="concurrent_connections"> @@ -355,7 +381,9 @@ Memory Usage: Additional Notes <conbody> <p></p> + </conbody> + </concept> <concept rev="2.0.0" id="spill_to_disk"> @@ -365,22 +393,26 @@ Memory Usage: Additional Notes <conbody> <p> - Certain memory-intensive operations write temporary data to disk (known as <term>spilling</term> to disk) - when Impala is close to exceeding its memory limit on a particular host. + Certain memory-intensive operations write temporary data to disk (known as + <term>spilling</term> to disk) when Impala is close to exceeding its memory limit on a + particular host. </p> <p> - The result is a query that completes successfully, rather than failing with an out-of-memory error. The - tradeoff is decreased performance due to the extra disk I/O to write the temporary data and read it back - in. The slowdown could be potentially be significant. Thus, while this feature improves reliability, - you should optimize your queries, system parameters, and hardware configuration to make this spilling a rare occurrence. + The result is a query that completes successfully, rather than failing with an + out-of-memory error. The tradeoff is decreased performance due to the extra disk I/O to + write the temporary data and read it back in. The slowdown could potentially be + significant. Thus, while this feature improves reliability, you should optimize your + queries, system parameters, and hardware configuration to make this spilling a rare + occurrence.
</p> <note rev="2.10.0 IMPALA-3200"> <p> - In <keyword keyref="impala210"/> and higher, also see <xref keyref="scalability_buffer_pool"/> for - changes to Impala memory allocation that might change the details of which queries spill to disk, - and how much memory and disk space is involved in the spilling operation. + In <keyword keyref="impala210"/> and higher, also see + <xref keyref="scalability_buffer_pool"/> for changes to Impala memory allocation that + might change the details of which queries spill to disk, and how much memory and disk + space is involved in the spilling operation. </p> </note> @@ -389,38 +421,42 @@ Memory Usage: Additional Notes </p> <p> - Several SQL clauses and constructs require memory allocations that could activat the spilling mechanism: + Several SQL clauses and constructs require memory allocations that could activate the + spilling mechanism: </p> + <ul> <li> <p> - when a query uses a <codeph>GROUP BY</codeph> clause for columns - with millions or billions of distinct values, Impala keeps a - similar number of temporary results in memory, to accumulate the - aggregate results for each value in the group. + When a query uses a <codeph>GROUP BY</codeph> clause for columns with millions or + billions of distinct values, Impala keeps a similar number of temporary results in + memory, to accumulate the aggregate results for each value in the group. </p> </li> + <li> <p> - When large tables are joined together, Impala keeps the values of - the join columns from one table in memory, to compare them to - incoming values from the other table. + When large tables are joined together, Impala keeps the values of the join columns + from one table in memory, to compare them to incoming values from the other table. </p> </li> + <li> <p> - When a large result set is sorted by the <codeph>ORDER BY</codeph> - clause, each node sorts its portion of the result set in memory. + When a large result set is sorted by the <codeph>ORDER BY</codeph> clause, each node + sorts its portion of the result set in memory. </p> </li> + <li> <p> - The <codeph>DISTINCT</codeph> and <codeph>UNION</codeph> operators - build in-memory data structures to represent all values found so - far, to eliminate duplicates as the query progresses. + The <codeph>DISTINCT</codeph> and <codeph>UNION</codeph> operators build in-memory + data structures to represent all values found so far, to eliminate duplicates as the + query progresses. </p> </li> - <!-- JIRA still in open state as of 5.8 / 2.6, commenting out. +<!-- JIRA still in open state as of 5.8 / 2.6, commenting out. <li> <p rev="IMPALA-3471"> In <keyword keyref="impala26_full"/> and higher, <term>top-N</term> queries (those with @@ -445,30 +481,32 @@ Memory Usage: Additional Notes </p> <p rev="2.10.0 IMPALA-3200"> - In <keyword keyref="impala210_full"/> and higher, the way SQL operators such as <codeph>GROUP BY</codeph>, - <codeph>DISTINCT</codeph>, and joins, transition between using additional memory or activating the - spill-to-disk feature is changed. The memory required to spill to disk is reserved up front, and you can - examine it in the <codeph>EXPLAIN</codeph> plan when the <codeph>EXPLAIN_LEVEL</codeph> query option is + In <keyword keyref="impala210_full"/> and higher, the way SQL operators such as + <codeph>GROUP BY</codeph>, <codeph>DISTINCT</codeph>, and joins transition between + using additional memory or activating the spill-to-disk feature is changed.
The memory + required to spill to disk is reserved up front, and you can examine it in the + <codeph>EXPLAIN</codeph> plan when the <codeph>EXPLAIN_LEVEL</codeph> query option is set to 2 or higher. </p> - <p> - The infrastructure of the spilling feature affects the way the affected SQL operators, such as - <codeph>GROUP BY</codeph>, <codeph>DISTINCT</codeph>, and joins, use memory. - On each host that participates in the query, each such operator in a query requires memory - to store rows of data and other data structures. Impala reserves a certain amount of memory - up front for each operator that supports spill-to-disk that is sufficient to execute the - operator. If an operator accumulates more data than can fit in the reserved memory, it - can either reserve more memory to continue processing data in memory or start spilling - data to temporary scratch files on disk. Thus, operators with spill-to-disk support - can adapt to different memory constraints by using however much memory is available - to speed up execution, yet tolerate low memory conditions by spilling data to disk. - </p> - - <p> - The amount data depends on the portion of the data being handled by that host, and thus - the operator may end up consuming different amounts of memory on different hosts. - </p> + <p> + The infrastructure of the spilling feature affects the way the affected SQL operators, + such as <codeph>GROUP BY</codeph>, <codeph>DISTINCT</codeph>, and joins, use memory. On + each host that participates in the query, each such operator in a query requires memory + to store rows of data and other data structures. For each operator that supports + spill-to-disk, Impala reserves an amount of memory up front that is sufficient to + execute the operator. If an operator accumulates more data than can fit in the reserved + memory, it can either reserve more memory to continue processing data in memory or start + spilling data to temporary scratch files on disk. Thus, operators with spill-to-disk + support can adapt to different memory constraints by using however much memory is + available to speed up execution, yet tolerate low memory conditions by spilling data to + disk. + </p> + + <p> + The amount of data processed by each operator depends on the portion of the data being + handled by that host, and thus the operator may end up consuming different amounts of + memory on different hosts. + </p> <!-- <p> @@ -505,12 +543,13 @@ Memory Usage: Additional Notes --> <p> - <b>Added in:</b> This feature was added to the <codeph>ORDER BY</codeph> clause in Impala 1.4. - This feature was extended to cover join queries, aggregation functions, and analytic - functions in Impala 2.0. The size of the memory work area required by - each operator that spills was reduced from 512 megabytes to 256 megabytes in Impala 2.2. + <b>Added in:</b> This feature was added to the <codeph>ORDER BY</codeph> clause in + Impala 1.4. This feature was extended to cover join queries, aggregation functions, and + analytic functions in Impala 2.0. The size of the memory work area required by each + operator that spills was reduced from 512 megabytes to 256 megabytes in Impala 2.2.
+ <ph rev="2.10.0 IMPALA-3200">The spilling mechanism was reworked to take advantage of + the Impala buffer pool feature and be more predictable and stable in + <keyword keyref="impala210_full"/>.</ph> </p> <p> @@ -518,28 +557,30 @@ Memory Usage: Additional Notes </p> <p> - Because the extra I/O can impose significant performance overhead on these types of queries, try to avoid - this situation by using the following steps: + Because the extra I/O can impose significant performance overhead on these types of + queries, try to avoid this situation by using the following steps: </p> <ol> <li> - Detect how often queries spill to disk, and how much temporary data is written. Refer to the following - sources: + Detect how often queries spill to disk, and how much temporary data is written. Refer + to the following sources: <ul> <li> - The output of the <codeph>PROFILE</codeph> command in the <cmdname>impala-shell</cmdname> - interpreter. This data shows the memory usage for each host and in total across the cluster. The - <codeph>WriteIoBytes</codeph> counter reports how much data was written to disk for each operator - during the query. (In <keyword keyref="impala29_full"/>, the counter was named - <codeph>ScratchBytesWritten</codeph>; in <keyword keyref="impala28_full"/> and earlier, it was named - <codeph>BytesWritten</codeph>.) + The output of the <codeph>PROFILE</codeph> command in the + <cmdname>impala-shell</cmdname> interpreter. This data shows the memory usage for + each host and in total across the cluster. The <codeph>WriteIoBytes</codeph> + counter reports how much data was written to disk for each operator during the + query. (In <keyword keyref="impala29_full"/>, the counter was named + <codeph>ScratchBytesWritten</codeph>; in <keyword keyref="impala28_full"/> and + earlier, it was named <codeph>BytesWritten</codeph>.) </li> <li> - The <uicontrol>Queries</uicontrol> tab in the Impala debug web user interface. Select the query to - examine and click the corresponding <uicontrol>Profile</uicontrol> link. This data breaks down the - memory usage for a single host within the cluster, the host whose web interface you are connected to. + The <uicontrol>Queries</uicontrol> tab in the Impala debug web user interface. + Select the query to examine and click the corresponding + <uicontrol>Profile</uicontrol> link. This data breaks down the memory usage for a + single host within the cluster, the host whose web interface you are connected to. </li> </ul> </li> @@ -548,15 +589,16 @@ Memory Usage: Additional Notes Use one or more techniques to reduce the possibility of the queries spilling to disk: <ul> <li> - Increase the Impala memory limit if practical, for example, if you can increase the available memory - by more than the amount of temporary data written to disk on a particular node. Remember that in - Impala 2.0 and later, you can issue <codeph>SET MEM_LIMIT</codeph> as a SQL statement, which lets you - fine-tune the memory usage for queries from JDBC and ODBC applications. + Increase the Impala memory limit if practical, for example, if you can increase + the available memory by more than the amount of temporary data written to disk on + a particular node. Remember that in Impala 2.0 and later, you can issue + <codeph>SET MEM_LIMIT</codeph> as a SQL statement, which lets you fine-tune the + memory usage for queries from JDBC and ODBC applications. 
</li> <li> - Increase the number of nodes in the cluster, to increase the aggregate memory available to Impala and - reduce the amount of memory required on each node. + Increase the number of nodes in the cluster, to increase the aggregate memory + available to Impala and reduce the amount of memory required on each node. </li> <li> @@ -564,54 +606,57 @@ Memory Usage: Additional Notes </li> <li> - On a cluster with resources shared between Impala and other Hadoop components, use resource - management features to allocate more memory for Impala. See + On a cluster with resources shared between Impala and other Hadoop components, use + resource management features to allocate more memory for Impala. See <xref href="impala_resource_management.xml#resource_management"/> for details. </li> <li> - If the memory pressure is due to running many concurrent queries rather than a few memory-intensive - ones, consider using the Impala admission control feature to lower the limit on the number of - concurrent queries. By spacing out the most resource-intensive queries, you can avoid spikes in - memory usage and improve overall response times. See - <xref href="impala_admission.xml#admission_control"/> for details. + If the memory pressure is due to running many concurrent queries rather than a few + memory-intensive ones, consider using the Impala admission control feature to + lower the limit on the number of concurrent queries. By spacing out the most + resource-intensive queries, you can avoid spikes in memory usage and improve + overall response times. See <xref href="impala_admission.xml#admission_control"/> + for details. </li> <li> - Tune the queries with the highest memory requirements, using one or more of the following techniques: + Tune the queries with the highest memory requirements, using one or more of the + following techniques: <ul> <li> - Run the <codeph>COMPUTE STATS</codeph> statement for all tables involved in large-scale joins and - aggregation queries. + Run the <codeph>COMPUTE STATS</codeph> statement for all tables involved in + large-scale joins and aggregation queries. </li> <li> - Minimize your use of <codeph>STRING</codeph> columns in join columns. Prefer numeric values - instead. + Minimize your use of <codeph>STRING</codeph> columns in join columns. Prefer + numeric values instead. </li> <li> - Examine the <codeph>EXPLAIN</codeph> plan to understand the execution strategy being used for the - most resource-intensive queries. See <xref href="impala_explain_plan.xml#perf_explain"/> for - details. + Examine the <codeph>EXPLAIN</codeph> plan to understand the execution strategy + being used for the most resource-intensive queries. See + <xref href="impala_explain_plan.xml#perf_explain"/> for details. </li> <li> - If Impala still chooses a suboptimal execution strategy even with statistics available, or if it - is impractical to keep the statistics up to date for huge or rapidly changing tables, add hints - to the most resource-intensive queries to select the right execution strategy. See + If Impala still chooses a suboptimal execution strategy even with statistics + available, or if it is impractical to keep the statistics up to date for huge + or rapidly changing tables, add hints to the most resource-intensive queries + to select the right execution strategy. See <xref href="impala_hints.xml#hints"/> for details. 
</li> </ul> </li> <li> - If your queries experience substantial performance overhead due to spilling, enable the - <codeph>DISABLE_UNSAFE_SPILLS</codeph> query option. This option prevents queries whose memory usage - is likely to be exorbitant from spilling to disk. See - <xref href="impala_disable_unsafe_spills.xml#disable_unsafe_spills"/> for details. As you tune - problematic queries using the preceding steps, fewer and fewer will be cancelled by this option - setting. + If your queries experience substantial performance overhead due to spilling, + enable the <codeph>DISABLE_UNSAFE_SPILLS</codeph> query option. This option + prevents queries whose memory usage is likely to be exorbitant from spilling to + disk. See <xref href="impala_disable_unsafe_spills.xml#disable_unsafe_spills"/> + for details. As you tune problematic queries using the preceding steps, fewer and + fewer will be cancelled by this option setting. </li> </ul> </li> @@ -622,22 +667,24 @@ Memory Usage: Additional Notes </p> <p> - To artificially provoke spilling, to test this feature and understand the performance implications, use a - test environment with a memory limit of at least 2 GB. Issue the <codeph>SET</codeph> command with no - arguments to check the current setting for the <codeph>MEM_LIMIT</codeph> query option. Set the query - option <codeph>DISABLE_UNSAFE_SPILLS=true</codeph>. This option limits the spill-to-disk feature to prevent - runaway disk usage from queries that are known in advance to be suboptimal. Within - <cmdname>impala-shell</cmdname>, run a query that you expect to be memory-intensive, based on the criteria - explained earlier. A self-join of a large table is a good candidate: + To artificially provoke spilling, to test this feature and understand the performance + implications, use a test environment with a memory limit of at least 2 GB. Issue the + <codeph>SET</codeph> command with no arguments to check the current setting for the + <codeph>MEM_LIMIT</codeph> query option. Set the query option + <codeph>DISABLE_UNSAFE_SPILLS=true</codeph>. This option limits the spill-to-disk + feature to prevent runaway disk usage from queries that are known in advance to be + suboptimal. Within <cmdname>impala-shell</cmdname>, run a query that you expect to be + memory-intensive, based on the criteria explained earlier. A self-join of a large table + is a good candidate: </p> <codeblock>select count(*) from big_table a join big_table b using (column_with_many_values); </codeblock> <p> - Issue the <codeph>PROFILE</codeph> command to get a detailed breakdown of the memory usage on each node - during the query. - <!-- + Issue the <codeph>PROFILE</codeph> command to get a detailed breakdown of the memory + usage on each node during the query. +<!-- The crucial part of the profile output concerning memory is the <codeph>BlockMgr</codeph> portion. For example, this profile shows that the query did not quite exceed the memory limit. --> @@ -668,8 +715,8 @@ Memory Usage: Additional Notes --> <p> - Set the <codeph>MEM_LIMIT</codeph> query option to a value that is smaller than the peak memory usage - reported in the profile output. Now try the memory-intensive query again. + Set the <codeph>MEM_LIMIT</codeph> query option to a value that is smaller than the peak + memory usage reported in the profile output. Now try the memory-intensive query again. 
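+ For example, the sequence in <cmdname>impala-shell</cmdname> might look like the + following sketch, reusing the earlier self-join; the <codeph>1gb</codeph> value is only + a placeholder, so substitute a limit just below the peak memory usage reported in your + own profile output: + <codeblock>set mem_limit=1gb; + select count(*) from big_table a join big_table b using (column_with_many_values); + profile;</codeblock>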
</p> <p> @@ -682,46 +729,50 @@ these tables, hint the plan or disable this behavior via query options to enable </codeblock> <p> - If so, the query could have consumed substantial temporary disk space, slowing down so much that it would - not complete in any reasonable time. Rather than rely on the spill-to-disk feature in this case, issue the - <codeph>COMPUTE STATS</codeph> statement for the table or tables in your sample query. Then run the query - again, check the peak memory usage again in the <codeph>PROFILE</codeph> output, and adjust the memory - limit again if necessary to be lower than the peak memory usage. + If so, the query could have consumed substantial temporary disk space, slowing down so + much that it would not complete in any reasonable time. Rather than rely on the + spill-to-disk feature in this case, issue the <codeph>COMPUTE STATS</codeph> statement + for the table or tables in your sample query. Then run the query again, check the peak + memory usage again in the <codeph>PROFILE</codeph> output, and adjust the memory limit + again if necessary to be lower than the peak memory usage. </p> <p> - At this point, you have a query that is memory-intensive, but Impala can optimize it efficiently so that - the memory usage is not exorbitant. You have set an artificial constraint through the - <codeph>MEM_LIMIT</codeph> option so that the query would normally fail with an out-of-memory error. But - the automatic spill-to-disk feature means that the query should actually succeed, at the expense of some - extra disk I/O to read and write temporary work data. + At this point, you have a query that is memory-intensive, but Impala can optimize it + efficiently so that the memory usage is not exorbitant. You have set an artificial + constraint through the <codeph>MEM_LIMIT</codeph> option so that the query would + normally fail with an out-of-memory error. But the automatic spill-to-disk feature means + that the query should actually succeed, at the expense of some extra disk I/O to read + and write temporary work data. </p> <p> - Try the query again, and confirm that it succeeds. Examine the <codeph>PROFILE</codeph> output again. This - time, look for lines of this form: + Try the query again, and confirm that it succeeds. Examine the <codeph>PROFILE</codeph> + output again. This time, look for lines of this form: </p> <codeblock>- SpilledPartitions: <varname>N</varname> </codeblock> <p> - If you see any such lines with <varname>N</varname> greater than 0, that indicates the query would have - failed in Impala releases prior to 2.0, but now it succeeded because of the spill-to-disk feature. Examine - the total time taken by the <codeph>AGGREGATION_NODE</codeph> or other query fragments containing non-zero - <codeph>SpilledPartitions</codeph> values. Compare the times to similar fragments that did not spill, for - example in the <codeph>PROFILE</codeph> output when the same query is run with a higher memory limit. This - gives you an idea of the performance penalty of the spill operation for a particular query with a - particular memory limit. If you make the memory limit just a little lower than the peak memory usage, the - query only needs to write a small amount of temporary data to disk. The lower you set the memory limit, the + If you see any such lines with <varname>N</varname> greater than 0, that indicates the + query would have failed in Impala releases prior to 2.0, but now it succeeded because of + the spill-to-disk feature. 
Examine the total time taken by the + <codeph>AGGREGATION_NODE</codeph> or other query fragments containing non-zero + <codeph>SpilledPartitions</codeph> values. Compare the times to similar fragments that + did not spill, for example in the <codeph>PROFILE</codeph> output when the same query is + run with a higher memory limit. This gives you an idea of the performance penalty of the + spill operation for a particular query with a particular memory limit. If you make the + memory limit just a little lower than the peak memory usage, the query only needs to + write a small amount of temporary data to disk. The lower you set the memory limit, the more temporary data is written and the slower the query becomes. </p> <p> Now repeat this procedure for actual queries used in your environment. Use the - <codeph>DISABLE_UNSAFE_SPILLS</codeph> setting to identify cases where queries used more memory than - necessary due to lack of statistics on the relevant tables and columns, and issue <codeph>COMPUTE - STATS</codeph> where necessary. + <codeph>DISABLE_UNSAFE_SPILLS</codeph> setting to identify cases where queries used more + memory than necessary due to lack of statistics on the relevant tables and columns, and + issue <codeph>COMPUTE STATS</codeph> where necessary. </p> <p> @@ -729,242 +780,307 @@ these tables, hint the plan or disable this behavior via query options to enable </p> <p> - You might wonder, why not leave <codeph>DISABLE_UNSAFE_SPILLS</codeph> turned on all the time. Whether and - how frequently to use this option depends on your system environment and workload. + You might wonder why you would not leave <codeph>DISABLE_UNSAFE_SPILLS</codeph> turned + on all the time. Whether and how frequently to use this option depends on your system + environment and workload. </p> <p> - <codeph>DISABLE_UNSAFE_SPILLS</codeph> is suitable for an environment with ad hoc queries whose performance - characteristics and memory usage are not known in advance. It prevents <q>worst-case scenario</q> queries - that use large amounts of memory unnecessarily. Thus, you might turn this option on within a session while - developing new SQL code, even though it is turned off for existing applications. + <codeph>DISABLE_UNSAFE_SPILLS</codeph> is suitable for an environment with ad hoc + queries whose performance characteristics and memory usage are not known in advance. It + prevents <q>worst-case scenario</q> queries that use large amounts of memory + unnecessarily. Thus, you might turn this option on within a session while developing new + SQL code, even though it is turned off for existing applications. </p> <p> - Organizations where table and column statistics are generally up-to-date might leave this option turned on - all the time, again to avoid worst-case scenarios for untested queries or if a problem in the ETL pipeline - results in a table with no statistics. Turning on <codeph>DISABLE_UNSAFE_SPILLS</codeph> lets you <q>fail - fast</q> in this case and immediately gather statistics or tune the problematic queries. + Organizations where table and column statistics are generally up-to-date might leave + this option turned on all the time, again to avoid worst-case scenarios for untested + queries or if a problem in the ETL pipeline results in a table with no statistics. + Turning on <codeph>DISABLE_UNSAFE_SPILLS</codeph> lets you <q>fail fast</q> in this case + and immediately gather statistics or tune the problematic queries. </p> <p> - Some organizations might leave this option turned off.
For example, you might have tables large enough that - the <codeph>COMPUTE STATS</codeph> takes substantial time to run, making it impractical to re-run after - loading new data. If you have examined the <codeph>EXPLAIN</codeph> plans of your queries and know that - they are operating efficiently, you might leave <codeph>DISABLE_UNSAFE_SPILLS</codeph> turned off. In that - case, you know that any queries that spill will not go overboard with their memory consumption. + Some organizations might leave this option turned off. For example, you might have + tables large enough that the <codeph>COMPUTE STATS</codeph> takes substantial time to + run, making it impractical to re-run after loading new data. If you have examined the + <codeph>EXPLAIN</codeph> plans of your queries and know that they are operating + efficiently, you might leave <codeph>DISABLE_UNSAFE_SPILLS</codeph> turned off. In that + case, you know that any queries that spill will not go overboard with their memory + consumption. </p> </conbody> + </concept> -<concept id="complex_query"> -<title>Limits on Query Size and Complexity</title> -<conbody> -<p> -There are hardcoded limits on the maximum size and complexity of queries. -Currently, the maximum number of expressions in a query is 2000. -You might exceed the limits with large or deeply nested queries -produced by business intelligence tools or other query generators. -</p> -<p> -If you have the ability to customize such queries or the query generation -logic that produces them, replace sequences of repetitive expressions -with single operators such as <codeph>IN</codeph> or <codeph>BETWEEN</codeph> -that can represent multiple values or ranges. -For example, instead of a large number of <codeph>OR</codeph> clauses: -</p> + <concept id="complex_query"> + + <title>Limits on Query Size and Complexity</title> + + <conbody> + + <p> + There are hardcoded limits on the maximum size and complexity of queries. Currently, the + maximum number of expressions in a query is 2000. You might exceed the limits with large + or deeply nested queries produced by business intelligence tools or other query + generators. + </p> + + <p> + If you have the ability to customize such queries or the query generation logic that + produces them, replace sequences of repetitive expressions with single operators such as + <codeph>IN</codeph> or <codeph>BETWEEN</codeph> that can represent multiple values or + ranges. For example, instead of a large number of <codeph>OR</codeph> clauses: + </p> + <codeblock>WHERE val = 1 OR val = 2 OR val = 6 OR val = 100 ... </codeblock> -<p> -use a single <codeph>IN</codeph> clause: -</p> + + <p> + use a single <codeph>IN</codeph> clause: + </p> + <codeblock>WHERE val IN (1,2,6,100,...)</codeblock> -</conbody> -</concept> -<concept id="scalability_io"> -<title>Scalability Considerations for Impala I/O</title> -<conbody> -<p> -Impala parallelizes its I/O operations aggressively, -therefore the more disks you can attach to each host, the better. -Impala retrieves data from disk so quickly using -bulk read operations on large blocks, that most queries -are CPU-bound rather than I/O-bound. -</p> -<p> -Because the kind of sequential scanning typically done by -Impala queries does not benefit much from the random-access -capabilities of SSDs, spinning disks typically provide -the most cost-effective kind of storage for Impala data, -with little or no performance penalty as compared to SSDs. 
-</p> -<p> -Resource management features such as YARN, Llama, and admission control -typically constrain the amount of memory, CPU, or overall number of -queries in a high-concurrency environment. -Currently, there is no throttling mechanism for Impala I/O. -</p> -</conbody> -</concept> + </conbody> + + </concept> + + <concept id="scalability_io"> + + <title>Scalability Considerations for Impala I/O</title> + + <conbody> + + <p> + Impala parallelizes its I/O operations aggressively; therefore, the more disks you can + attach to each host, the better. Impala retrieves data from disk so quickly, using bulk + read operations on large blocks, that most queries are CPU-bound rather than I/O-bound. + </p> + + <p> + Because the kind of sequential scanning typically done by Impala queries does not + benefit much from the random-access capabilities of SSDs, spinning disks typically + provide the most cost-effective kind of storage for Impala data, with little or no + performance penalty as compared to SSDs. + </p> + + <p> + Resource management features such as YARN, Llama, and admission control typically + constrain the amount of memory, CPU, or overall number of queries in a high-concurrency + environment. Currently, there is no throttling mechanism for Impala I/O. + </p> + + </conbody> + + </concept> <concept id="big_tables"> + <title>Scalability Considerations for Table Layout</title> + <conbody> + <p> - Due to the overhead of retrieving and updating table metadata - in the metastore database, try to limit the number of columns - in a table to a maximum of approximately 2000. - Although Impala can handle wider tables than this, the metastore overhead - can become significant, leading to query performance that is slower - than expected based on the actual data volume. + Due to the overhead of retrieving and updating table metadata in the metastore database, + try to limit the number of columns in a table to a maximum of approximately 2000. + Although Impala can handle wider tables than this, the metastore overhead can become + significant, leading to query performance that is slower than expected based on the + actual data volume. </p> + <p> - To minimize overhead related to the metastore database and Impala query planning, - try to limit the number of partitions for any partitioned table to a few tens of thousands. + To minimize overhead related to the metastore database and Impala query planning, try to + limit the number of partitions for any partitioned table to a few tens of thousands. </p> + <p rev="IMPALA-5309"> - If the volume of data within a table makes it impractical to run exploratory - queries, consider using the <codeph>TABLESAMPLE</codeph> clause to limit query processing - to only a percentage of data within the table. This technique reduces the overhead - for query startup, I/O to read the data, and the amount of network, CPU, and memory - needed to process intermediate results during the query. See <xref keyref="tablesample"/> - for details. + If the volume of data within a table makes it impractical to run exploratory queries, + consider using the <codeph>TABLESAMPLE</codeph> clause to limit query processing to only + a percentage of data within the table. This technique reduces the overhead for query + startup, I/O to read the data, and the amount of network, CPU, and memory needed to + process intermediate results during the query. See <xref keyref="tablesample"/> for + details.
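+ For example, the following sketch, with a hypothetical table and column, limits an + exploratory query to roughly 10 percent of the data files in the table: + <codeblock>select count(distinct col1) from huge_table tablesample system(10);</codeblock>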
</p> + </conbody> + </concept> -<concept rev="" id="kerberos_overhead_cluster_size"> -<title>Kerberos-Related Network Overhead for Large Clusters</title> -<conbody> -<p> -When Impala starts up, or after each <codeph>kinit</codeph> refresh, Impala sends a number of -simultaneous requests to the KDC. For a cluster with 100 hosts, the KDC might be able to process -all the requests within roughly 5 seconds. For a cluster with 1000 hosts, the time to process -the requests would be roughly 500 seconds. Impala also makes a number of DNS requests at the same -time as these Kerberos-related requests. -</p> -<p> -While these authentication requests are being processed, any submitted Impala queries will fail. -During this period, the KDC and DNS may be slow to respond to requests from components other than Impala, -so other secure services might be affected temporarily. -</p> - <p> - In <keyword keyref="impala212_full"/> or earlier, to reduce the - frequency of the <codeph>kinit</codeph> renewal that initiates a new set - of authentication requests, increase the <codeph>kerberos_reinit_interval</codeph> - configuration setting for the <codeph>impalad</codeph> daemons. Currently, - the default is 60 minutes. Consider using a higher value such as 360 (6 hours). - </p> - <p> - The <codeph>kerberos_reinit_interval</codeph> configuration setting is removed - in <keyword keyref="impala30_full"/>, and the above step is no longer needed. - </p> - -</conbody> -</concept> + <concept rev="" id="kerberos_overhead_cluster_size"> + + <title>Kerberos-Related Network Overhead for Large Clusters</title> + + <conbody> + + <p> + When Impala starts up, or after each <codeph>kinit</codeph> refresh, Impala sends a + number of simultaneous requests to the KDC. For a cluster with 100 hosts, the KDC might + be able to process all the requests within roughly 5 seconds. For a cluster with 1000 + hosts, the time to process the requests would be roughly 500 seconds. Impala also makes + a number of DNS requests at the same time as these Kerberos-related requests. + </p> + + <p> + While these authentication requests are being processed, any submitted Impala queries + will fail. During this period, the KDC and DNS may be slow to respond to requests from + components other than Impala, so other secure services might be affected temporarily. + </p> + + <p> + In <keyword keyref="impala212_full"/> or earlier, to reduce the frequency of the + <codeph>kinit</codeph> renewal that initiates a new set of authentication requests, + increase the <codeph>kerberos_reinit_interval</codeph> configuration setting for the + <codeph>impalad</codeph> daemons. Currently, the default is 60 minutes. Consider using a + higher value such as 360 (6 hours). + </p> + + <p> + The <codeph>kerberos_reinit_interval</codeph> configuration setting is removed in + <keyword keyref="impala30_full"/>, and the above step is no longer needed. + </p> + + </conbody> + + </concept> <concept id="scalability_hotspots" rev="2.5.0 IMPALA-2696"> + <title>Avoiding CPU Hotspots for HDFS Cached Data</title> + <conbody> + <p> - You can use the HDFS caching feature, described in <xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/>, - with Impala to reduce I/O and memory-to-memory copying for frequently accessed tables or partitions. + You can use the HDFS caching feature, described in + <xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/>, with Impala to reduce I/O and + memory-to-memory copying for frequently accessed tables or partitions. 
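+ For example, a table can be cached with a statement of the following form; the table + name and pool name here are hypothetical, and the cache pool must already exist in HDFS: + <codeblock>alter table census set cached in 'pool_name';</codeblock>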
</p> + <p> In the early days of this feature, you might have found that enabling HDFS caching resulted in little or no performance improvement, because it could result in - <q>hotspots</q>: instead of the I/O to read the table data being parallelized across - the cluster, the I/O was reduced but the CPU load to process the data blocks - might be concentrated on a single host. + <q>hotspots</q>: instead of the I/O to read the table data being parallelized across the + cluster, the I/O was reduced but the CPU load to process the data blocks might be + concentrated on a single host. </p> + <p> To avoid hotspots, include the <codeph>WITH REPLICATION</codeph> clause with the - <codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph> statements for tables that use HDFS caching. - This clause allows more than one host to cache the relevant data blocks, so the CPU load - can be shared, reducing the load on any one host. - See <xref href="impala_create_table.xml#create_table"/> and <xref href="impala_alter_table.xml#alter_table"/> - for details. + <codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph> statements for tables that + use HDFS caching. This clause allows more than one host to cache the relevant data + blocks, so the CPU load can be shared, reducing the load on any one host. See + <xref href="impala_create_table.xml#create_table"/> and + <xref href="impala_alter_table.xml#alter_table"/> for details. </p> + <p> Hotspots with high CPU load for HDFS cached data could still arise in some cases, due to - the way that Impala schedules the work of processing data blocks on different hosts. - In <keyword keyref="impala25_full"/> and higher, scheduling improvements mean that the work for - HDFS cached data is divided better among all the hosts that have cached replicas - for a particular data block. When more than one host has a cached replica for a data block, - Impala assigns the work of processing that block to whichever host has done the least work - (in terms of number of bytes read) for the current query. If hotspots persist even with this - load-based scheduling algorithm, you can enable the query option <codeph>SCHEDULE_RANDOM_REPLICA=TRUE</codeph> - to further distribute the CPU load. This setting causes Impala to randomly pick a host to process a cached - data block if the scheduling algorithm encounters a tie when deciding which host has done the - least work. + the way that Impala schedules the work of processing data blocks on different hosts. In + <keyword keyref="impala25_full"/> and higher, scheduling improvements mean that the work + for HDFS cached data is divided better among all the hosts that have cached replicas for + a particular data block. When more than one host has a cached replica for a data block, + Impala assigns the work of processing that block to whichever host has done the least + work (in terms of number of bytes read) for the current query. If hotspots persist even + with this load-based scheduling algorithm, you can enable the query option + <codeph>SCHEDULE_RANDOM_REPLICA=TRUE</codeph> to further distribute the CPU load. This + setting causes Impala to randomly pick a host to process a cached data block if the + scheduling algorithm encounters a tie when deciding which host has done the least work. 
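+ For example, continuing the hypothetical table and cache pool from above, the following + sketch caches the table's data blocks on up to 3 hosts and enables the random + tie-breaking behavior for the current session: + <codeblock>alter table census set cached in 'pool_name' with replication = 3; + set schedule_random_replica=true;</codeblock>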
</p> + </conbody> + </concept> <concept id="scalability_file_handle_cache" rev="2.10.0 IMPALA-4623"> + <title>Scalability Considerations for NameNode Traffic with File Handle Caching</title> + <conbody> + <p> One scalability aspect that affects heavily loaded clusters is the load on the HDFS - NameNode, from looking up the details as each HDFS file is opened. Impala queries - often access many different HDFS files, for example if a query does a full table scan - on a table with thousands of partitions, each partition containing multiple data files. - Accessing each column of a Parquet file also involves a separate <q>open</q> call, - further increasing the load on the NameNode. High NameNode overhead can add startup time - (that is, increase latency) to Impala queries, and reduce overall throughput for non-Impala - workloads that also require accessing HDFS files. - </p> - <p> In <keyword keyref="impala210_full"/> and higher, you can reduce - NameNode overhead by enabling a caching feature for HDFS file handles. - Data files that are accessed by different queries, or even multiple - times within the same query, can be accessed without a new <q>open</q> - call and without fetching the file details again from the NameNode. </p> - <p> - Because this feature only involves HDFS data files, it does not apply to non-HDFS tables, - such as Kudu or HBase tables, or tables that store their data on cloud services such as - S3 or ADLS. Any read operations that perform remote reads also skip the cached file handles. - </p> - <p> The feature is enabled by default with 20,000 file handles to be - cached. To change the value, set the configuration option - <codeph>max_cached_file_handles</codeph> to a non-zero value for each - <cmdname>impalad</cmdname> daemon. From the initial default value of - 20000, adjust upward if NameNode request load is still significant, or - downward if it is more important to reduce the extra memory usage on - each host. Each cache entry consumes 6 KB, meaning that caching 20,000 - file handles requires up to 120 MB on each Impala executor. The exact - memory usage varies depending on how many file handles have actually - been cached; memory is freed as file handles are evicted from the cache. </p> - <p> - If a manual HDFS operation moves a file to the HDFS Trashcan while the file handle is cached, - Impala still accesses the contents of that file. This is a change from prior behavior. Previously, - accessing a file that was in the trashcan would cause an error. This behavior only applies to - non-Impala methods of removing HDFS files, not the Impala mechanisms such as <codeph>TRUNCATE TABLE</codeph> - or <codeph>DROP TABLE</codeph>. - </p> - <p> - If files are removed, replaced, or appended by HDFS operations outside of Impala, the way to bring the - file information up to date is to run the <codeph>REFRESH</codeph> statement on the table. - </p> - <p> - File handle cache entries are evicted as the cache fills up, or based on a timeout period - when they have not been accessed for some time. - </p> - <p> - To evaluate the effectiveness of file handle caching for a particular workload, issue the - <codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname> or examine query - profiles in the Impala web UI. Look for the ratio of <codeph>CachedFileHandlesHitCount</codeph> - (ideally, should be high) to <codeph>CachedFileHandlesMissCount</codeph> (ideally, should be low). 
- Before starting any evaluation, run some representative queries to <q>warm up</q> the cache, - because the first time each data file is accessed is always recorded as a cache miss. + NameNode from looking up the details as each HDFS file is opened. Impala queries often + access many different HDFS files. For example, a query that does a full table scan on a + partitioned table may need to read thousands of partitions, each partition containing + multiple data files. Accessing each column of a Parquet file also involves a separate + <q>open</q> call, further increasing the load on the NameNode. High NameNode overhead + can add startup time (that is, increase latency) to Impala queries, and reduce overall + throughput for non-Impala workloads that also require accessing HDFS files. + </p> + + <p> + In <keyword keyref="impala210_full"/> and higher, you can reduce NameNode overhead by + enabling a caching feature for HDFS file handles. Data files that are accessed by + different queries, or even multiple times within the same query, can be accessed without + a new <q>open</q> call and without fetching the file details again from the NameNode. + </p> + + <p> + In Impala 3.2 and higher, file handle caching also applies to remote HDFS file handles. + This is controlled by the <codeph>cache_remote_file_handles</codeph> flag for an + <codeph>impalad</codeph>. It is recommended that you use the default value of + <codeph>true</codeph> as this caching prevents your NameNode from overloading when your + cluster has many remote HDFS reads. + </p> + + <p> + Because this feature only involves HDFS data files, it does not apply to non-HDFS + tables, such as Kudu or HBase tables, or tables that store their data on cloud services + such as S3 or ADLS. + </p> + + <p> + The feature is enabled by default with 20,000 file handles to be cached. To change the + value, set the configuration option <codeph>max_cached_file_handles</codeph> to a + non-zero value for each <cmdname>impalad</cmdname> daemon. From the initial default + value of 20000, adjust upward if NameNode request load is still significant, or downward + if it is more important to reduce the extra memory usage on each host. Each cache entry + consumes 6 KB, meaning that caching 20,000 file handles requires up to 120 MB on each + Impala executor. The exact memory usage varies depending on how many file handles have + actually been cached; memory is freed as file handles are evicted from the cache. + </p> + + <p> + If a manual HDFS operation moves a file to the HDFS Trashcan while the file handle is + cached, Impala still accesses the contents of that file. This is a change from prior + behavior. Previously, accessing a file that was in the trashcan would cause an error. + This behavior only applies to non-Impala methods of removing HDFS files, not the Impala + mechanisms such as <codeph>TRUNCATE TABLE</codeph> or <codeph>DROP TABLE</codeph>. + </p> + + <p> + If files are removed, replaced, or appended by HDFS operations outside of Impala, the + way to bring the file information up to date is to run the <codeph>REFRESH</codeph> + statement on the table. + </p> + + <p> + File handle cache entries are evicted as the cache fills up, or based on a timeout + period when they have not been accessed for some time. + </p> + + <p> + To evaluate the effectiveness of file handle caching for a particular workload, issue + the <codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname> or examine + query profiles in the Impala Web UI. 
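+ The relevant counters appear in the profile output in lines similar to the following + form: + <codeblock>- CachedFileHandlesHitCount: <varname>N</varname> + - CachedFileHandlesMissCount: <varname>N</varname></codeblock>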
Look for the ratio of + <codeph>CachedFileHandlesHitCount</codeph> (ideally, should be high) to + <codeph>CachedFileHandlesMissCount</codeph> (ideally, should be low). Before starting + any evaluation, run several representative queries to <q>warm up</q> the cache because + the first time each data file is accessed is always recorded as a cache miss. + </p> + + <p> To see metrics about file handle caching for each <cmdname>impalad</cmdname> instance, - examine the <uicontrol>/metrics</uicontrol> page in the Impala web UI, in particular the fields - <uicontrol>impala-server.io.mgr.cached-file-handles-miss-count</uicontrol>, + examine the <uicontrol>/metrics</uicontrol> page in the Impala Web UI, in particular the + fields <uicontrol>impala-server.io.mgr.cached-file-handles-miss-count</uicontrol>, <uicontrol>impala-server.io.mgr.cached-file-handles-hit-count</uicontrol>, and <uicontrol>impala-server.io.mgr.num-cached-file-handles</uicontrol>. </p> + </conbody> + </concept> </concept>